vivo database cost reduction practice: exploring the most cost-effective database solution

Since vivo began researching and testing OceanBase in 2022, it has now launched 17 business systems, covering logging, analysis, and transaction businesses, achieving a total resource saving of 80%, and greatly simplifying development and operation and maintenance work. In this article, Liao Guangming, senior database engineer of the IT department of vivo system and process, introduces in detail the current application status of OceanBase, the problems encountered, and the comparison of various features of OceanBase and MySQL in the development and operation and maintenance architecture.

Picture

Vivo is a technology company with smart terminals and smart services as its core. Its research and development lines cover 5G communications, artificial intelligence, industrial design, imaging technology, etc. Thanks to vivo’s smart manufacturing network (including brand licensing), vivo has an annual production capacity of nearly 200 million units so far, providing high-quality products and services to 500 million users in more than 60 countries and regions around the world.

My department is mainly responsible for MES, ERP, production, sales and inventory systems, such as offline store purchase systems, online application malls, App-side application markets, etc. Vivo uses a wide variety of databases as a whole, with about 1,000+ hosts, involving 15 types of databases (see Figure 1), and data synchronization tools that need to be maintained. In addition, it also has multi-region and multi-cloud features, including self-built computer rooms in Dongguan and overseas use of Alibaba Cloud and AWS RDS databases.

Picture

Figure 1 Types of databases used by vivo

The way we use the database is also relatively complex. For example, if we use MySQL master-slave and MySQL MGR at the same time in a business, if it involves separation of read and write, we have to superimpose the use of Proxy separately. The complexity of the multiple types and architectures of the database brings great challenges to the operation and maintenance team. brought greater pressure.

In addition, we also face problems such as low resource utilization, a large number of machines, and some system resource expansion bottlenecks.

  • RDS has high cost and low resource utilization: the most expensive CPU resource utilization is low, the average CPU utilization is less than 5%, and the average CPU peak is less than 20%; disk and I/O resource utilization is also low. In AWS, storage types are bound to IOPS. Each system allocates resources according to the maximum specifications in the future, and the peak time nodes of system resources are also different.

  • The number of databases continues to increase, and resource and management costs increase: business microservices are split into increasingly finer details, resulting in an increasing number of database instances and increasing RDS costs.

  • MySQL database has insufficient scalability: MySQL database has limited scalability and relies on sub-databases and tables for horizontal expansion, which brings complexity to development and operation.

Based on business considerations, the core business system has a sub-database and table architecture, and cross-database JOIN queries cannot be avoided in some scenarios, making operation and maintenance complex. Based on cost considerations, not only the operation and maintenance costs are high, but the database costs are also high. Therefore, we began to explore whether there is a better database solution.

Picture

When selecting, we investigated two distributed databases and finally chose to introduce OceanBase. From a selection perspective, we focus on several aspects:

  • Product maturity. OceanBase was launched in 2010 and has experienced the test of many application scenarios in Alibaba and Ant, especially Alipay’s core business and “Tmall Double 11”, and its product maturity has been verified.

  • Product performance. We compared the concurrency performance of OceanBase and MySQL (see Figure 2). Under 50 concurrency, OceanBase single-master mode (specify the master node, only the master node provides services) is better than the multi-master mode (all nodes provide services), however, MySQL’s response time is shorter; above 50 concurrency, OceanBase’s response time is faster Fast; in complex SQL scenarios, OceanBase is better than MySQL in all aspects.

  • MySQL compatibility. We also value the compatibility of our products with MySQL. After testing, we found that OceanBase is highly compatible with MySQL and supports smooth migration of application systems without any compatibility issues during the process. In particular, OceanBase 4.2 version is basically compatible with the syntax of MySQL 8.0, avoiding user modification costs.

  • Operation and maintenance tools. DBAs pay great attention to whether the supporting operation and maintenance tools are complete. OceanBase has complete operation and maintenance tools, such as the installation and deployment tool OBD, the cluster management platform OCP, the migration service OMS, etc. At the same time, OceanBase supports more than 400 ecological tools such as promethus and binlog, and is compatible with our existing Flink CDC + logproxy data synchronization method.

Picture

Figure 2 Sysbench stress test of OceanBase and MySQL (three nodes 32C 128G)

In addition, compared to MySQL, OceanBase has an extremely high compression rate, which can save storage costs; it is natively distributed, can complete data horizontal expansion and vertical smooth expansion faster, and is natively highly available.

We also tested some functions such as data compression rate, performance, cost, etc.

? Data compression and performance

We observed the compression rate and response time of OceanBase based on the production, sales and inventory business distributed on the cloud and offline. By capturing the full amount of MySQL logs and playing back the MySQL and OceanBase databases in the full path, we compared their SQL execution efficiency.

On the cloud, we enable the RDS SLS audit log, capture the business load within the RDS time period, and deliver it to OSS. We read the OSS audit log through OMA, parse it into a SQL file, and record the RDS kernel execution time in the audit log, and then Use the parsed file to send playback traffic to OceanBase at the user-defined playback speed, record the RT time, and compile the final result report in conjunction with RDS in SLS.

Offline playback also turns on the RDS SLS audit log, captures the business load within the RDS time period, and delivers it to OSS, but only uses the SQL content parsed by SLS and does not use the RDS time record in SLS. Restore the temporary RDS instance from the production environment backup for traffic playback, then deploy OMA offline, play back the SQL load to RDS and MySQL at the same time, and calculate the execution time respectively to generate the final report.

From the test results in Figure 3, it can be seen that compared with the source database (MySQL), OceanBase’s efficiency is more advantageous. In terms of data storage, OceanBase’s compression ratio can reach 5.7 times to 15.3 times. The most important thing is that the MySQL statements in the business system are fully compatible with OceanBase without modification.

Picture

Figure 3 Comparison of data compression rates between source database (MySQL) and OceanBase

At the same time, through application playback, statistics are compared on the response time of frequently executed SQL (the former MySQL, the latter OceanBase, if less than 1ms is recorded as 0ms), it can be seen from Figure 4 that half of the business SQL performance is the same, about 1ms; the other half of the business SQL has been improved by dozens or hundreds of times.

Picture

Figure 4 Comparison of response times between MySQL and OceanBase

Due to the large amount of offline business data of production, sales and storage, this is also our direct appeal for introducing OceanBase.

The data volume of vivo’s business systems in different provinces is not consistent. For example, the business volume in Guangdong Province is relatively large. The largest database of a store’s business system is 2TB. Large provinces will have independent databases, and small provinces will share the database, but all provinces are integrated. Data needs.

Previously, the entire business system adopted a sub-database and sub-table architecture. In query scenarios, it was difficult to ensure the real-time and accuracy of queries by aggregating data from each database. After performance testing of OceanBase, we can merge the summary database and more than ten sub-databases across the country into an independent OceanBase database to meet the original cross-database query requirements and better meet business needs.

? Cost reduction experience

OceanBase’s storage engine and data compression ratio based on the LSM-Tree architecture provide more possibilities for enterprises to reduce storage and hardware costs. From the perspective of hardware, OceanBase’s cost reduction advantage lies in oversales. Each system uses super-resolution resources at staggered peaks. The flexibility between different tenants can be time-shared and multiplexed to improve resource utilization. Assume that corresponding to the following three systems, the resources used during peak periods require 4C8G. If you use MySQL, you need to allocate three 4C8G resources to the three business systems; if you use OceanBase, you only need to allocate three 1C8G tenants, and the remaining three CPU resources can be shared. In addition, if the read-write separation feature of OceanBase is turned on, CPU resources can be further fully utilized.

Picture

Figure 5 OceanBase cost reduction

Picture

Currently, we have launched 17 business systems, including log, analysis, transaction, etc. The original business system architecture is divided into separate databases and tables, which are divided according to the amount of data in each province. Large provinces will have independent databases, and small provinces will share the database. In addition, the data that needs to be summarized and queried will be summarized together through DTS; the total number of hosts is 20 units, data about 4.5T. After OceanBase was launched, the integrated resources were 12C and 20G tenant specifications, with read and write separation enabled, and the data size was about 600GB. After aggregating each sub-database into one OceanBase database, the total resources are saved by 80%, and there is no need to aggregate the database, and the accuracy and real-time performance of the query are also guaranteed.

Picture

Figure 6 Online OceanBase saves 80% of total resources

However, we also encountered some problems during use.

First, in versions prior to OceanBase 4.2, when the statement had large fields, we wrote the scenario execution plan incorrectly and should have gone through the entire table through the index. The R&D feedback was that the cost assessment model did not consider the large field scenario. This problem is currently solved in version 4.2.

Table: t_log_record` (`inter_param` longtext outer_param` longtext ,create_time` timestamp NOT NULL) The index column is create_time</code>
<code> select * from t_log_record WHERE create_time >= '2023-08-11 09:00:00' and create_time <= '2023-08-11 23:59:59';

Secondly, the backup periods should be appropriately staggered, and the NAS resource mounting should adopt the officially recommended parameters. If the NAS resources are not enough, hangs may occur.

Third, there is the problem of OCP management resource application container. When there is a lot of cluster management, the OCP container should be expanded to avoid OCP lag.

We have also discovered some problems with parameter settings, which are listed here for your reference. The first problem is the default auto-increment sequence. If OceanBase is set to an Order mode similar to Oracle Sequence, the concurrency will also decrease. The second problem is the parameter auto-increment column. If you are more concerned about the auto-increment value and it has business implications, you can try to set the value as small as possible to avoid number hopping in some scenarios. If you continue to use the default parameters, the jump will be larger. .

In addition, the query parameters are only 10 seconds by default. If they are not increased, they may easily time out in certain scenarios, such as index creation.

Picture

We enable read-write separation when using OceanBase, which allows us to make fuller use of machine resources. The reason why the multi-master writing mode is not adopted is because the response time and TPS of OceanBase’s multi-master mode are not as good as those of MySQL in scenarios with low concurrency. The single-master mode is also recommended by default in public clouds.

At present, we have been in contact with OceanBase for half a year. From the overall testing and experience, it is in line with expectations.

Picture

Based on the current understanding and application experience of OceanBase and MySQL, I have summarized the advantages and disadvantages of OceanBase compared with MySQL for your reference.

Picture

  • Compatibility: There is no need to modify the code when migrating from MySQL 5.7 version to OceanBase. However, the current OceanBase 4.0 version does not support reverse indexes.

  • Online DDL: Although MySQL 5.7 is weakly supported, it has to be said that this is a pain point of MySQL and requires the help of extension tools, and OceanBase can support almost all commonly used functions, except for the online partition table mode.

  • High availability and read-write separation: MySQL requires third-party tools to achieve high availability, read-write separation, and auditing, which OceanBase natively supports.

  • Execution plan fixation: As we all know, the machine is normally used in Oracle, but not in MySQL. Query write rewriting is supported, but it is not easy to control. OceanBase supports Outline fixation.

  • Global index: Although MySQL has partitioned tables, it does not support global indexes. However, there are some scenarios where the use of global indexes is unavoidable, and OceanBase fully supports this.

  • Sequence: In some log scenarios, we sometimes want to save the month of the log. MySQL does not have a sequence feature, and we need to write scripts, which brings high-level problems. OceanBase’s support for sequences avoids such problems.

  • TTL feature: Not supported by MySQL, supported by OceanBase version 4.2.

  • Large transaction support: MySQL can easily cause master-slave delays. If Mgr is used, flow control will easily remove nodes once the transaction is large. In data migration scenarios, we found that even with TB-level data, OceanBase has no pressure.

  • Hash join: MySQL 8.0 supports it, but it is weak; OceanBase supports it.

  • Parallelism: MySQL does not support it, which is one of the reasons for the slowness; OceanBase supports parallelism, and if the query optimization does not move, brute force can be used if the CPU is sufficient.

  • Deadlock: MySQL does not support deadlock detection views. If you encounter a deadlock, you can only go to the background to read the log; OceanBase has a deadlock detection view query.

  • DBLink: Although MySQL claims to have third-party components, few people use it; OceanBase 4.2 begins to support it.

  • SQL current limit: In fact, OceanBase can limit the concurrency of SQL by adding hint concurrency. There is no particularly good way to limit SQL traffic using MySQL or Oracle. OceanBase also has the ability to automatically limit slow SQL when it is continuously used. If it exceeds the default 10s, it will only use the default 30% of resources for slow SQL to prevent one slow SQL from bringing down the entire system.

  • Data compression ratio: MySQL’s data compression depends on the operating system file system. In comparison, OceanBase’s data compression ratio is 6 to 7 times that of MySQL.

  • Memory recycling: If we use MySQL, we often find that the percentage of memory usage is high during use. However, if we want to recycle this memory, it is not easy to do it, because we don’t know the actual usage of MySQL by looking at the memory usage ratio. How much memory is needed. However, OceanBase will periodically recycle merges and release memory, making it easier to monitor the actual memory required by the database.

  • Fragmentation: Frequent DML is prone to fragmentation when using MySQL. Since OceanBase uses the LSM architecture, it is merged regularly and is not prone to fragmentation.

  • Backup and recovery (flashback query): MySQL does not support it and relies on binlog. OceanBase supports table-level flashback query and tenant-level flashback operations, and can be tested multiple times on the same data.

  • Specification changes: When using MySQL, the application will experience interruptions when cloud RDS resource specifications are expanded; there will be no interruptions when OceanBase tenant specifications are changed.

  • Version upgrades and changes: MySQL has no official tools and needs to be developed by itself, which is very unfriendly to departments with heavy upgrade workloads. OceanBase provides native upgrade tools, which is relatively smooth.

In addition, ecological tools are a powerful tool for DBAs to reduce burdens and improve efficiency. MySQL relies on self-development. OceanBase has self-developed tools such as ODC, OCP, and OMS, which can support the development of self-service SQL auditing, data execution, data archiving, data deletion, DDL and other common cumbersome and dangerous operations can assist DBAs in monitoring and building, high-availability drills, problem locating, backup and recovery, specification expansion, statement review and execution, as well as data import, export, migration, synchronization, cleaning, etc. Currently, we can basically complete relevant operation and maintenance work through the WEB interface of the OceanBase tool. It has fine-grained permission control and can open relevant permissions and help relevant personnel to help themselves, further simplifying the repetitive work of DBAs. OceanBase is also compatible and integrated with 400+ upstream and downstream ecological tools, which is also very convenient for users.

At the same time, I also summarized the things that need to be paid attention to when using OceanBase tools, as well as the details we hope to optimize.

? First of all, in terms of functionality:

  • When upgrading the OceanBase version on OCP, tenants are not allowed to have multiple primary_zones. If there are many tenants during the upgrade, switching the primary node will be cumbersome. Currently version 4.2 is supported.

  • If the user manually changes the root password of the cluster or the root password of the tenant (not through the OCP platform), the password cannot be reset. This is more dangerous if someone intentionally causes harm. (Supported by OceanBase plan 4.2.3 version)

  • When OCP checks the execution plan, there are only known index column statistics and no other column statistics. I hope to add column statistics after the relevant where condition columns. For example, the create_time column does not differentiate well, but I want to know whether the time_cost column is suitable for indexing. (Officially informed that the obdiag tool supports this function)

  • When CPU memory can be isolated, it can support the minimum and maximum range and ensure minimum resources. In many scenarios, users need more resources during initialization, and resources are relatively idle at other times. (OCP 4.3 planned support)

  • OBProxy read-write nodes support weight configuration to prevent load imbalance.

  • OceanBase supports Oracle Intervel partitioning features. Currently it can be supported through the ODC tool, but it is hoped that the database kernel will support it natively. OceanBase plans to support this feature in the kernel 4.4 version next year.

  • The OMS management platform supports third-party login and role mapping.

? Secondly in terms of ease of use:

  • On the OCP platform, if the tenant is created by a script, you need to manually enter the password when logging in to the OCP console; then if you switch OCP users, you need to enter the password again. (OCP 4.2.2 Lockbox usability improvements)

  • The OCP tenant interface can have user instructions, which makes it easier for users to know the system under their own name. (OCP 4.2.2 supports tag management)

  • OMS adds a data source, and when entering the OceanBase connection string, relevant information can be selected based on OCP, reducing the input workload.

Picture

This database selection and replacement took half a year, and finally achieved good results, mainly reflected in: OceanBase is highly compatible with MySQL, and there is no need to modify the code during migration; after the MySQL sub-database and sub-table architecture was replaced by OceanBase, the total resource usage was saved 80%, which also greatly reduces operation and maintenance costs; OceanBase’s operation and maintenance tools and community support help DBAs simplify operation and maintenance complexity and reduce operation and maintenance burden.

Vivo’s plan to apply OceanBase is divided into three steps.

The first step is pilot operation. Let OceanBase Community Edition replace MySQL as a starting point, and prioritize businesses that have pain points in sharding databases and tables or have high resource occupancy costs as pilot projects for migrating to OceanBase. Transform or migrate functional modules online in order from low to high impact. During the process, development, DBA or related teams simultaneously build OceanBase’s technical capabilities.

The second step is business expansion and large-scale transformation. The traditional database architecture is transformed into the OceanBase database architecture. At the same time, combined with the characteristics of OceanBase, suitable and valuable business transformations are selected, using OceanBase as a supplement for scenarios where MySQL’s capabilities are insufficient, while still retaining the applicable scenarios for MySQL.

The third step is to transform on a large scale and fully guarantee the business. Mature OceanBase’s technical capabilities in IT construction and implement them in more vivo business scenarios, while optimizing costs, reducing costs and increasing efficiency, and building a disaster recovery environment.

The knowledge points of the article match the official knowledge files, and you can further learn related knowledge. MySQL entry-level skills treeHomepageOverview 77930 people are learning the system