Interpretation of OceanBase 4.1: What new features and features of MySQL 8.0 do we support?

This article mainly introduces OceanBase’s support for the new features of MySQL 8.0 when version 4.1 is released. The author of the article is Liu Bin, a senior technical expert of OceanBase. He has participated in the research and development of OceanBase RS and storage modules, and is currently the head of the SQL execution group.

MySQL is one of the most popular relational databases in the industry, and many users regard MySQL as the first choice for starting to use the database. One of the important features of OceanBase is that it is fully compatible with MySQL. Users can complete the database upgrade and migration without modifying the code, which also greatly reduces the learning cost of developers. As a completely self-developed database, OceanBase has invested a lot of work in the development of MySQL compatibility since version 1.0, and has realized comprehensive MySQL compatibility in SQL syntax, data types, and system functions.

With the development of MySQL and changes in user requirements, OceanBase’s MySQL-compatible version has also developed from 5.6 to 5.7 to 8.0. Always bring a simple and friendly development experience to users. Although version 5.7 is still the main version of many MySQL user businesses, with the end of official support for version 5.7 (MySQL 5.7 will end official support in October 2023, from: https://www.oracle.com/us /support/library/lifetime-support-technology-069183.pdf), and more MySQL users will switch their business to MySQL 8.0.

OceanBase 4.1’s compatibility strategy for MySQL is fully compatible with 5.7 and supports 8.0 functions at the same time. Compared with MySQL 5.7, version 8.0 has significantly improved performance, security, and usability, and also added many new features. In my opinion, the most important new features of MySQL 8.0 are as follows. This article will also introduce OceanBase’s support for these features when version 4.1 is released:

  • Window Function?
  • Common table expression
  • Hash Join
  • index management
  • Resource management

Window function: stronger data analysis capabilities

Window Function (Window Function) is a new function of MySQL 8.0. It can perform aggregation operations on data rows in a certain window instead of aggregating the entire result set. It can more simply and intuitively perform time series analysis, cumulative statistics, Ranking and other operations help users discover the value of data more deeply. Different from the aggregate function, the window function mainly controls the rows participating in the calculation through the window. It does not combine multiple rows of query results into one row, but puts the results back into multiple rows. The calculation in the window not only supports all aggregation functions (such as: count, sum, min, max, avg, stddev …), but also supports some window functions, including: cume_dist, dense_rank, first_value, lag, last_value, lead, nth_value, ntile, percent_rank, rank, row_number.

Window functions are mainly used in data analysis, such as calculating data trend changes, sorting data based on different indicators, and calculating complex statistical indicators. For example, suppose we want to rank athletes by project in a certain scene, we can create a table called athlete_scores, set the sport category (sport_type), athlete name (athlete_name), score (score), and use The following functions perform ranking operations:

SELECT sport_type, athlete_name, score,
       RANK() OVER (PARTITION BY sport_type ORDER BY score DESC) as `rank`
FROM athlete_scores
ORDER BY sport_type, `rank`

Because window functions are widely used in OLAP scenarios, OceanBase has provided window functions since version 1.x. Window functions tend to provide better performance than using subqueries or join operations. At the same time, the database optimizer can effectively handle window functions and reduce the cost of query calculations. It is worth mentioning that even if the user does not use window functions in SQL writing, in some scenarios, OceanBase will automatically rewrite part of the SQL into statements containing window functions to improve performance.

Common Table Expressions: Enhancing SQL Maintainability

Common table expression (CTE for short) is an SQL function introduced by the SQL1999 standard, which was introduced by various database vendors in early 2000. MySQL released a complete CTE implementation in 8.0. CTE allows one/multiple temporary tables (views) to be defined in SQL statements through the WITH keyword and applied in queries, usually in multiple subqueries, hierarchical queries, and recursive queries. With the help of CTE, users can split complex queries into multiple logical parts, making the structure of complex queries clearer, the logic of SQL codes more intuitive, and making queries easier to understand and maintain.

Users can also implement more complex functions through recursive CTEs. For example, in MySQL, a 1000-row test number with an auto-increment column (id) and a random number (rand_val) can be generated through the following SQL statement:

WITH RECURSIVE test_data (id, rand_val) AS (
    SELECT 1, RAND()
    UNION ALL
    SELECT id + 1, RAND()
    FROM test_data
    WHERE id < 1000
)
SELECT *
FROM test_data;

OceanBase can automatically identify common subqueries and extract them as common table expressions. OceanBase has supported the CTE function since version 2.0. Even if the user does not use CTE in SQL writing, OceanBase will identify and extract it during the rewriting phase, and optimize performance by reducing the number of subquery executions. For example, in the following SQL and plan, the SELECT c1, MIN(c2) c2, MAX(c3) c3 FROM t1 GROUP BY c1 subquery is used twice, so it can be extracted as a common subquery Inquire.

SELECT *
FROM t2
WHERE EXISTS (
        SELECT *
        FROM (
            SELECT c1, MIN(c2) c2, MAX(c3) c3
            FROM t1
            GROUP BY c1
            ) a
        WHERE t2.c1 = a.c1 AND t2.c2 = a.c2
        )
    AND EXISTS (
        SELECT *
        FROM (
            SELECT c1, MIN(c2) c2, MAX(c3) c3
            FROM t1
            GROUP BY c1
            ) b
        WHERE t2.c3 = b.c1 AND t2.c4 = b.c3
        );

==================================================== ================
|ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)|
-------------------------------------------------- ---------------
|0 |TEMP TABLE TRANSFORMATION| |1 |5 |
|1 | TEMP TABLE INSERT |TEMP1 |1 |3 |
|2 | HASH GROUP BY | |1 |3 |
|3 | TABLE SCAN |t1 |1 |2 |
|4 | HASH JOIN | |1 |3 |
|5 | TEMP TABLE ACCESS |VIEW2(TEMP1)|1 |1 |
|6 | HASH JOIN | |1 |3 |
|7 | TEMP TABLE ACCESS |VIEW1(TEMP1)|1 |1 |
|8 | TABLE SCAN |t2 |1 |2 |
==================================================== ================

It can be seen that the execution plan given by OceanBase when the user does not use CTE is consistent with the execution plan of the SQL written by the user himself using CTE. In this example, OceanBase’s TEMP TABLE TRANSFORMATION operator will store the subquery result (TEMP 1) and read it multiple times later to reduce the number of subquery executions.

WITH tmp AS (
    SELECT c1, MIN(c2) c2, MAX(c3) c3
    FROM t1
    GROUP BY c1
)
SELECT *
FROM t2
WHERE EXISTS (
        SELECT *
        FROM tmp a
        WHERE t2.c1 = a.c1 AND t2.c2 = a.c2
        )
    AND EXISTS (
        SELECT *
        FROM tmp b
        WHERE t2.c3 = b.c1 AND t2.c4 = b.c3
        );

==================================================== =========
|ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)|
-------------------------------------------------- ---------
|0 |TEMP TABLE TRANSFORMATION| |1 |5 |
|1 | TEMP TABLE INSERT |tmp |1 |3 |
|2 | HASH GROUP BY | |1 |3 |
|3 | TABLE SCAN |t1 |1 |2 |
|4 | HASH JOIN | |1 |3 |
|5 | TEMP TABLE ACCESS |b(tmp)|1 |1 |
|6 | HASH JOIN | |1 |3 |
|7 | TEMP TABLE ACCESS |a(tmp)|1 |1 |
|8 | TABLE SCAN |t2 |1 |2 |
==================================================== =========

Hash Join Algorithm: Improve Big Data Connection Performance

Hash Join is a connection (JOIN) algorithm widely implemented in databases to handle connection operations between two or more tables. In the Hash Join algorithm, first build a Hash table based on one of the tables (generally choose a table with a small amount of data), and then perform Hash table detection on each row of the other table to find the matching row and generate a result set. There is usually a performance advantage when the amount of data is large.

The only connection algorithm supported by MySQL before version 8.0 is Nested Loop Join. This connection algorithm has relatively good performance in Web applications and OLTP scenarios, so Hash Join does not have a high priority. Nested Loop Join is especially suitable for the case where the index can be used effectively. The method is to use one of the tables as the outer table and the other table as the inner table, and traverse all the inner rows for each row in the outer table (if there is a suitable index, you can use the index ) to find the matching rows. However, when the amount of data in the inner table is large and there is no suitable index, the performance of Nested Loop Join is poor. As users’ demand for large data connections increases, MySQL supports the Hash Join algorithm from version 8.0.

OceanBase has always supported the Hash Join algorithm. Unlike stand-alone databases, many of the application scenarios faced by OceanBase involve massive data processing. In addition, indexes (global) in distributed scenarios often require cross-machine access, and it is difficult to achieve the desired effect by optimizing Nested Loop Join through index building. However, Hash Join usually performs better in distributed and large data volume scenarios, so we have supported it from the early stage of research and development. At the same time, we also support Nested Loop Join and Merge Join. These two Join algorithms have performance advantages in scenarios where indexes or sequences can be used. OceanBase’s query optimizer will automatically select the most suitable algorithm for users based on the judgment of the cost model.

Index Management: More efficient index management

MySQL 8.0 introduces the function of setting index visibility (visiable/invisiable), which allows users to set the index to be invisible without deleting the index. In the invisible state, the query optimizer ignores the index and does not use it for query planning, but its index data will still be maintained normally. The advantage of this is that you can test the impact of indexes on query performance while preserving the index structure. If you need to re-enable indexing later, just make it visible.

For example, when we want to delete an index, we can set it to invisible, and wait for the business to run for a period of time to confirm that the index is not used before deleting it. If a business SQL is found to depend on the index, the index can be changed back to visible. Since the index data is still maintained normally, the speed of setting the index visibility is very fast, avoiding the overhead of re-creating the index after discovering business dependencies after deleting the index.

The syntax for setting index visibility in MySQL 8.0 is as follows:

-- set the index to be invisible
ALTER TABLE table_name ALTER INDEX index_name INVISIBLE;

-- Set the index to be visible
ALTER TABLE table_name ALTER INDEX index_name VISIBLE;

OceanBase supports index visibility settings from version 1.x. By setting the index to be invisible, users can test the impact of adjusting the indexing strategy on query performance in the existing database environment without deleting or disabling the index. At the same time, the index itself needs to be maintained and consumes resources, and legacy redundant indexes may also slow down performance. By setting index visibility, users can safely verify the impact of indexes on query performance, thereby avoiding resource waste. In addition, by setting the index visibility, the index can be quickly restored without rebuilding the index, thereby avoiding the risk of performance degradation caused by accidentally deleting a key index. On the whole, index visibility settings can help users manage index policies more flexibly, optimize query performance, and reduce resource waste, thereby improving database management efficiency.

In addition, MySQL 8.0 also adds support for reverse index, which means that the index column can specify descending order, so that some descending sorts can use this index. As shown below, a query of ORDER BY c1 DESC, c2 can take advantage of this index.

CREATE TABLE t1 (c1 INT, c2 INT, INDEX i1 (c1 DESC, c2 ASC));

Because users had less demand for reverse index in the past, OceanBase currently does not support the reverse index function. For the scenario of reverse order sorting, the OceanBase optimizer will use the positive order index (reverse order scan), and supports prefix sorting and accelerated sorting through parallel (PX), and we will soon support this feature in subsequent versions. For the reverse sorting of a single column, OceanBase can use the index by scanning the index in reverse order. For a multi-column sorting scenario that mixes forward and reverse order, OceanBase will use the index as much as possible, and only sort the columns that cannot use the index when sorting. If the index is index1 (c1, c2, c3), and the sorting is ORDER BY c1 desc, c2 asc, c3 asc, OceanBase will select index1 Scan in reverse order, and then use prefix sorting to sort only c2, c3. In addition, if the sorting performance still does not meet the requirements after indexing, you can try to speed up sorting through parallel execution (PX).

Resource Management: Improve Resource Utilization

MySQL 8.0 began to support the resource group function, which can specify the CPU resources used by the resource group and task scheduling priority, so as to manage and control the execution of database queries. At the same time, you can specify a resource group for a thread through the SET RESOURCE GROUP syntax, or specify a resource group for a statement through /* + RESOURCE_GROUP(xxx) */ hint. Through the above methods, the DBA can control the resource (CPU) usage of different tasks and flexibly adjust resource allocation, thereby improving the overall utilization of resources.

OceanBase supports richer resource management strategies. Starting from version 3.x, OceanBase supports Oracle’s resource management (DBMS_RESOURCE_MANAGER package) in Oracle mode. Unlike MySQL’s resource management, which focuses on CPU resource allocation, OceanBase provides rich and flexible resource management strategies. A variety of different resource plans can be set, and resource allocation can be dynamically adjusted according to the consumption situation, with more complete functions and more customization possibilities.

OceanBase is also in version 3.2 and 4.1, opening the resource management capability to OceanBase MySQL mode. Users can create resource groups to specify management plans (CPU, IO resources) through the CREATE_CONSUMER_GROUP, CREATE_PLAN, CREATE_PLAN_DIRECTIVE process in the DBMS_RESOURCE_MANAGER package, and then use the SET_CONSUMER_GROUP_MAPPING interface by user Or data column access rules specify resource groups for SQL.

Currently, the similarities and differences between OceanBase resource management and MySQL 8.0 resource management are as follows:

CPU Management IO Management Memory management Specified by session Specified by user Specify by statement
MySQL 8.0 ? ? ? ? ? ? Specify by hint
OceanBase ? ? ? ? ? ? Specified by column access rules

Written at the end

In addition to the features introduced in this article, MySQL 8.0 also brings functional features such as JSON type enhancement, support for INTERSECT, EXPECT, etc., and also solves long-standing deficiencies such as Atomic DDL, self-increment persistence, and the default character set is changed from latin1 Change to utf8mb4, group by columns no longer support behavior changes such as asc/desc. When users consider migrating their business from lower versions to MySQL 8.0, they need to fully evaluate the impact of these changes. In my opinion, MySQL 8.0 is a better MySQL, and OceanBase is not just MySQL. We will continue to optimize the compatibility with MySQL 8.0, and welcome everyone to experience OceanBase 4.1 and share your experience.

Welcome to visit OceanBase official website for more information: https://www.oceanbase.com/