Here comes the problem: after the database is divided into tables, the query is too slow. How to optimize it?

Said before:

In the Nien community, many friends have reported that Sharding-JDBC paging queries are extremely slow and asked how to deal with it.

A great many friends have raised this issue.

It is also a core interview question. Some time ago, a few friends reported encountering it in interviews.

After the database is divided into tables, the paging query is too slow. How to optimize it?

Nien Tip: The knowledge of sub-database and sub-table is not only the core knowledge for interviews, but also the core knowledge for development.

Therefore, Nien will give you a systematic review here, so that you can fully show off your strong “technical muscles” and make the interviewer “drool with admiration”.

This question and its reference answer are also included in the V120 edition of our “Nien Java Interview Guide PDF”, for later readers to reference when improving their “three-high” (high concurrency, high availability, high performance) architecture, design, and development skills.

If you need the PDFs of “Nien Architecture Notes”, “Nien High Concurrency Trilogy” and “Nien Java Interview Guide”, please follow this public account [Technical Freedom Circle] to obtain them

Article directory

    • Said before:
    • Three ways to optimize queries that are too slow:
    • First, analyze the root cause of the slowness: Sharding-JDBC rewrites paging SQL
      • Conflict between function and performance: the bottleneck of starting from offset 0
      • Sharding-JDBC's optimizations
    • Disadvantages of streaming queries
    • Optimization plan 1: Combine ES search for paging
    • Optimization plan 2: Prohibit page-jump queries
    • Optimization plan 3: The secondary query method
    • Say it at the end
    • Recommended reading

Three ways to optimize queries that are too slow:

After the database is divided into tables, the query is too slow. How to optimize it? There are roughly three methods:

  • Method 1: Combine ES search for paging
  • Method 2: Prohibit page jumps
  • Method 3: Secondary query

The first method requires introducing ES, which will not be expanded on here. Those who are interested can come to Nien’s “Technical Freedom Circle” community to discuss it.

Here, let’s sort out the second method (prohibiting page jumps) and the third method (the secondary query).

First, analyze the root cause of the slowness: Sharding-JDBC rewrites paging SQL

Sharding-JDBC obtains paging data from multiple databases, which is a different scenario from paging over a single database.

Assume each page holds 10 rows and we want the data on page 2.

In a sharded environment, it is incorrect to simply execute LIMIT 10, 10 on every shard and then take the first 10 rows of the merged result according to the sort condition.

For example, if the SQL is:

SELECT score FROM t_score ORDER BY score DESC LIMIT 1, 2;

The following figure shows the paging execution results without SQL rewriting:

As shown in the figure, if we want the global 2nd and 3rd rows across the two tables, sorted by score, they should theoretically be 95 and 90.

What happens in reality?

Since the executed SQL can only obtain the 2nd and 3rd rows from each individual table, what comes back from the t_score_0 table is 90 and 80;

and what comes back from the t_score_1 table is 85 and 75.

Therefore, the merge can only work with the returned 90, 80, 85 and 75. No matter how the merge is implemented, it is impossible to produce the correct result from these rows.

The correct approach is to rewrite the paging condition to LIMIT 0, 3 on each shard, take out all rows up to the end of the requested page, and then calculate the correct rows after merging according to the sort condition. The figure below shows the paging execution results after SQL rewriting.
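As a sketch of the rewrite (the physical table names follow the t_score example above; this is illustrative, not Sharding-JDBC’s literal output):

-- logical SQL issued by the application
SELECT score FROM t_score ORDER BY score DESC LIMIT 1, 2;

-- SQL actually sent to each shard after rewriting
SELECT score FROM t_score_0 ORDER BY score DESC LIMIT 0, 3;
SELECT score FROM t_score_1 ORDER BY score DESC LIMIT 0, 3;

The merger then sorts the 3 + 3 returned rows and discards the first one, leaving 95 and 90.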

Conflict between function and performance: the bottleneck of starting from offset 0

Note, there is a big problem here:

To avoid incorrect results, every per-shard query before merging must start from offset 0; only then can the merged result be correct.

However, paging with an excessively large offset makes data retrieval from the database very slow.

Take MySQL as an example:
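The original SQL is not shown in this text; a query of the shape being described would look something like this (the table name t_order is purely illustrative):

SELECT * FROM t_order ORDER BY id LIMIT 1000000, 10;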

Even without sharding, this SQL forces MySQL to scan past 1,000,000 rows and then return 10; the index cannot be used to skip over the offset, so its performance can be imagined.

However, with sharding (assume the data is split across 2 databases), in order to guarantee correct results the SQL will be rewritten as:
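(The concrete statement is not reproduced here; under the rewriting rule described above, each shard would receive a query of roughly this shape, with the same illustrative table name as before:)

SELECT * FROM t_order ORDER BY id LIMIT 0, 1000010;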

That is, every record before the offset is fetched as well, and only the last 10 records are kept after sorting.

When the database itself already executes such a query very slowly, this further exacerbates the performance bottleneck.

The original SQL only needs to return 10 records to the client, whereas the rewritten SQL transmits 1,000,010 * 2 records.

Sharding-JDBC's optimizations

(1)Use streaming processing + merge sorting to avoid excessive memory usage.

Although SQL rewriting inevitably consumes extra bandwidth, it does not cause memory to explode.

Intuitively, most people assume that Sharding-JDBC loads all 1,000,010 * 2 records into memory, occupying a huge amount of memory and causing an overflow, but that is not what happens.

Since the records in each shard's result set are already ordered, Sharding-JDBC only fetches each shard's current record at any given moment; the only rows resident in memory are the current rows of the result sets of the shards the query was routed to, and a cursor simply moves forward over each of them. For inputs that are already sorted, merge sorting has a time complexity of only O(n), so the performance loss is very small.
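To make the merge concrete, here is a minimal, self-contained sketch of a k-way merge over already-sorted shard results, reusing the numbers from the t_score example above (it illustrates the idea only and is not Sharding-JDBC's actual code):

import java.util.*;

public class MergeSketch {
    public static void main(String[] args) {
        // per-shard result sets, each already ORDER BY score DESC (after the LIMIT 0, 3 rewrite)
        List<List<Integer>> shards = List.of(
                List.of(100, 90, 80),   // t_score_0
                List.of(95, 85, 75));   // t_score_1
        // max-heap of "cursors": [current value, shard index, position within that shard]
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> b[0] - a[0]);
        for (int i = 0; i < shards.size(); i++) {
            heap.offer(new int[]{shards.get(i).get(0), i, 0});
        }
        int offset = 1, limit = 2, emitted = 0, seen = 0;
        while (!heap.isEmpty() && emitted < limit) {
            int[] top = heap.poll();                  // globally next value in sort order
            if (seen++ >= offset) {                   // skip the rows before the requested offset
                System.out.println(top[0]);           // prints 95, then 90
                emitted++;
            }
            int shard = top[1], next = top[2] + 1;
            if (next < shards.get(shard).size()) {    // advance only that shard's cursor
                heap.offer(new int[]{shards.get(shard).get(next), shard, next});
            }
        }
    }
}

At any moment only one row per shard lives in the heap, which is why memory stays small even though the rewritten SQL fetches many rows in total.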

(2) Sharding-JDBC further optimizes queries that fall into only a single shard.

Requests routed to a single shard do not need SQL rewriting to guarantee correct results, so in this case Sharding-JDBC does not rewrite the SQL at all, which saves bandwidth.

Generally speaking, the slowness occurs in the first case, i.e. multi-shard queries handled with streaming plus merge sorting. Streaming queries look great, but they also have major drawbacks.

Disadvantages of streaming queries

The disadvantages of using cursor query are obvious.

Note for streaming (cursor) queries: the current query exclusively occupies the connection. All rows in the result set must be read, or the result set closed, before any other query can be issued on that connection; otherwise an exception will be thrown!

After executing a streaming query, the database access framework is not responsible for closing the database connection. The application needs to close it itself after fetching the data.
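As a rough JDBC sketch of such a streaming/cursor read (assuming a MySQL Connector/J connection URL that includes useCursorFetch=true and an already-configured dataSource; names are placeholders):

// connection URL assumed: jdbc:mysql://host:3306/db?useCursorFetch=true
try (Connection conn = dataSource.getConnection();
     PreparedStatement ps = conn.prepareStatement(
             "select * from student_time ORDER BY create_time ASC")) {
    ps.setFetchSize(1000);                      // server-side cursor: 1000 rows per fetch
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            // consume each row here; the connection is occupied until the
            // result set has been fully read or closed
        }
    }                                           // closing the ResultSet releases the cursor
}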

Since MySQL_Server does not know when the client will finish consuming the data, and the underlying table may receive DML writes in the meantime, MySQL_Server has to create a temporary space to stage the data that is about to be taken away.

Therefore, when you enable useCursorFetch to read a large table, you will see several phenomena on MySQL:

  1. IOPS soars, because the data to be returned must first be written into the temporary space, producing a large amount of IO reading and writing. This process may cause write jitter for other business traffic.
  2. Disk usage soars; the data written into the temporary space is reclaimed by MySQL_Server once the read completes or the client calls ResultSet#close.
  3. When the JDBC client issues the SQL query, there may be a long wait while MySQL_Server prepares the data. Note that this waiting differs in nature from an ordinary query's: an ordinary query waits while data is read from the network buffer and the business layer gets no response, whereas a cursor query waits while MySQL prepares the temporary data space and JDBC gets no response.
  4. After the data preparation completes and transmission begins, the network traffic surges and the IOPS shifts from “write” to “read”.

This is why many friends in the Nien community report that paging queries become slow after table splitting: basically, the deeper the page (the larger the page number), the more resources each query consumes.

How to solve it?

Optimization plan 1: Combine ES search for paging

This method requires introducing ES, which will not be expanded on here. Those who are interested can come to Nien’s “Technical Freedom Circle” community to discuss it.

Optimization plan 2: Prohibit page-jump queries

Since LIMIT with a large offset cannot use the index to skip rows, if ID continuity can be guaranteed, paging by ID is a better solution:
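For example, with continuous IDs, page 101 (offset 1000, 10 rows per page) can be fetched directly by ID range (a sketch; t_order is an illustrative table name):

SELECT * FROM t_order WHERE id BETWEEN 1001 AND 1010 ORDER BY id ASC;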

Alternatively, record the ID of the last row of the previous page and use it to query the next page:
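A sketch, where $last_id stands for the largest ID returned by the previous page (illustrative table name again):

SELECT * FROM t_order WHERE id > $last_id ORDER BY id ASC LIMIT 10;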

If the sort column is not the ID column, then assuming the sort column is col, the prohibit-page-jump approach takes roughly two steps:

(1) Query the first page in the normal way and record max_col, the maximum value of col on that page;

(2) Each time the page is turned, rewrite

order by col offset X limit Y;

as

where col > $max_col order by col limit Y;

Because only one page of data is returned each time, the performance stays constant.
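Using the student_time example later in this article, turning to the next page would look roughly like this, where $max_create_time is the largest create_time on the current page (an illustrative sketch):

select * from student_time where create_time > $max_create_time ORDER BY create_time ASC limit 5;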

Optimization plan 3: The secondary query method

Assuming the sort column is col and there are N shards, the secondary query method takes roughly the following steps:

(1) Rewrite the SQL, changing

order by col offset X limit Y;

into

order by col offset X/N limit Y;

and send it to every shard;

(2) From the N pages returned, find the minimum value col_min;

(3) Issue a second, between-style query against every shard:

where col between col_min and col_i_max order by col;

where col_i_max is the maximum value shard i returned in the first pass;

(4) Treating col_min as a virtual record, find col_min's offset within each shard, and add them up to get col_min's global offset;

(5) Once col_min's global offset is known, the records for the global offset X limit Y follow naturally.

Example: sub-table structure

CREATE TABLE `student_time_0` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `user_id` int(11) NOT NULL,
  `name` varchar(200) COLLATE utf8_bin DEFAULT NULL,
  `age` tinyint(3) unsigned DEFAULT NULL,
  `create_time` bigint(20) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=674 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

There are three tables like this, student_time_0, student_time_1, student_time_2, with user_id as the table partitioning key and modulo the number of tables as the table partitioning basis.
Let’s first construct some test data, using the following insert statement:

insert into student_time (`name`, `user_id`, `age`, `create_time`) values (?, ?, ?, ?)

The main point is to make each create_time value unique, so the problem can be explained more clearly.

int i = 0;
try (
    Connection conn = dataSource.getConnection();
    PreparedStatement ps = conn.prepareStatement(insertSql)) {
    do {
        ps.setString(1, localName + new Random().nextInt(100));
        ps.setLong(2, 10086L + new Random().nextInt(100));
        ps.setInt(3, 18);
        ps.setLong(4, new Date().getTime());

        int result = ps.executeUpdate();
        LOGGER.info("current execute result: {}", result);
        // sleep a random amount so that create_time values stay distinct
        Thread.sleep(new Random().nextInt(100));
        i++;
    } while (i <= 2000);
} catch (SQLException | InterruptedException e) {
    LOGGER.error("insert failed", e);
}

The data in the three tables are 673, 678, and 650 respectively. The data in each table is different.

Next, make a paging query like this

select * from student_time ORDER BY create_time ASC limit 1000, 5;

student_time is, of course, a logical table, since we are using sharding-jdbc; sharding-jdbc will rewrite the SQL as

select * from student_time ORDER BY create_time ASC limit 0, 1005;

Even though sharding-jdbc does a good job of optimizing the merge with merge sorting, it still has to transmit this much data, so the query remains time-consuming. Is there any solution?

  • The first is to prohibit page jumps and only allow moving to the next page. We record the maximum create_time of the previous page, and the next page is queried starting from that value.
  • The second is the secondary query method.

The first step of the secondary query method looks like the naive, inaccurate approach described earlier: spread the paging offset evenly across the shards, 1000 / 3 ≈ 333, and query each of the three tables with limit 333, 5. The per-table results (columns: id, user_id, name, age, create_time) are shown below.
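(An illustrative form of the three first-pass queries:)

select * from student_time_0 ORDER BY create_time ASC limit 333, 5;
select * from student_time_1 ORDER BY create_time ASC limit 333, 5;
select * from student_time_2 ORDER BY create_time ASC limit 333, 5;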

t0
334 10158 nick95 18 1641548941767
335 10098 nick11 18 1641548941879
336 10167 nick51 18 1641548942089
337 10167 nick3 18 1641548942119
338 10170 nick57 18 1641548942169


t1
334 10105 nick98 18 1641548939071 min
335 10174 nick94 18 1641548939377
336 10129 nick85 18 1641548939442
337 10141 nick84 18 1641548939480
338 10096 nick74 18 1641548939668

t2
334 10184 nick11 18 1641548945075
335 10109 nick93 18 1641548945382
336 10181 nick41 18 1641548945583
337 10130 nick80 18 1641548945993
338 10184 nick19 18 1641548946294 max

What is the goal of the first pass? To find the smallest create_time, as well as each table's largest create_time, and then query the three tables again.

In fact, the minimum value is what really matters: once we query with the minimum value, we can find out where that minimum value sits within each table.

So the condition for the second pass is:

where col between col_min and col_i_max order by col;

Here col is create_time, col_min is the global minimum found in the first pass, and col_i_max is table i's own maximum from the first pass.
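Concretely, the second-pass queries would look roughly like this, each bounded by the global minimum create_time and that table’s own first-pass maximum (an illustrative sketch based on the data above):

select * from student_time_0 where create_time between 1641548939071 and 1641548942169 ORDER BY create_time ASC;
select * from student_time_1 where create_time between 1641548939071 and 1641548939668 ORDER BY create_time ASC;
select * from student_time_2 where create_time between 1641548939071 and 1641548946294 ORDER BY create_time ASC;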

t0
322 10161 nick81 18 1641548939284
323 10113 nick16 18 1641548939393
324 10110 nick56 18 1641548939577
325 10116 nick69 18 1641548939588
326 10173 nick51 18 1641548939646

t1
334 10105 nick98 18 1641548939071
335 10174 nick94 18 1641548939377
336 10129 nick85 18 1641548939442
337 10141 nick84 18 1641548939480
338 10096 nick74 18 1641548939668

t2
297 10136 nick28 18 1641548939161
298 10142 nick68 18 1641548939177
299 10124 nick41 18 1641548939237
300 10148 nick87 18 1641548939510
301 10169 nick23 18 1641548939715

Only the first five rows of each result are shown. To make the offsets easy to read, each table uses its own auto-increment primary key.

We can see that the minimum value from the first pass sits at offset 322 - 1 in t0 and 297 - 1 in t2, while in t1 (where it came from) it sits at offset 334 - 1.

So, in total, 322 - 1 + 334 - 1 + 297 - 1 = 950 records precede the minimum value globally; in other words, the minimum value is the 951st record overall.

Then each table only needs to be checked for at most 1000 - 951 + 5 = 54 further records after the minimum, and merging and sorting those rows yields the final, correct result.

This is the secondary query method.

As you can see, the secondary query method is quite cumbersome. Prohibiting page jumps, or the ES combination method, is more direct and effective.

Say it at the end

Sub-database and sub-table questions are very common interview questions.

If everyone can answer the above content fluently and thoroughly, the interviewer will basically be shocked and attracted by you.

Before the interview, it is recommended that you systematically review the 5,000-page “Nien Java Interview Guide PDF”.

This book has been continuously updated and added: the latest real questions and the latest difficult problems. The latest version can be obtained from the 40-year-old architect Nien.

And if you have any questions during the question-answering process, you can come and talk to Nien, a 40-year-old architect.

In the end, the interviewer loved it so much that he “can’t help himself and his mouth watered”.

The offer is coming.

Recommended reading

“Ten billions of visits, how to design a cache architecture”

“Multi-level cache architecture design”

“Message Push Architecture Design”

“Alibaba 2nd interview: How many nodes do you deploy? How do you deploy for 1000W concurrency?”

“Meituan 2nd interview: Five-nines high availability, 99.999%. How is it achieved?”

“NetEase interview: Single node 2000W TPS, how does Kafka do it?”

“ByteDance interview: What is the relationship between transaction compensation and transaction retry?”

“NetEase interview: 25W QPS high-throughput writes to MySQL, 100W rows written in 4 seconds, how is it achieved?”

“How to architect billion-level short videos?”

“Blown up: getting through JD.com by ‘bragging’, monthly salary 40K”

“So fierce: getting through SF Express by ‘bragging’, monthly salary 30K”

“It exploded… JD.com asked 40 questions in one round; passing it meant 500,000+”

“What a grilling… Alibaba asked 27 brutal questions; passing meant 600,000+”

“After 3 hours of intense questioning at Baidu, he got a big-company offer. This guy was ruthless!”

“Ele.me is brutal: an advanced Java interview, such hard and brutal questioning”

“After an hour of questioning at ByteDance, the guy got the offer. So fierce!”

“Accepting a Didi offer: from a young man’s three experiences, see what you need to learn”

“Nien Architecture Notes”, “Nien High Concurrency Trilogy”, “Nien Java Interview Guide” PDF, please go to the following public account [Technical Freedom Circle] to get ↓↓↓