Thoughts on MySQL Optimization, Part 2 [Performance Analysis Tools, Optimization Principles]

In day-to-day work we inevitably need to analyze and optimize SQL statements. Today we will look at the relevant topics:

  • SQL performance analysis

  • SQL optimization principles

1 SQL performance analysis

Performance analysis of SQL mainly includes:

  • View slow SQL

  • View details through profiles

  • explain execution plan

1.1 View slow SQL

SQL execution frequency

-- View server statement-execution counters; the seven underscores match
-- Com_insert, Com_update, Com_delete, Com_select, etc.
show global status like 'Com_______';

Slow query log

The slow query log records every SQL statement whose execution time exceeds long_query_time (unit: seconds, default 10). It is not enabled by default and must be turned on in the configuration file.

-- Check whether the slow query log is enabled
show variables like 'slow_query_log';

To enable the slow query log, add the following to the MySQL configuration file:

# Turn on the slow SQL switch
slow_query_log=1

# Set the slow-log threshold to 2 seconds; any SQL statement running longer is recorded
long_query_time=2
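
If editing the configuration file is inconvenient, the same switches can also be flipped at runtime (they revert on restart); a quick sketch:

-- Enable the slow log and set the threshold on the running server
set global slow_query_log = 1;
set global long_query_time = 2;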

By default the slow query log is written to /var/lib/mysql/localhost-slow.log; for MySQL started via Docker, the path differs from the above.


Run a deliberately slow statement to generate a log entry:

select sleep(5);

Check the slow SQL log:

[root@VM-24-10-centos data]# cat a5cbd50ec8f4-slow.log
/usr/sbin/mysqld, Version: 8.0.27 (MySQL Community Server - GPL). started with:
Tcp port: 3306 Unix socket: /var/run/mysqld/mysqld.sock
Time Id Command Argument
# Time: 2023-07-27T09:03:10.516956Z
# User@Host: root[root] @ [114.242.22.4] Id: 9
# Query_time: 5.000289 Lock_time: 0.000000 Rows_sent: 1 Rows_examined: 1
use test;
SET timestamp=1690448585;
/* ApplicationName=DataGrip 2018.1.3 */ select sleep(5);

1.2 View details through profiles

Profiles let you inspect recently executed SQL statements:

-- Check whether profiling is supported
select @@have_profiling;

-- Check whether the profile is enabled
select @@profiling;


-- Settings enabled
set profiling = 1;

-- Query
select *
from test.user;

-- Query with a filter
select *
from test.user where name = 'Zhang San';

select count(*) from user;

-- profiling_history_size controls how many statements are kept; its range is [0, 100]
set profiling_history_size = 100;

-- View all currently executed SQL
show profiles;

-- View execution details for a Query_ID listed by show profiles
show profile for query 455;
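
show profile also accepts type modifiers to break the cost down further; for example:

-- Break down CPU and block I/O usage for the same query
show profile cpu, block io for query 455;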

1.3 View the execution plan with explain

The explain (or desc) command shows how MySQL executes a select statement, including how tables are joined and in what order.

explain select * from test.user where name ='Zhang San';


The meaning of each field:

  • id, the execution order of each clause of the query

    • The sequence number of the select; it indicates the order in which select clauses (or their tables) are executed (same id: executed top to bottom; different ids: the larger the id, the earlier it executes).

  • select_type, the query type; generally not a focus

    • simple (a simple query, with no joins or subqueries), primary (the outer main query), union (the second or later statement in a union), subquery (a subquery in the select list or where clause)

  • type, connection type, requires focus

    • Indicates the connection type. The connection types from good to poor performance are null, system, const, eq_ref, ref, range, index, and all.

  • possible_keys

    • The indexes that might be used on this table, one or more

  • key, the index actually used; requires focus

    • If null, no index was used

  • key_len, the number of index bytes used; requires focus

    • This is the maximum possible length of the index fields, not the actual length used. Without loss of precision, the shorter, the better.

  • rows

    • The number of rows MySQL estimates it must examine to execute the query (an estimate for InnoDB). The smaller, the better.

  • filtered, requires focus

    • The number of rows returned as a percentage of the rows read. The larger, the better.

  • Extra, additional instructions for the execution plan, containing important information

Common Extra values and their meanings:

  • const row not found: the table being queried is empty.

  • Distinct: MySQL is looking for distinct values, so once it finds the first matching row it stops searching the current group and moves on to the next value.

  • Impossible WHERE: the where condition is always false; no record in the table can satisfy it.

  • Impossible WHERE noticed after reading const tables: after evaluating the const tables, the optimizer found that the where condition can never be satisfied.

  • no matching row in const table: the joined table is a const table and contains no matching row.

  • Not exists: the optimizer determined that the inner table's records cannot satisfy the where condition.

  • Select tables optimized away: with no group by clause, for MyISAM's select count(*), or for min()/max() answerable from an index, the optimizer determined that only one row needs to be returned.

  • Using filesort: a filesort is used to satisfy order by.

  • Using index: a covering index is used.

  • Using index for group-by: the group by (or distinct) columns are resolved entirely from the index, with no table lookups needed for grouping, sorting, or deduplication.

  • Using join buffer: after the nested-loop step, rows from the earlier tables are placed in the join buffer and then joined with this table; applies when this table's access type is range, index, or all.

  • Using sort_union, Using union, Using intersect: the three variants of index_merge.

  • Using temporary: a temporary table holds the intermediate result set; typical for group by, distinct, or order by on columns from different tables.

  • Using where: after rows are fetched at the storage engine layer, the server filters them with the where condition before returning them to the client.
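
As a quick illustration of two of these values, here is a sketch assuming a hypothetical index idx_user_name on user(name):

-- Hypothetical index, for illustration only
create index idx_user_name on test.user(name);

-- Reads only columns contained in the index (name plus the primary key id),
-- so Extra is expected to show "Using index" (a covering index)
explain select id, name from test.user where name = 'Zhang San';

-- Selecting all columns forces lookups back to the table,
-- so the covering-index optimization no longer applies
explain select * from test.user where name = 'Zhang San';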

2 SQL optimization principles

2.1 Insert data

2.1.1 insert optimization

  • Insert in batches to reduce round trips; keep each batch to no more than about 1,000 rows

  • Commit transactions manually: open a transaction before the inserts and commit after the data is in

  • Insert in primary key order

insert into user values(1,'Xiaomi'),(2,'child'),(3,'Xiaohong');

-- Start transaction
-- You can also use begin
start transaction;

-- insert operation
insert into user values(1,'Xiaomi'),(2,'child'),(3,'Xiaohong');
insert into user values(4,'Xiaomi1'),(5,'Kid1'),(6,'Xiaohong1');

-- commit
commit;

2.1.2 Insert data in bulk

If you need to insert a large amount of data at once, use the load data command.


Create a table to import the data into:

drop table if exists user;
create table user (
  id int primary key auto_increment,
  name varchar(20) not null,
  age int not null,
  city varchar(20) default null
);

Use Java to generate the data file to be imported:

package com.wmding;

import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Random;

/**
 * @author 明月
 * @version 1.0
 * @date 7/29/23 9:50 AM
 * @description: Generates a comma-separated data file for load data
 */
public class WriteFile {
    public static void main(String[] args) throws IOException {

        ArrayList<String> cityList = new ArrayList<>();
        cityList.add("Beijing");
        cityList.add("Shanghai");
        cityList.add("Guangzhou");
        cityList.add("Zhengzhou");

        File file = new File("/Users/wmding/WorkSpaces/My/ThreadDemo/src/main/java/com/wmding/data.log");

        // Start from an empty file
        if (!file.exists()) {
            file.createNewFile();
        } else {
            file.delete();
            file.createNewFile();
        }

        FileWriter fileWriter = new FileWriter(file);

        int num = 1000000;
        // Each line has the format: id,name,age,city
        for (int i = 1; i <= num; i++) {
            StringBuilder sb = new StringBuilder();
            sb.append(i);
            sb.append(',');

            String randomStr1 = createRandomStr1(6);
            sb.append(randomStr1);
            sb.append(',');

            int age = createRandom();
            sb.append(age);
            sb.append(',');

            // Cycle through the city list
            int index = i % cityList.size();
            sb.append(cityList.get(index));

            fileWriter.write(sb.toString());

            // Terminate each line with a newline
            fileWriter.write("\n");
        }
        fileWriter.close();
    }

    /**
     * Each position of the generated string is a random letter or digit from str
     */
    private static String createRandomStr1(int length) {
        String str = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
        Random random = new Random();
        StringBuilder stringBuilder = new StringBuilder();
        for (int i = 0; i < length; i++) {
            int number = random.nextInt(62);
            stringBuilder.append(str.charAt(number));
        }
        return stringBuilder.toString();
    }

    /**
     * Generates a random age in [2, 99]
     */
    private static int createRandom() {
        Random random = new Random();
        return random.nextInt(98) + 2;
    }
}

Upload the data file to the server.

Copy the data file to the /test directory inside the MySQL container:
docker cp data.log a5cbd50ec8f4:/test/data.log

Enter the MySQL container:
docker exec -it a5cbd50ec8f4 bash

Import the data; loading 1 million rows takes 5.64 seconds:

-- Connect using the --local-infile client option
mysql -uroot -p --local-infile

-- Enable local_infile on the server
mysql> set global local_infile = 1;

-- Import the data
mysql> load data local infile '/test/data.log' into table user fields terminated by ',' lines terminated by '\n';
Query OK, 1000000 rows affected (5.64 sec)
Records: 1000000 Deleted: 0 Skipped: 0 Warnings: 0

2.2 Primary key optimization

2.2.1 How data is organized

In the InnoDB storage engine, table data is organized and stored in primary key order. Tables stored this way are called index-organized tables (IOT).

  • Non-leaf nodes store only index keys

  • Leaf nodes store index keys and the row data

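A page is InnoDB's basic storage unit, 16K by default; you can confirm the size with:

show variables like 'innodb_page_size';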

2.2.2 Page split

Different insertion orders lead to very different efficiency:

  • When primary keys are inserted sequentially, rows are simply appended to pages in order.

  • When primary keys are inserted out of order, each insert must first locate the right position; if the target page is already full, it has to be split to make room (a page split).

2.2.3 Page merging

When a row is deleted, the record is not physically removed; it is only marked for deletion, and its space then becomes reusable.

When the deleted records in the page reach merge_threshold (default is 50% of the page), InnoDB will merge the page to optimize space usage.
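
The threshold can be tuned per table (MySQL 5.7 and later) through a marker in the table comment; a sketch:

-- Lower this table's merge threshold to 40% of the page
alter table user comment = 'MERGE_THRESHOLD=40';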

2.2.4 Design principles for primary keys

  • When meeting business needs, try to reduce the length of the primary key and reduce disk IO during query

  • When inserting data, try to insert sequentially and use auto_increment to increase the primary key to avoid page splits.

  • Avoid using a UUID or a natural key (such as an ID-card number) as the primary key; both are long and insert out of order.

  • Avoid modifying primary keys during business operations; changing a primary key forces the index structure to be reorganized.
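
A minimal sketch of these principles, contrasting a compact auto-increment key with a random UUID key (table and column names are illustrative):

-- Preferred: short, sequentially inserted primary key
create table order_good (
  id bigint primary key auto_increment,
  order_no varchar(32) not null
);

-- Discouraged: a 36-byte random key causes page splits and bloats every secondary index
create table order_bad (
  id char(36) primary key, -- e.g. filled with uuid()
  order_no varchar(32) not null
);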

2.3 order by optimization

2.3.1 Sorting method

order by uses one of two sort methods, and Using index outperforms Using filesort:

  • Using filesort: rows matching the condition are read via an index or a full table scan, then sorted in the sort buffer. Any sort whose result is not returned directly from an index is a filesort.

  • Using index: ordered data is returned directly by scanning an ordered index sequentially. No extra sorting is needed, so it is efficient.

#Without an index, sort by age and phone
explain select id,age,phone from tb_user order by age, phone;

#Create index

create index idx_user_age_phone_aa on tb_user(age, phone);

#After creating the index, sort in ascending order according to age and phone
explain select id,age,phone from tb_user order by age, phone;

#After creating the index, sort in descending order according to age and phone
explain select id,age,phone from tb_user order by age desc, phone desc;

-- A composite index cannot serve a sort that mixes directions (one ascending, one descending)
#Sort by age ascending, phone descending
explain select id,age,phone from tb_user order by age asc,phone desc;

-- The fix (MySQL 8.0+): re-create the index with matching directions, one ascending and one descending
#Create index
create index idx_user_age_phone_ad on tb_user(age asc,phone desc);

#Sort by age ascending, phone descending
explain select id,age,phone from tb_user order by age asc, phone desc;

2.3.2 Sort optimization

  • Establish an appropriate index based on the sorting field. When sorting on multiple fields, the leftmost prefix rule must also be followed.

  • Try to use covering indexes

  • For multi-field sorts that mix ascending and descending, create the composite index with matching directions (asc/desc).

  • If filesort is unavoidable, the sort buffer can be enlarged for large sorts; once the data exceeds the buffer (256K by default), sorting spills to disk files.

-- Check the sort buffer size, the default is 256K
show variables like 'sort_buffer_size';
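
For a quick test, the buffer can be enlarged for the current session only (value in bytes); a sketch:

-- Raise the sort buffer to 1MB for this session
set session sort_buffer_size = 1048576;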

2.4 group by optimization

create index index_age_city on user(age,city);

-- Extra: Using index
explain select age,count(*) from test.user group by age;

-- Extra: Using index; Using temporary
explain select city,count(*) from test.user group by city;

Optimization principles:

  • When performing grouping operations, indexing can be used to improve efficiency.

  • When performing grouping operations, indexes help only if the leftmost prefix rule is satisfied, as the sketch below shows.
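
With the index_age_city(age, city) index created above, fixing age in the where clause satisfies the leftmost prefix, so a group by on city can still use the index and Using temporary is expected to disappear:

-- age is fixed by the where clause, so grouping by city can use index_age_city
explain select city, count(*) from test.user where age = 30 group by city;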

2.5 limit optimization

-- With a large user table, limit from offset 0 is far faster than limit from offset 990900
select * from test.user limit 0, 10;

select * from test.user limit 990900, 10;

-- A subquery over a covering index can optimize the deep offset
select s.* from test.user as s, (select id from test.user order by id limit 990900,10) as a where s.id = a.id;

When performing general paging queries, the optimization ideas are as follows:

  • Improve execution efficiency by creating covering indexes

  • Optimize using covering index + subquery

2.6 count optimization

count is an aggregate function: for each row of the result set it checks whether the argument is null; if not, the counter is incremented by 1, and the accumulated value is returned at the end.

Usage is:

  • count(*)

  • count(primary key)

  • count(field)

  • count(1)

In rough order of efficiency: count(field) < count(primary key) < count(1) ≈ count(*), so prefer count(*).
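
A quick way to see the null-handling difference (city is nullable in the table created earlier, so count(city) may be smaller than the others):

select count(*), count(id), count(1), count(city) from test.user;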

Use approximate counting: If you are not strict about the exact number of rows, you can use the approximate counting method. MySQL provides a command called SHOW TABLE STATUS, which can return metadata information about the table, including an approximate number of data rows.

show table status like 'user';
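
The same kind of estimate is also available from information_schema:

-- table_rows is an approximate row count maintained by InnoDB
select table_rows from information_schema.tables where table_schema = 'test' and table_name = 'user';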

2.7 update optimization

InnoDB's row locks are placed on index entries, not on the records themselves, so the where clause of an update should hit an index and that index must remain usable; otherwise the row lock escalates to a table lock.
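
A sketch of the difference, using the user table from earlier (id is the primary key; name has no index):

-- Hits the primary key index: only the row with id = 1 is locked
update test.user set city = 'Beijing' where id = 1;

-- name is not indexed, so InnoDB cannot lock by index entry;
-- this effectively locks the whole table until the transaction ends
update test.user set city = 'Beijing' where name = 'Zhang San';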

Follow me and let’s learn together.