SQL optimization for database and cache

Mind Map

1. Common SQL optimization methods

1. Avoid using SELECT *

  • Fetching unneeded columns consumes more CPU
  • Useless fields increase network bandwidth consumption and data transmission time, especially for large fields (such as
    varchar, blob, text)
  • Cannot take advantage of the MySQL optimizer's covering-index optimization (the optimizer's "covering index"
    strategy is extremely fast and efficient, and is a widely recommended query optimization technique in the industry; see the example below)

About covering index links:
Mysql performance optimization: why use a covering index?
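
For illustration, a minimal sketch of a covering index (the index name is hypothetical; `cus_order` is the table used in the pagination examples below): because the index already contains every column the query needs, MySQL can answer the query from the index alone, without reading table rows.

-- Create a composite index that contains every column the query needs
CREATE INDEX idx_score_name ON `cus_order` (`score`, `name`);

-- Covered by the index: EXPLAIN shows "Using index", no table rows are read
SELECT `score`, `name` FROM `cus_order` WHERE `score` > 90;

-- NOT covered: SELECT * requires columns outside the index
SELECT * FROM `cus_order` WHERE `score` > 90;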

2. Page optimization

Pagination is fast when the amount of data is small, but it takes a long time once the table holds millions or even tens of millions of rows.

2.1 Change the pagination into a subquery

-- Deep paging over millions of rows
SELECT `score`,`name` FROM `cus_order` ORDER BY `score` DESC LIMIT 1000000, 10

-- Subquery: first locate the primary key at the offset, then filter by primary key; this is faster (Alibaba specification).
-- Note: unlike the first query, this rewrite pages in primary-key (id) order rather than by score.
SELECT `score`,`name` FROM `cus_order` WHERE id >= (SELECT id FROM `cus_order` LIMIT 1000000,1) LIMIT 10

Remarks: The result of a subquery is materialized into a temporary table, which affects performance; try to avoid using subqueries extensively.

2.2 Deferred join (delayed association)

We first fetch only the matching primary keys, and then join this primary-key result set back to the original table:

SELECT a.`score`, a.`name` FROM `cus_order` a, (SELECT id FROM `cus_order` ORDER BY `score` DESC LIMIT 1000000, 10) b WHERE a.id = b.id

Related detailed links are as follows:

Interviewer: 10 million data, how to quickly query?

[Dewu Technology] MySQL Deep Paging Optimization

3. Try to avoid multi-table join

3.1 Reason

Joins are relatively inefficient because MySQL implements them with nested loops (Nested-Loop Join), and none of the
three variants is very efficient:

  • Simple Nested-Loop Join: joins via a Cartesian product, scanning the inner table row by row for every row of the outer table; the least efficient.
  • Block Nested-Loop Join: optimized with a join buffer, which improves performance; however, if both tables hold a lot of data, the improvement is limited.
  • Index Nested-Loop Join: add indexes on the join columns so the index can be used during the join, converting a
    Block Nested-Loop Join into an Index Nested-Loop Join and improving performance further.
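
As a rough sketch (the `customer` table and the `customer_id` column here are assumptions for illustration), adding an index on the join column is what turns a Block Nested-Loop Join into an Index Nested-Loop Join:

-- Without an index on the join column, MySQL falls back to a block nested-loop join
CREATE INDEX idx_customer_id ON `cus_order` (`customer_id`);

-- With the index in place, each outer row probes the index instead of scanning cus_order
EXPLAIN SELECT c.`name`, o.`score`
FROM `customer` c JOIN `cus_order` o ON o.`customer_id` = c.`id`;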

3.2 Resolution

3.2.1 Single table query + code assembly

You can use Java Stream lambdas + MyBatis-Plus + Lombok to assemble the results in code.

This method is recommended because:

  • Single-table query is more conducive to maintenance after subsequent business changes.
  • High code reusability
  • The larger the amount of data, the greater the efficiency gain
    Join query time grows rapidly with data volume; if the index design is unreasonable, a join can even drag down the database
  • Fewer redundant fields queried
    With a join, data that only needs to be fetched once is fetched repeatedly, increasing network and memory overhead
  • Higher cache utilization
    Single-table results that have not changed can be cached and reused
  • Sharding middleware (used after splitting databases and tables) does not handle joins well

MySQL multi-table association query is more efficient or multiple single-table query is more efficient, why?
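
A minimal sketch of the split (the `customer` table and `customer_id` column are assumptions): instead of one join, run two single-table queries and assemble the results in application code.

-- Join version
SELECT o.`score`, c.`name`
FROM `cus_order` o JOIN `customer` c ON c.`id` = o.`customer_id`
WHERE o.`score` > 90;

-- Single-table version: query the orders first ...
SELECT `id`, `score`, `customer_id` FROM `cus_order` WHERE `score` > 90;

-- ... then query only the customers that appeared (the id list comes from the first result set),
-- and merge the two result sets in code
SELECT `id`, `name` FROM `customer` WHERE `id` IN (1, 2, 3);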

3.2.2 Data redundancy

When the table structure is relatively stable, redundantly store some important fields in the table to avoid join queries as much as possible.
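
A hypothetical sketch: redundantly storing the customer name in `cus_order` so that list queries no longer need to join a customer table. The trade-off is that the redundant column must be kept in sync when the source data changes.

-- Add a redundant copy of the customer name to the order table
ALTER TABLE `cus_order` ADD COLUMN `customer_name` VARCHAR(64) NOT NULL DEFAULT '';

-- List queries can now be answered from a single table
SELECT `score`, `customer_name` FROM `cus_order` ORDER BY `score` DESC LIMIT 10;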

4. It is recommended not to use foreign keys and cascading

  • Not suitable for distributed, high-concurrency scenarios
  • Cascading updates are strongly blocking, with the risk of a database update storm
  • Foreign keys slow down inserts into the database

5. Select the appropriate field type

The smaller the storage byte, the smaller the space occupied and the better the performance.

5.1 Certain strings can be converted to numeric storage

For example, IP addresses can be converted into integer data.

MySQL provides two functions for handling IP addresses:

  • INET_ATON() :
    Before inserting, convert the IP address into an unsigned integer (an IPv4 address fits in a 4-byte INT UNSIGNED)

  • INET_NTOA() :
    After querying, convert the integer back into the dotted IP address
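
A quick illustration (the `access_log` table and `ip_num` column are hypothetical):

-- Store the IP address as an unsigned integer
INSERT INTO `access_log` (`ip_num`) VALUES (INET_ATON('192.168.1.10'));

-- Convert it back to the readable dotted form when querying
SELECT INET_NTOA(`ip_num`) AS ip FROM `access_log`;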

5.2 Non-negative data is preferentially stored in unsigned integer.

Such as auto-increment ID, integer IP, age

Because an unsigned type can represent a value range twice as large as the signed equivalent in the same storage space

5.3 Prefer the TINYINT type for small integer values.

For example, age, or a status flag represented as 0/1

5.4 It is recommended to use Timestamp for the date type.

Because the DATETIME type consumes more space and carries no time zone information.

5.5 Use decimal for the amount field to avoid loss of precision
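
Putting sections 5.1–5.5 together, a hypothetical table definition might look like this (names and sizes are illustrative only):

CREATE TABLE `access_record` (
  `id`         INT UNSIGNED NOT NULL AUTO_INCREMENT,            -- non-negative auto-increment primary key
  `ip_num`     INT UNSIGNED NOT NULL,                           -- IP stored as an integer via INET_ATON()
  `age`        TINYINT UNSIGNED NOT NULL,                       -- small integer value
  `status`     TINYINT UNSIGNED NOT NULL DEFAULT 0,             -- 0/1 flag
  `amount`     DECIMAL(10, 2) NOT NULL DEFAULT 0.00,            -- money, no loss of precision
  `created_at` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,    -- date/time with time zone handling
  PRIMARY KEY (`id`)
);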

5.6 Try to use auto-increment id as the primary key

  • If the primary key is an auto-increment id, each new row is appended at the end of the B+ tree's leaf nodes (which form a doubly linked list), an O(1) operation.
    When a data page is full, a new page is simply allocated and writing continues.

  • If the primary key is not auto-incrementing, then to keep the B+ tree's leaf nodes ordered after an insert, the insertion position must be searched for among the leaf nodes, which takes O(log n).
    If the target page is already full, a page split is required; page splits need pessimistic locks and perform very poorly.

Remarks: For sharded databases and tables (sub-database and sub-table), it is not recommended to use an auto-increment id as the primary key; use a distributed ID (such as a UUID) instead.

Does the database primary key have to be auto-incremented? In what scenarios is auto-increment not recommended?

6. Try to use UNION ALL instead of UNION

Unless the data needs to be deduplicated

  • UNION puts all rows of the two result sets into a temporary table and then deduplicates them, which is more time-consuming and consumes more CPU resources.

  • UNION ALL does not deduplicate the result set, so the returned data may contain duplicate rows.
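
A small illustration (the two branch conditions are arbitrary examples):

-- Deduplicates through a temporary table: slower, more CPU
SELECT `name` FROM `cus_order` WHERE `score` > 90
UNION
SELECT `name` FROM `cus_order` WHERE `score` < 10;

-- No deduplication: faster, but the result may contain duplicate rows
SELECT `name` FROM `cus_order` WHERE `score` > 90
UNION ALL
SELECT `name` FROM `cus_order` WHERE `score` < 10;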

7. Batch operation

Use batch operations for data updates in the database whenever possible, to reduce the number of requests to the database and improve performance.
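
For example, a single multi-row INSERT instead of several single-row INSERTs (values are illustrative):

-- One round trip to the database instead of three
INSERT INTO `cus_order` (`score`, `name`) VALUES
  (90, 'Alice'),
  (85, 'Bob'),
  (70, 'Carol');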

8. Show Profile to analyze SQL execution performance

Displays the resource usage of SQL statements: CPU usage, CPU context switching, IO waiting, memory usage, etc.

-- Check whether the feature is available and enabled
SHOW VARIABLES LIKE '%profiling%';
-- or:
SELECT @@profiling;

-- Enable profiling
SET @@profiling = 1;

-- Set how many statements to keep in the profile history (100 in this example)
SET @@profiling_history_size = 100;

-- List the QUERY_ID of every profiled SQL statement
SHOW PROFILES;

-- Show the resource usage of the statement with the given QUERY_ID (8 in this example)
SHOW PROFILE CPU, IPC, MEMORY FOR QUERY 8;

9. Optimize slow SQL

9.1 Open and set MySQL slow query log

# Turn on the slow query log
SET GLOBAL slow_query_log = 'ON';
# Slow query log file location
SET GLOBAL slow_query_log_file = '/var/lib/mysql/ranking-list-slow.log';
# Also log queries that do not use an index, whether or not they exceed the time threshold
SET GLOBAL log_queries_not_using_indexes = 'ON';
# Slow query threshold (seconds); SQL whose execution time exceeds it is logged
SET SESSION long_query_time = 1;
# Only log slow queries that examine more rows than this value
SET SESSION min_examined_row_limit = 100;

Remarks: Verify the settings above with the following command:

show variables like 'slow%'

9.2 Query MySQL slow query log

cat /var/lib/mysql/ranking-list-slow.log

Note: Make sure you have access to the corresponding directory:

chmod 755 /var/lib/mysql/

9.3 Description of slow query log information (including SQL statements)

Time : The time at which the statement was logged on the server.
User@Host : The user and host that executed the statement.
Query_time : How long the statement took to run.
Lock_time : How long the statement spent waiting for locks.
Rows_sent : The number of rows returned by the slow query.
Rows_examined : The number of rows scanned by the slow query.

9.4 EXPLAIN command analyzes the corresponding SELECT statement

Analysis information:

select_type : The type of query. Common values are SIMPLE (a simple query with no UNION or
subquery), PRIMARY (the outermost query), UNION (a SELECT inside a UNION), SUBQUERY (a
subquery), etc.
table : The table or derived table involved in the query.
type : The access type, an important indicator of query efficiency. From worst to best: ALL
< index < range ~ index_merge < ref < eq_ref < const < system.
rows : The estimated number of rows MySQL must scan to find the result set. In principle, the fewer rows, the better.
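
Usage is simply a matter of prefixing the SELECT with EXPLAIN, for example on the pagination query from section 2:

EXPLAIN SELECT `score`, `name` FROM `cus_order` ORDER BY `score` DESC LIMIT 1000000, 10;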

See my other article about indexing:
MySQL index for database and cache

And the following:
MySQL Performance Optimization Artifact Explain Usage Analysis

10. Correct use of indexes

The correct use of indexes can greatly speed up data retrieval (by significantly reducing the amount of data that has to be scanned)

10.1 Select the appropriate field to create an index

  1. Fields that are not NULL

The data in an indexed field should avoid NULL as far as possible, because columns containing NULL values are harder
for the database to optimize. If the field is queried frequently but NULL cannot be avoided, it is recommended to use
short values or short strings with clearer semantics, such as 0, 1, true, false, as substitutes.

  2. Frequently queried fields

The fields we create indexes on should be fields that are queried frequently.

  3. Fields used as query conditions

Fields used as WHERE conditions should be considered for indexing.

  4. Fields that frequently need to be sorted

The index is already sorted, so queries can use the index's ordering to speed up sort query time.

  5. Fields that are frequently used for joins

Fields that are often used for joins are typically foreign-key columns. A foreign-key column does not necessarily need
a foreign-key constraint; it only means the column is involved in a table-to-table relationship. For fields that are
frequently used in join queries, consider creating an index to improve the efficiency of multi-table joins.

10.2 Frequently updated fields should be carefully indexed

10.3 Consider building a joint index instead of a single column index as much as possible

  • Indexes occupy disk space; you can roughly think of each index as corresponding to one B+ tree.
  • If a table has too many indexes, then when the table holds a lot of data the indexes take up a lot of space, and modifying them takes a lot of time.
  • A composite index puts multiple fields on one index, which saves a lot of disk space and also makes data modifications more efficient (see the sketch below).
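
A minimal sketch (index and column names are hypothetical): one composite index instead of two single-column indexes.

-- Instead of two separate single-column indexes ...
-- CREATE INDEX idx_name ON `cus_order` (`name`);
-- CREATE INDEX idx_score ON `cus_order` (`score`);

-- ... one composite index serves queries on (name) as well as on (name, score)
CREATE INDEX idx_name_score ON `cus_order` (`name`, `score`);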

10.4 Pay attention to avoid redundant indexes

  • A redundant index duplicates the function of an existing index: if index (a, b) exists, a separate index (a) is redundant,
    because any query that can hit (a) can also hit (a, b).
    For example, given the two indexes (name, city) and (name), the (name) index is redundant: any query that can use it can also use (name, city) via its leftmost prefix.
  • In most cases, you should try to extend existing indexes rather than create new ones.
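
Continuing the hypothetical sketch above: if the single-column index already existed alongside the composite one, it would simply be dropped.

-- idx_name is covered by the leftmost prefix of idx_name_score, so it can be removed
DROP INDEX idx_name ON `cus_order`;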

10.5 Consider using prefix indexes instead of ordinary indexes on fields of string type

Prefix indexes are limited to string types. They index only the first N characters of the column and therefore take up less space than ordinary indexes, so consider using a prefix index instead of an ordinary index.
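
For example (the `email` column and the prefix length are assumptions):

-- Index only the first 10 characters of the email column
ALTER TABLE `cus_order` ADD INDEX idx_email (`email`(10));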

10.6 Avoid index failure

Index failure is also one of the main reasons for slow queries. The common situations that cause an index to fail are as follows:

  1. Using SELECT * for the query;
  2. A composite index is created, but the query condition does not follow the leftmost-prefix matching principle;
  3. Performing calculations, functions, or type conversions on indexed columns;
  4. LIKE queries that start with %, such as LIKE '%abc';
  5. Using OR in the query condition when a column on either side of the OR has no index, in which case the indexes involved are not used;
  6. Implicit type conversion occurs.

Remarks: MySQL implicit conversion causes index invalidation
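
A few hedged illustrations of the cases above, assuming `cus_order` has indexes on `name` and on a varchar column `phone` (both assumptions):

-- A function on the indexed column: the index is not used
SELECT * FROM `cus_order` WHERE LOWER(`name`) = 'alice';

-- A leading % wildcard: the index is not used
SELECT * FROM `cus_order` WHERE `name` LIKE '%ice';

-- Implicit conversion: phone is varchar but is compared to a number, so the index is not used
SELECT * FROM `cus_order` WHERE `phone` = 13800000000;

-- Compare against a string literal instead, and the index can be used
SELECT * FROM `cus_order` WHERE `phone` = '13800000000';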

10.7 Delete long-term unused indexes

Delete indexes that have not been used for a long time; unused indexes cause unnecessary performance loss. Since MySQL 5.7, you can find indexes that have never been used by querying the schema_unused_indexes view in the sys schema.
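
For example:

-- Lists indexes for which no read activity has been recorded (MySQL 5.7+)
SELECT * FROM sys.schema_unused_indexes;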

The main reference link of this article is as follows:

Reference link 1-JavaGuide


Updating continuously…

Follow your heart and see the future. Follow your heart, see light!

Welcome to like, follow, and leave a message, let’s learn and communicate together!