[MySQL Indexing and Optimization] Use of database optimization and performance analysis tools

Using database optimization and performance analysis tools

Article directory

  • Use of database optimization and performance analysis tools
    • 1. Steps for database server optimization
    • 2. Query system performance parameters
    • 3. Locate the SQL that is fully executed: slow query log
    • 4. View SQL execution costs: show profile
    • 5. Analysis query tool: explain
      • 5.1id
      • 5.2 type
    • 6. explain further use
      • 6.1 explain supports 4 output formats
      • 6.2 show warnings
    • 7. Analyze the optimizer execution plan: trace
    • 8. MySQL monitoring analysis view: sys_schema
      • 8.1 sys schema view summary
      • 8.2 sys schema usage scenarios
        • Index status
        • table related
        • Statement related
        • IO related
        • Innodb related

1. Steps for database server optimization

  1. Observe whether there are periodic fluctuations in the server, such as Taobao Double 11 and other periodic activities. At this time, you can consider adding cache and changing the cache invalidation policy to solve the problem.
  2. Enable slow query logging and use SHOW PROFILING and EXPLAIN analysis to see whether the SQL waiting time is long or the SQL execution time is long
    • SHOW PROFILING shows a long waiting time, then tune server parameters
    • If SHOW PROFILING shows that the execution time is long, then combined with the EXPLAIN tool analysis, consider optimizing from the following three aspects.
      1. Index design optimization
      2. There are too many JOIN tables and need to be optimized
      3. Database table design optimization
  3. Consider whether the SQL query has reached the bottleneck. If it has not reached the bottleneck, recheck the SQL or perform the above analysis; if it has reached the bottleneck, consider the following two steps.
    1. Read and write separation (master-slave architecture)
    2. Sub-database and table (vertical sub-database, vertical sub-table, horizontal sub-table)

Database tuning dimensions:

  • The index is invalid and the index is not fully utilized–Index creation
  • There are too many JOINs in related queries (design defects or last resort requirements) – SQL Optimization
  • Server tuning and various parameter settings (buffering, number of threads, etc.) – Adjust my.cnf
  • Too much data–Sub-database and sub-table

2. Query system performance parameters

In MySQL, you can use the show status statement to query the performance parameters and execution frequency of the database server.

show [global|session] status like 'parameter';
  • Connections: Number of connections to the MySQL server
  • Uptime: The server’s online time, in seconds
  • Slow_queries: Number of slow queries
  • Innodb_rows_read/Innodb_rows_inserted/Innodb_rows_updated/Innodb_rows_deleted: Query the number of rows queried or affected by the corresponding operation
  • Com_select/Com_insert/Com_update/Com_delete: Query the number of corresponding operations
  • last_query_cost: Stats SQL query cost and returns the number of pages that need to be read for the previous query
    • After executing the SQL statement, get the cost of the current query by looking at the value of the last_query_cost variable in the current session. It is usually also a common indicator for us to evaluate the execution efficiency of a query. This query cost corresponds to the number of pages that the SQL statement needs to read
    • Usage scenarios: It is very useful for comparing costs, especially when we have several query methods to choose from.

SQL query is a dynamic process. From the perspective of page loading, we can draw the following two conclusions:

  1. Location determines efficiency. If the page exists in the memory, it will be much more efficient than reading it from the disk.
  2. Batch determines efficiency. Using sequential reading method to read pages in batches, the average reading efficiency of one page will be greatly improved.

So we must first consider the location of data storage. If it is frequently used data, we should try to put it in the buffer pool. Secondly, we can make full use of the throughput capacity of the disk and read the data in batches at one time, so that The reading efficiency of a single page is also improved.

3. Locating SQL that is fully executed: slow query log

The slow query log will record statements whose response time exceeds the threshold in MySQL. Specifically, it refers to SQL whose running time exceeds the long_query_time value, which will be recorded in the slow query log. Default value 10 (seconds)

By default, the MySQL database does not enable the slow query log. If it is not required for tuning, it is generally not recommended to enable this parameter (it will have a certain performance impact)

show [global|session] variables like 'parameters'

Permanent setting method: modify the my.cnf file, add or modify the parameters long_query_time, slow_query_log, slow_query_log_file under [mysqld], and then restart MySQL

[mysqld]
slow_query_log=ON #Slow query log switch
slow_query_log_file=/var/lib/mysql/slow_query_log.log # Slow query log location
long_query_time=3 # Slow query threshold, unit seconds
log_output=FILE # Log storage method, log_output='TABLE' means to store the log in the database, so that the log information will be written to the mysql.slow_log table

If you do not specify a storage path, it will be stored in the MySQL data folder by default; if you do not specify a file name, it will default to hostname-slow.log.

Slow log analysis tool: mysqldumpslow, you can use this tool to view slow SQL logs by query time, average query time, etc.

mysqldumpslow --help

Delete the slow query log: Just delete it manually, and then execute the following command to regenerate the query log file.

mysqladmin -uroot -p flush-logs slow

4. View SQL execution costs: show profile

Show Profile is a tool provided by MySQL that can be used to analyze what SQL has done in the current session and the resource consumption of execution. It can be used for SQL tuning measurements. Off by default and saves the results of the last 15 runs.

When using it, it is recommended to log in to mysql using the command window and then execute the corresponding command to check the situation. Direct execution with interface tools will cause a lot of interference.

show variables like 'profiling'; -- View switches
set profiling = 'ON'; -- Set switch, ON|OFF
show profiles; -- View the last 15 running times
show profile; -- View the execution time of each stage of the last SQL statement. You can also view the information_schema.profiling table to view the execution status of the above 15 times.
show profile for query 82; -- View the query time with query_id 82
show profile cpu,block io for query 1; -- View the cpu and block io conditions at the same time

executing: SQL execution time. If this time is too long, consider using the explain tool to analyze SQL for SQL optimization. Otherwise, look at other items and consider Database parameter configuration optimization

Common parameters for show profile:

  1. all: Display all overhead information
  2. block io: displays block IO overhead
  3. context switches: context switching overhead
  4. cpu: Displays CPU overhead information
  5. ipc: Displays sending and receiving overhead information
  6. memory: Display memory overhead information
  7. page faults: Display page fault overhead information
  8. source: Displays overhead information related to Source_function, Source_file, Source_line
  9. swaps: Displays swap count overhead information

Things to note in daily development:

  1. converting HEAP to MyISAM: The query results are too large, the memory is not enough, and the data is moved to the disk.
  2. Creating tmp table: Create a temporary table. Copy the data to the temporary table first, then delete the temporary table after use
  3. Copying to tmp table on disk: Copying the temporary table in memory to disk, be careful!
  4. locked

If any of the above four results appear in the show profile diagnostic results, the SQL statement needs to be optimized.

Note: The show profile command will be deprecated. We can view it from the information_schema.profiling data table (session level). This table will only save the execution status of the last 15 SQLs.

5. Analysis query tool: explain

MySQL provides the explain statement to view the specific execution plan of the SQL statement. Note: Some statistical information is estimated and not precise.

explain select 1;
Column name Description
id The unique id corresponding to each select
select_type The type of query corresponding to the select keyword
table Table name, each table corresponds to a piece of data, there may be temporary tables
partitions Matching partition information
type Access method for single table
possible_keys Possibly used indexes
key actually used Index
key_len Unit byte, the actual index length used, mainly for joint indexes Reference meaning
ref When using the index column equal value query, the object information that matches the index column for equal value
rows The estimated number of records to be read, the smaller the value, the better
filtered The percentage of the remaining records of a table after filtering by search conditions. This value of the execution plan record of the driving table in the connection query determines the number of executions of the driven table ( rows * filtered)
Extra Describe some additional information in the SQL execution plan, and more accurately describe how MySQL is Execute the given statement, mainly to avoid: Using filesort and Using temporary

5.1 id

The unique id corresponding to select. When there are multiple tables, there will be multiple records, such as:

select * from s1 join s2; -- There will be two rows of data according to the table situation, but there is only one select statement, so their ids are all 1

If there are multiple selects written in the SQL statement, but the ID numbers cannot match, it means that the optimizer has rewritten the SQL

  • If the IDs are the same, they can be considered as a group and executed sequentially from top to bottom.
  • In all groups, the larger the id value is, the higher the priority is and the execution is performed first.
  • Points of concern: Each ID represents an independent query. The fewer the number of queries for a SQL statement, the better

5.2 type

A record in the execution plan represents the access method used by MySQL when executing a query on a certain table, also known as the “access type”. The type column indicates what the access method is, which is more An important indicator. For example, if you see that the value of the type column is ref, it indicates that MySQL will use the ref access method to execute the query on the s1 table.

Complete access method: system,const,eq_ref,ref,fulltext, ref_or_null,index_merge,index_subquery,range,index,ALL.

system: When there is only one record in the table, and the statistics of the storage engine used by the table are accurate, such as MyISAM and Memory

select count(*) from f1; -- f1 uses the MyISAM storage engine, and the number of records will be recorded in the table, so it is system

const: When the primary key or unique secondary index column is equivalent to a constant, the access method to a single table is const

select * from f1 where a = 1; -- a is the primary key or unique index

eq_ref: During connection query, if the driven table is accessed through value matching of primary key or unique secondary index column (if the primary key or unique secondary index is a joint index) , all index columns must be compared for equality), then the access method to the driven table is eq_ref

select * from f1 join f2 on f1.id = f2.id; -- f1 needs to check the entire table, and for f2, every piece of data in the driving table f1 can be driven by the primary key or unique in the driven table f2 Index finds corresponding data

ref: When querying a table through equality matching between ordinary secondary index columns and constants, the access method to the table may be ref

select * from f1 where b = 'abc'; -- b is a normal index

ref_or_null: When an equal value matching query is performed on a common secondary index, and the value of the index column can also be a NULL value, the access method to the table may be ref_or_null.

select * from f1 where b = 'abc' or b is null; -- increase the possibility of null in the case of ref

index_merge: When accessing a single table, Union, Intersection, and Sort-Union can be used in certain scenarios. These three index merge methods are used to execute queries.

select * from f1 where a = 'a' or b = 'b'; -- both a and b are indexes, then both a and b indexes will be used

unique_subquery: In some query statements containing IN subquery, if the query optimizer decides to convert the IN subquery into If exists subquery, and the subquery can use the primary key for equivalent matching, then the type column of the subquery execution plan is unique_subquery

select * from f1 where b in (select id from f2 where f1.a = f2.a) or c = 'abc'; -- id is the primary key of f2, similar to eq_ref

range: Use the index to obtain records of certain range intervals, then you may use the range access method

select * from f1 where b in ('a', 'b', 'c');
select * from f1 where b > 'a' and b < 'c';

index: When index coverage can be used, but all index records need to be scanned, the access method of the table may be index

select b from f1 where c = 'x'; -- There are joint indexes a, b, c on f1, so there is no need to return the table when querying

In summary, the resulting values from best to worst are:

system > const > eq_ref > ref > fulltext > ref_or_null > index_merge > unique_subquery > index_subquery > range > index > ALL

The goal of SQL performance optimization: At least the range level must be reached, the requirement is the ref level, preferably the const level (Alibaba Development Manual)

6. explain further use

6.1 explain supports 4 output formats

Traditional, json, tree and visual interfaces, among which Json has more printing of cost information than the traditional mode. The visual interface requires the use of MySQL Workbench to view the SQL execution plan

explain format=json select * from f1;

6.2 show warnings

After using explain, you can then use the show warnings statement to view extended information related to the query plan of the corresponding query, which are Level, Code, and Message;

If the code value is 1003, it means that SQL has been rewritten. Through the message of show warnings, you can view the optimized SQL execution statement

7. Analyze the optimizer execution plan: trace

OPTIMIZER_TRACE is a tracking function introduced in MySQL 5.6. It can track various decisions made by the optimizer (such as methods of accessing tables, various cost calculations, various conversions, etc.) and will track the results. Logging into the INFORMATION_SCHEMA.OPTIMIZER_TRACE table This feature is turned off by default. Enable trace, set the format to JSON, and set the maximum memory size that trace can use to avoid complete display due to the default memory being too small during the parsing process. This is just a demonstration, not a detailed analysis.

set optimizer_trace="enabled=on",end_markers_in_json=on;
set optimizer_trace_max_mem_size=1000000;
mysql> select * from INFORMATION_SCHEMA.OPTIMIZER_TRACE\G;
*************************** 1. row ********************* *******
/* Part 1: SQL statements */
                            QUERY: select * from niuke.user_profile
/* Part 2: Tracking information for statements corresponding to the QUERY field */
                            TRACE: {
  "steps": [
    {
      /* Preparatory work */
      "join_preparation": {
        "select#": 1,
        "steps": [
          {
            "expanded_query": "/* select#1 */ select `niuke`.`user_profile`.`id` AS `id`,`niuke`.`user_profile`.`device_id` AS `device_id`,`niuke`.` user_profile`.`gender` AS `gender`,`niuke`.`user_profile`.`age` AS `age`,`niuke`.`user_profile`.`university` AS `university`,`niuke`.`user_profile` .`gpa` AS `gpa`,`niuke`.`user_profile`.`active_days_within_30` AS `active_days_within_30`,`niuke`.`user_profile`.`question_cnt` AS `question_cnt`,`niuke`.`user_profile`.` answer_cnt` AS `answer_cnt` from `niuke`.`user_profile`"
          }
        ] /* steps */
      } /* join_preparation */
    },
    {
      /* optimize */
      "join_optimization": {
        "select#": 1,
        "steps": [
          {
            "table_dependencies": [
              {
                "table": "`niuke`.`user_profile`",
                "row_may_be_null": false,
                "map_bit": 0,
                "depends_on_map_bits": [
                ] /* depends_on_map_bits */
              }
            ] /* table_dependencies */
          },
          {
            "rows_estimation": [
              {
                "table": "`niuke`.`user_profile`",
                "table_scan": {
                  "rows": 7,
                  "cost": 0.25
                } /* table_scan */
              }
            ] /* rows_estimation */
          },
          {
            "considered_execution_plans": [
              {
                "plan_prefix": [
                ] /* plan_prefix */,
                "table": "`niuke`.`user_profile`",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "rows_to_scan": 7,
                      "access_type": "scan",
                      "resulting_rows": 7,
                      "cost": 0.95,
                      "chosen": true
                    }
                  ] /* considered_access_paths */
                } /* best_access_path */,
                "condition_filtering_pct": 100,
                "rows_for_plan": 7,
                "cost_for_plan": 0.95,
                "chosen": true
              }
            ] /* considered_execution_plans */
          },
          {
            "attaching_conditions_to_tables": {
              "original_condition": null,
              "attached_conditions_computation": [
              ] /* attached_conditions_computation */,
              "attached_conditions_summary": [
                {
                  "table": "`niuke`.`user_profile`",
                  "attached": null
                }
              ] /* attached_conditions_summary */
            } /* attaching_conditions_to_tables */
          },
          {
            "finalizing_table_conditions": [
            ] /* finalizing_table_conditions */
          },
          {
            "refine_plan": [
              {
                "table": "`niuke`.`user_profile`"
              }
            ] /* refine_plan */
          }
        ] /* steps */
      } /* join_optimization */
    },
    {
      "join_execution": {
        "select#": 1,
        "steps": [
        ] /* steps */
      } /* join_execution */
    }
  ] /* steps */
}
/* Part 3: The tracking information is too long and the truncated part */
MISSING_BYTES_BEYOND_MAX_MEM_SIZE: 0
/* Part 4: Whether the user executing the trace statement has permission to view the object. When there is no defect, the column value is 1 and the trace field is empty. This situation usually occurs when a stored procedure is called */
          INSUFFICIENT_PRIVILEGES: 0
1 row in set (0.00 sec)

8. MySQL monitoring and analysis view: sys_schema

Regarding MySQL performance monitoring and problem diagnosis, we generally obtain the desired data from performance_schema. In the MySQL5.7.7 version, sys_schema is added, which will performance_schema and information_schema are summarized into “views” in a more understandable way. The purpose is to reduce the complexity of querying performance_schema and allow DBA to quickly locate problems. What monitoring tables and views are there? Mastering these will help us get twice the result with half the effort in our development and operation and maintenance processes.

8.1 sys schema view summary

  1. Host related: Starting with host_summary, it mainly summarizes IO delay information
  2. Innodb related: Starting with innodb, it summarizes innodb buffer information and transaction waiting for innodb lock information
  3. I/O related: Starting with io, it summarizes waiting for I/O and I/O usage.
  4. Memory usage: Starting with memory, it shows the memory usage from the perspective of host, thread, event, etc.
  5. Connection and session information: processlist and session related views, summarizing session related information
  6. Table related: The view starting with schema_table shows the statistical information of the table
  7. Index information: Statistics of index usage, including redundant indexes and unused indexes
  8. Statement related: starts with statement and contains statement information for performing full table scan, using temporary tables, sorting, etc.
  9. User related: The view starting with user counts the file I/O and execution statement statistics used by the user.
  10. Waiting event related information: starting with wait, showing the delay of waiting events

8.2 sys schema usage scenarios

Index situation
# 1. Query redundant index
select * from sys.schema_redundant_indexes;
# 2. Query unused indexes
select * from sys.schema_unused_indexes;
# 3. Query index usage
select * from sys.schema_index_statistics where table_schema = 'dbname';
Table related
# 1. Query the number of visits to the table
select table_schema, table_name, sum(io_read_requests + io_write_requests) as io from sys.schema_table_statistics group by table_schema, table_name order by io desc;
# 2. Query tables that occupy more buffer pools
select object_schema,object_name,allocated,data from sys.innodb_buffer_stats_by_table order by allocated limit 10;
# 3. Full table scan of query table
select * from sys.statements_with_full_table_scans where db='dbname';
Statement related
# 1. Monitor the frequency of SQL execution
select db,exec_count,query from sys.statement_analysis order by exec_count desc;
# 2. Monitor SQL that uses sorting
select db,exec_count,first_seen,last_seen,query from sys.statements_with_sorting limit 1;
# 3. Monitor SQL that uses temporary tables or disk temporary tables
select db,exec_count,tmp_tables,tmp_disk_tables,query from sys.statement_analysis where tmp_tables > 0 or tmp_disk_tables > 0 order by (tmp_tables + tmp_disk_tables) desc;
IO related
# 1. View disk consumption IO files
select file,avg_read,avg_write,avg_read + avg_write as avg_io from sys.io_global_by_file_by_bytes order by avg_read limit 10;
Innodb related
# 1. Row lock blocking situation
select * from sys.innodb_lock_waits;

Risk warning: When querying through the sys library, MySQL will consume a lot of resources to collect relevant information. In severe cases, business requests may be blocked, causing failures. It is recommended that in production, not frequently query sys or performance_schema, information_schema to complete monitoring, inspection, etc.