Scenarios for MySQL to create internal temporary tables

This article from the technical community “Technology Sharing | Inventory of All Scenarios for MySQL to Create Internal Temporary Tables” explains the application scenarios of temporary tables, you can learn about it.

A temporary table is a table that temporarily stores data. This type of table will be automatically cleaned up at the end of the session. However, there are two kinds of temporary tables in MySQL, one is an external temporary table and the other is an internal temporary table.

External temporary tables refer to temporary tables created manually by users using CREATE TEMPORARY TABLE. Internal temporary tables are beyond user control and cannot be created using the CREATE statement like external temporary tables. MySQL’s optimizer will automatically choose whether to use internal temporary tables.

When does MySQL use internal temporary tables?

We will analyze common scenarios such as UNION and GROUP BY.

1UNION scene

First prepare a test sheet.

CREATE TABLE `employees` (
  `id` int NOT NULL AUTO_INCREMENT,
  `first_name` varchar(100) COLLATE utf8mb4_bin DEFAULT NULL,
  `last_name` varchar(100) COLLATE utf8mb4_bin DEFAULT NULL,
  `sex` enum('M','F') COLLATE utf8mb4_bin DEFAULT NULL,
  `age` int DEFAULT NULL,
  `birth_date` date DEFAULT NULL,
  `hire_date` date DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `last_name` (`last_name`),
  KEY `hire_date` (`hire_date`)
) ENGINE=InnoDB AUTO_INCREMENT=500002 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

Prepare script to insert data.

#!/usr/bin/python
#! coding=utf-8
 
import random
importpymysql
from faker import Faker
from datetime import datetime, timedelta
 
#Create Faker instance
fake = Faker()
 
# MySQL connection parameters
db_params = {
    'host': 'localhost',
    'user': 'root',
    'password': 'root',
    'db': 'db1',
    'port': 3311
}
 
# Connect to the database
connection = pymysql.connect(**db_params)
 
#Create a new Cursor instance
cursor = connection.cursor()
 
# Generate and insert data
for i in range(5000):
    id = (i + 1)
    first_name = fake.first_name()
    last_name = fake.last_name()
    sex = random.choice(['M', 'F'])
    age = random.randint(20, 60)
    birth_date = fake.date_between(start_date='-60y', end_date='-20y')
    hire_date = fake.date_between(start_date='-30y', end_date='today')
 
    query = f"""INSERT INTO employees (id, first_name, last_name, sex, age, birth_date, hire_date)
                VALUES ('{id}', '{first_name}', '{last_name}', '{sex}', {age}, '{birth_date}', '{hire_date}');"""
 
    cursor.execute(query)
 
# Submit a transaction every 1000
    if (i + 1) % 1000 == 0:
        connection.commit()
 
#Finally commit the transaction
connection.commit()
 
# Close the connection
cursor.close()
connection.close()

After creating the test data, execute a statement with UNION.

root@localhost:mysqld.sock[db1]> explain (select 5000 as res from dual) union (select id from employees order by id desc limit 2);
 + ---- + -------------- + ------------ + ------------ + --- ---- + --------------- + --------- + --------- + ------ + -- ---- + ---------- + ---------------------------------- +
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
 + ---- + -------------- + ------------ + ------------ + --- ---- + --------------- + --------- + --------- + ------ + -- ---- + ---------- + ---------------------------------- +
| 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
| 2 | UNION | employees | NULL | index | NULL | PRIMARY | 4 | NULL | 2 | 100.00 | Backward index scan; Using index |
| NULL | UNION RESULT | <union1,2> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | Using temporary |
 + ---- + -------------- + ------------ + ------------ + --- ---- + --------------- + --------- + --------- + ------ + -- ---- + ---------- + ---------------------------------- +
3 rows in set, 1 warning (0.00 sec)

It can be seen that the key value in the second row is PRIMARY, that is, the second query uses the primary key ID. The extra value in the third line is Using temporary, which indicates that a temporary table was used when UNIONing the result sets of the above two queries.

The UNION operation is the union of two result sets, excluding duplicates. To do this, you only need to first create an in-memory temporary table with only the primary key, and insert the value of the first subquery into this table, thus avoiding duplication problems. Because the value 5000 already exists in the temporary table, and the value 5000 of the second subquery cannot be inserted due to conflict, only the next value 4999 can be inserted.

UNION ALL is different from UNION in that it does not use memory temporary tables. The following example is an execution plan using UNION ALL.

root@localhost:mysqld.sock[db1]> explain (select 5000 as res from dual) union all (select id from employees order by id desc limit 2);
 + ---- + ------------- + ----------- + ------------ + ----- -- + --------------- + --------- + --------- + ------ + ---- -- + ---------- + ---------------------------------- +
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
 + ---- + ------------- + ----------- + ------------ + ----- -- + --------------- + --------- + --------- + ------ + ---- -- + ---------- + ---------------------------------- +
| 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
| 2 | UNION | employees | NULL | index | NULL | PRIMARY | 4 | NULL | 2 | 100.00 | Backward index scan; Using index |
 + ---- + ------------- + ----------- + ------------ + ----- -- + --------------- + --------- + --------- + ------ + ---- -- + ---------- + ---------------------------------- +
2 rows in set, 1 warning (0.01 sec)

Because UNION ALL does not require deduplication, the optimizer does not need to create a new temporary table to perform deduplication. During execution, it only needs to execute two subqueries in sequence and put the subqueries in a result set.

It can be seen that in terms of implementing the semantics of UNION, the temporary table plays the role of temporarily storing data and performing deduplication actions.

2GROUP BY

In addition to UNION, there is also a more commonly used clause GROUP BY that also uses internal temporary tables. The following example shows a method that uses the ID column to find the remainder and perform group statistics, sorting according to the size of the remainder.

root@localhost:mysqld.sock[db1]> explain select id%5 as complementation,count(*) from employees group by complementation order by 1;
 + ---- + ------------- + ----------- + ------------ + ----- -- + -------------------------- + ----------- + ----- ---- + ------ + ------ + ---------- + -------------- -------------------------- +
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
 + ---- + ------------- + ----------- + ------------ + ----- -- + -------------------------- + ----------- + ----- ---- + ------ + ------ + ---------- + -------------- -------------------------- +
| 1 | SIMPLE | employees | NULL | index | PRIMARY,last_name,hire_date | hire_date | 4 | NULL | 5000 | 100.00 | Using index; Using temporary; Using filesort |
 + ---- + ------------- + ----------- + ------------ + ----- -- + -------------------------- + ----------- + ----- ---- + ------ + ------ + ---------- + -------------- -------------------------- +
1 row in set, 1 warning (0.00 sec)

You can see that the values of extra are using index, using temporary, and using filesort; these three values are: using index, using temporary table, and using sorting.

Note: In MySQL version 5.7, GROUP BY sorts by grouping fields by default. In MySQL version 8.0, the default sorting function is canceled, so ORDER BY is used here to reproduce.

For GROUP BY, after the above statement is executed, an internal temporary table in memory will be created to store the values of complementation and count(*). The primary key is complementation. Then calculate the value of id%5 sequentially according to the ID value corresponding to the index hire_date and record it as x. If there is no value with the primary key of x in the temporary table , then the record will be inserted into the temporary table; if it exists, the count of this row will be accumulated count(*). After traversing the above operations, sort complementation according to the rules of ORDER BY.

When using GROUP BY for grouping or DISTINCT for deduplication, MySQL provides us with a way to use hints to avoid using internal temporary tables in memory.

hint Explanation
SQL_BIG_RESULT Explicitly specifies that the SQL statement uses the internal disk temporary table, which is suitable for operations with large data volumes; suitable for InnoDB engine and Memory engine.
SQL_SMALL_RESULT Explicitly specify that the SQL statement uses an internal temporary table in memory, which is faster and suitable for operations with small data volumes; suitable for Memory engines.

The following is an example using SQL_BIG_RESULT.

root@localhost:mysqld.sock[db1]> explain select SQL_BIG_RESULT id%5 as complementation,count(*) from employees group by complementation order by 1;
 + ---- + ------------- + ----------- + ------------ + ----- -- + -------------------------- + ----------- + ----- ---- + ------ + ------ + ---------- + -------------- --------- +
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
 + ---- + ------------- + ----------- + ------------ + ----- -- + -------------------------- + ----------- + ----- ---- + ------ + ------ + ---------- + -------------- --------- +
| 1 | SIMPLE | employees | NULL | index | PRIMARY,last_name,hire_date | hire_date | 4 | NULL | 5000 | 100.00 | Using index; Using filesort |
 + ---- + ------------- + ----------- + ------------ + ----- -- + -------------------------- + ----------- + ----- ---- + ------ + ------ + ---------- + -------------- --------- +
1 row in set, 1 warning (0.00 sec)

From the execution plan, we can see that after using the SQL_BIG_RESULT hint for query, the words Using Temporary in the extra column have disappeared, which means that the use of internal temporary tables in memory is avoided.

3Other scenes

Of course, in addition to the above two examples, MySQL will also create internal temporary tables in the following situations:

  1. For evaluation of UNION statements, with some exceptions described subsequently.

  2. For the evaluation of certain views, such as those using the TEMPTABLE algorithm, UNION, or aggregation.

  3. Evaluation of derived tables.

  4. Evaluation of public expressions.

  5. Table used for subquery or semi-join materialization.

  6. Evaluation of a statement that contains an ORDER BY clause and a different GROUP BY clause, or a statement in which the ORDER BY or GROUP BY clause contains columns from a table other than the first table in the join queue.

  7. For the combination of DISTINCT and ORDER BY, a temporary table may be required.

  8. For queries that use the SQL_SMALL_RESULT modifier, MySQL uses an in-memory temporary table, unless the query also contains elements that need to be stored on disk.

  9. To evaluate an INSERT … SELECT statement that selects and inserts into the same table, MySQL creates an internal temporary table to hold the SELECT rows and then inserts those rows into the target table.

  10. Evaluation of multi-table UPDATE statements.

  11. For evaluation of GROUP_CONCAT() or COUNT(DISTINCT) expressions.

  12. Evaluation of window functions, using temporary tables as needed.

It is worth noting that MySQL does not allow the use of in-memory temporary tables for certain query conditions, in which case the server uses on-disk internal temporary tables.

  1. A BLOB or TEXT column exists in the table. TempTable, the default storage engine for in-memory temporary tables in MySQL 8.0, supports the binary large object type starting in 8.0.13.

  2. If UNION or UNION ALL are used, there are any string columns in the SELECT list whose maximum length exceeds 512 (bytes for binary strings, characters for non-binary strings).

  3. SHOW COLUMNS and DESCRIBE statements use BLOB as the type for some columns, so the temporary table used for this result will be an on-disk temporary table.

If you think this article is helpful, please feel free to click “Like” and “Reading” at the end of the article, or forward it directly to pyq,

97d30b391cbc69f657b03fcdb1b2827f.png

Recently updated articles:

“Learning and Understanding of my country’s Legal Ranks”

“Popular Financial Knowledge – Multi-level Capital Market”

“How to locate the culprit of locking users?” 》

“Database Dry Information Hardcore Public Account”

“Talk about some “what ifs” for Guoan this season”

Recent hot articles:

“Recommend a classic paper on Oracle RAC Cache Fusion”

“The shock that the open source code of the “Red Alert” game brings to us”

Article classification and indexing:

“Classification and Indexing of 1,300 Public Account Articles”

The knowledge points of the article match the official knowledge files, and you can further learn related knowledge. MySQL entry skill treeUsing databaseCreating and deleting tables 78094 people are learning the system