SQL logic and optimization (must know for interviews)

What we often need to operate in development is SQL statements. SQL statements greatly determine this function. In complex businesses, there are nothing more than complex SQL statements. Like MQ and Redis, they are actually auxiliary SQL functions. So how to parse and optimize SQL? , today I will use a simple example to help understand the logic and optimization of SQL

1.SQL analysis

Articles to help understand SQL functions Dabaimao~’s blog SQL optimization plan Dabaimao~’s blog

The essence of sql is actually four statements and n functions. Next, a very simple table is used to achieve parsing and optimization. The sql is as follows

CREATE TABLE students (
  student_id INT PRIMARY KEY,
  name VARCHAR(100),
  score DECIMAL(5, 2),
  class VARCHAR(50),
  department_name VARCHAR(100)
);

Insert data

INSERT INTO students (student_id, name, score, class, department_name) VALUES
(1, 'Little sister', 95.50, 'Class 1', 'Science Department'),
(2, 'Xiao Li', 88.75, 'Class 2', 'Mathematics Department'),
(3, 'Yao Yao', 78.25, 'Class 1', 'Art Department'),
(4, 'Kunkun', 92.00, 'Class 3', 'Science Department'),
(5, 'Xiaoli', 85.00, 'Class 2', 'Mathematics Department'),
(6, 'Zhang Ming', 76.50, 'Class 3', 'Art Department'),
(7, 'Bai Su', 89.25, 'First Class', 'Science Department');

This is a sql that creates a table and inserts seven pieces of data. The details are as follows

student_id is an integer field, used as the primary key to uniquely identify each student
name student name is a variable length string field
class student class is a variable length string field
department_name student department name is a variable length string field
score The student score is a decimal field that can store up to a total of 5 digits, including 2 after the decimal point

Implement add, delete, modify and check functions

Insert – Insert a new record into the students table

INSERT INTO students (student_id, name, score, class, department_name)
VALUES (1, 'Zhang San', 88.5, 'Class 1', 'Computer Science and Technology');

Delete – Delete a user from the students table, assuming we know that the user’s UserID is 1

DELETE FROM students WHERE student_id = 1;

Update – Update a student’s score. For example, we want to change the score of the user whose UserID is 1 to 90.0 points

UPDATE students SET score = 90.0 WHERE student_id = 1;

Select – Query user information from the students table; here are a few query examples

Query all records from the students table:
SELECT * FROM students;

Query information about all students in a specific department:
SELECT * FROM students WHERE department_name = 'Computer Science and Technology';

2.The role of exists and in

If it is a single-table SQL query, it is not difficult, but in business, we generally need to attribute split a bunch of data into various tables, and the tables contain corresponding field attributes < /strong>, then we need to perform a join table query on it

We all know that multi-table association itself is a conditional query after nested for. Therefore, if the table that needs to be associated is used to join the association, it will greatly affect the efficiency of the query and make the Cartesian product become very large. So what? How to solve this problem

It explains the two functions exists and in. Why? It should be because they can avoid excessive inline queries, reduce Cartesian products, and speed up query efficiency. The general explanation is as follows:

exists (the outermost sql inside can nest the main sql, execute the sql first and then nest the main sql. The larger the sql joint table, the better) in (the outermost sql inside can nest the main sql, execute the sql first and then nest the main sql. The smaller the sql joint table, the better)

Among them we need to know two names, parent query and subquery

The subquery does not depend on the parent query. The parent query depends on the results of the subquery.

(IN) Example 1: Help me find the names of students with the same scores as Yaoyao (excluding Yaoyao herself)

SELECT name FROM students WHERE score IN (SELECT score FROM students WHERE name = 'Yaoyao') AND name <> 'Yao Yao';

There are also the following

Query the names of students with higher scores than Yaoyao SELECT name FROM students WHERE score > (SELECT score FROM students WHERE name = 'Yaoyao'); Query the names of students with lower scores than Yaoyao SELECT name FROM students WHERE score < (SELECT score FROM students WHERE name = 'Yaoyao');

(IN) Example 2: Find the names of classmates who have the same score as Yaoyao and are in the same class as Yaoyao (excluding Yaoyao herself)

SELECT name FROM students WHERE score IN ( SELECT score FROM students WHERE name = 'Yao Yao' ) AND class IN ( SELECT class FROM students WHERE name = 'Yao Yao' ) AND name <> 'Yao Yao';

(IN) Example 3: Create a table that determines the region based on student ID (excluding Yaoyao herself)

--Create a table containing region information CREATE TABLE student_regions ( student_id int(8), region VARCHAR(255) );

The SQL to implement the logic is as follows

select region from student_regions where student_id IN (SELECT id FROM students WHERE score IN ( SELECT score FROM students WHERE name = 'Yao Yao' ) AND name <> 'Yao Yao'; )

The logic of IN is: First execute the parent query, and then loop through the subquery. That is, as long as the data in the subquery is small and the amount of data in the parent query is large, you can use in to improve the efficiency of sql query

(EXISTS) Example 4: Help me query the names of students who have the same and different scores as Yaoyao

-- Query the names of students with the same scores as Yaoyao SELECT name FROM students WHERE EXISTS (SELECT 1 FROM students s WHERE s.name = 'Yaoyao' AND s.score = students.score); -- Query the names of students whose scores are different from Yaoyao's SELECT name FROM students WHERE NOT EXISTS (SELECT 1 FROM students s WHERE s.name = 'Yaoyao' AND s.score = students.score); -- Query the names of students whose scores are different from Yaoyao's SELECT name FROM students WHERE EXISTS (SELECT 1 FROM students s WHERE s.name = 'Yaoyao' AND s.score != students.score);

(EXISTS) Example 5: Find the names of classmates who have the same score as Yaoyao and are in the same class as Yaoyao (excluding Yaoyao herself)

SELECT name FROM students s1 WHERE EXISTS ( SELECT 1 FROM students s2 WHERE s2.name = 'Yao Yao' AND s2.score = s1.score AND s2.class = s1.class ) AND s1.name <> 'Yao Yao';

(EXISTS) Example 6: Create a table that determines the region based on student ID (excluding Yaoyao herself)

SELECT sr.region FROM student_regions sr WHERE EXISTS ( SELECT 1 FROM students s WHERE s.name <> 'Yao Yao' AND s.score = ( SELECT score FROM students WHERE name = 'Yao Yao' ) AND s.student_id = sr.student_id );

The logic of EXISTS is: First execute the subquery, and then loop through the parent query. That is, as long as the data in the parent query is small and the amount of data in the subquery is large, you can use EXISTS to improve the efficiency of SQL queries

So how do we use these two functions to improve efficiency during development?

1. First connect these tables with where and add all the conditions (confirm whether the data is correct) 2. Find the corresponding table and associate it with the data that needs to be returned by the query. Then it can be implemented in exists or in. Note that the only ones must be associated. 3. Implement nested subqueries of exists and in, understand the logic and improve efficiency (improve efficiency)

Of course, you can also use chatGPT to optimize it yourself.

3. Introduction and use of indexes (including precautions)

This is only the first step of optimization. Next, you need to use indexes to optimize queries.

Indexes will greatly improve the performance of queries, and may also cause a slight decrease in the performance of insert, update and delete operations, because the database management system needs to maintain the structure of the index

Type of index (of course the primary key includes the primary key index, a table can only have one, representing the unique identifier of the data):

NORMAL INDEX: Features: This is the most basic index type, used to speed up the execution of SELECT statements; they have no uniqueness restrictions and allow duplicate values Purpose: Improve the speed of retrieving data, often used for columns that often need to be searched UNIQUE INDEX: Features: Ensure the uniqueness of index columns and do not allow duplicate values; if you try to insert data with duplicate index values, an error will occur Purpose: Used to ensure the uniqueness of a column in a table, usually used for primary key columns COMPOSITE INDEX: Features: Indexes containing multiple columns allow fast retrieval on combinations of these columns; the order in which indexes are combined is important because it affects the performance of queries. Purpose: Accelerate queries involving multiple columns, such as filtering using multiple columns in the WHERE clause Full text index (FULLTEXT INDEX): Features: Used to perform full-text searches on text data, rather than just simple matches; database engines that support full-text searches can better handle natural language queries Purpose: Suitable for text fields that require full-text search, such as article content or blog comments SPATIAL INDEX: Features: Used to optimize retrieval of spatial data types, such as geographic coordinate data in geographic information systems (GIS) Purpose: Suitable for applications that need to process geospatial data, such as map applications or location-based services

The visualization is as follows, first find the corresponding table and click Settings

Click Index to add an index as follows

It contains five attributes such as name, field, index type, index method, and comments.

Introduction as follows

Name: In the Navigator pane, find the table you want to add an index to Right click on the table and select the "Alter Table..." option In the "Indexes" tab, find the "Name" field and enter the name of the index Fields: In the "Fields" tab, you can select the fields you want to add to the index; click the field you want to add, and then click the "Add" button on the right Index Type: In the "Indexes" tab, select the appropriate index type; typically you can choose a normal index, a unique index, or a full-text index Index Method: MySQL Workbench usually selects the appropriate index method by default; if you need a custom index method, you can adjust it in "Index Method" Comment: In the "Indexes" tab you can add comments to the index; this is useful for describing the purpose or functionality of the index

There are two indexing methods: Btree and Hash. One uses the Btree structure to implement the index, and the other uses the Hash structure to implement the index. Most of these two use Btree. Hash only has advantages in equivalent queries.

You can add many index fields to achieve combined indexing, but you need to follow the leftmost matching principle

You only need to add indexes to frequently queried fields. Creating and deleting indexes can be achieved in a few clicks, and no commands are needed.

The usage of index is as follows

Under what circumstances do you need to add an index? 1.Filter conditions in the WHERE clause: You need to create an index on a field that is frequently used in the WHERE clause to speed up the retrieval of data; for example, if you often filter data based on a student's department_name, you can create an index on that field 2. Connection fields in JOIN operation: When performing a join operation (such as JOIN), you need to create an index on the join field to improve the performance of the join; the join field is usually a foreign key 3. Sorting fields in the ORDER BY clause: If a query needs to be sorted by a certain field, you can create an index on the sort field; this can speed up the sort operation, especially for large data sets 4.Group fields in the GROUP BY clause: If you perform an aggregate query, for example using a GROUP BY clause, you can create an index on the grouping field to speed up the grouping operation 5. Unique constraint field: Primary key fields and unique constraint fields should automatically create unique indexes to ensure data uniqueness 6. Foreign key fields: Foreign key fields are usually used to connect relationships between different tables. Indexes should be created on foreign key fields to speed up join operations. 7.Full text search field: If you do a full-text search, you can create a full-text index on the full-text search field to improve search performance 8. Larger text fields: Create full-text indexes on larger text fields to support full-text search operations, such as on blog posts or comment fields Under what circumstances do not add indexes? 1. Columns with low selectivity: If you create an index on a column with low selectivity, it may cause the index to become invalid. For example, it usually doesn't make much sense to create an index on the gender column since it only has two distinct values 2. Short text column: Creating indexes on non-selective short text columns generally does not improve query performance because they do not significantly reduce the amount of data retrieved 3. Frequently updated columns: If a column is updated frequently, the cost of maintaining the index may exceed the performance benefits, causing the index to become invalid. 4. Field order of composite index: In a composite index, the order of the fields is important; if the query does not retrieve the fields in the index, the index may not be used. 5. The table is too small: For very small tables, it may not be worthwhile to create an index because the database system may choose a full table scan instead of using the index

What behaviors may cause index failure?

Fuzzy query (LIKE query not starting with wildcard character): When using the LIKE operator for fuzzy queries, the index will usually fail if the wildcard character (%) appears at the beginning of the search pattern. For example, WHERE name LIKE '%John%' does not use the index. But if the wildcard appears at the end of the pattern (for example, WHERE name LIKE 'John%'), the index may still work. Function operations on columns: If a function operation is applied to an indexed column, such as UPPER(name) = 'JOHN', the index may become invalid because the index cannot be applied directly to the function-processed column. Data type mismatch: Indexes can become invalid when the values in the query do not match the data type of the indexed columns. For example, if the indexed column is an integer but the query uses strings for comparison, the index may not take effect. OR condition: When using multiple OR conditions in a query, the index may become invalid. For example, WHERE age = 25 OR name = 'John' may not fully utilize the index. Inequality conditions: Inequality conditions (such as <, >, <=, >=) may also cause index failure in some cases, especially in the case of non-equivalence conditions. The order of multi-column indexes: For multi-column indexes, the column order of the index is important. If the query conditions are not filtered in the order of the index columns, the index may not be fully utilized. Data distribution is uneven: When data is unevenly distributed, indexes may become invalid. If a value appears too frequently or too rarely in the data, the database may choose not to use an index. Lots of NULL values: If an indexed column contains a large number of NULL values, the index may become invalid because the database may not include NULL values in the index. Over-indexing: Excessively adding indexes may cause performance degradation because the database system needs to maintain multiple indexes, which increases the overhead of write operations. A large number of insert, update or delete operations in a short period of time: Large-scale data change operations may cause index failure because the database system needs to continuously update the index structure.

In this case, the addition of the index is finished. The essence is to build an index number based on the Btree number and Hash structure for querying to improve performance.

4. Sub-database and sub-table

Alibaba stipulates that when the data in a single table reaches 5 million or the capacity of a single table reaches 2GB, it needs to be divided into databases and tables

How to implement the insertion and query of sub-database and sub-table as well as the sub-table of global ID?

The global ID is nothing more than inserting data into hash tables, inserting data into range tables, etc.

1. When inserting data, add it to mybatis dynamic sql to determine whether to insert into different tables. When querying, you need to pass in the sub-table fields for dynamic sql splicing. 2. Stop the server and directly change the table manually. When querying, you need to manually change the SQL table. 3. Use the sub-database and sub-table middleware (no details on this yet)

So if you want to optimize SQL, first optimize the SQL statement. If it doesn’t work, consider querying commonly used fields and adding indexes. Finally, if the amount of data is too large, use sub-databases and tables to improve efficiency