Full disclosure! Mysql operation DQL for the perfect landing of big data from 0 to 1

DQL, short for Data Query Language, is a programming language for retrieving data from a database. DQL is a subset of SQL (Structured Query Language), used to query relational databases, such as MySQL, Oracle, and SQL Server.

DQL provides a variety of query operations, such as SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, etc. Using these operations, the desired data can be retrieved based on specific criteria, sorted and grouped in a specific order.

DQL also supports multi-table queries and subqueries, which can jointly retrieve data from multiple tables, and use nested query statements in subqueries for retrieval.

Basic query syntax

select ... from ...

select [distinct] ... from ... [where ...] [group by ...] [having ...] [order by ...] [limit ...]

Execution order of query statements

  1. Execute the from clause first: query based on the table
  2. Execute the where clause again: perform conditional filtering or conditional filtering
  3. Then execute the group by clause: perform group query on the remaining data.
  4. Execute the having clause again: after grouping, filter or filter again
  5. Then execute the select clause: the purpose is to select the fields required by the business for display
  6. Then execute the order by clause: sort the selected fields
  7. Finally, execute the limit clause: perform pagination query, or query the first n records

Preparing data

Before learning the syntax of the next query, we prepare several tables in advance and insert some data into this table to facilitate our subsequent query operations.

student table

Field Name Field Type Description strong>
sid char(6) Student Student ID
sname varchar(50) Student Name
age int Student age
gender varchar(50) student gender
CREATE TABLE stu (
sid CHAR(6),
sname VARCHAR(50),
age INT,
gender VARCHAR(50)
);
INSERT INTO stu VALUES('S_1001', 'liuYi', 35, 'male');
INSERT INTO stu VALUES('S_1002', 'chenEr', 15, 'female');
INSERT INTO stu VALUES('S_1003', 'zhangSan', 95, 'male');
INSERT INTO stu VALUES('S_1004', 'liSi', 65, 'female');
INSERT INTO stu VALUES('S_1005', 'wangWu', 55, 'male');
INSERT INTO stu VALUES('S_1006', 'zhaoLiu', 75, 'female');
INSERT INTO stu VALUES('S_1007', 'sunQi', 25, 'male');
INSERT INTO stu VALUES('S_1008', 'zhouBa', 45, 'female');
INSERT INTO stu VALUES('S_1009', 'wuJiu', 85, 'male');
INSERT INTO stu VALUES('S_1010', 'zhengShi', 5, 'female');
INSERT INTO stu VALUES('S_1011', 'xxx', NULL, NULL);

emp table

Field Name Field Type Description strong>
empno int employee number
ename varchar(50) employee name
job varchar(50) employee job
mgr int Leader ID
hiredate date Entry date
sal decimal(7,2) monthly salary
comm decimal(7,2) bonus
deptno int section number
CREATE TABLE emp(
empno INT,
ename VARCHAR(50),
job VARCHAR(50),
mgr INT,
hired date DATE,
sal DECIMAL(7,2),
comm decimal(7,2),
deptno INT
);
INSERT INTO emp values(7369,'SMITH','CLERK',7902,'1980-12-17',800,NULL,20);
INSERT INTO emp values(7499,'ALLEN','SALESMAN',7698,'1981-02-20',1600,300,30);
INSERT INTO emp values(7521,'WARD','SALESMAN',7698,'1981-02-22',1250,500,30);
INSERT INTO emp values(7566,'JONES','MANAGER',7839,'1981-04-02',2975,NULL,20);
INSERT INTO emp values(7654,'MARTIN','SALESMAN',7698,'1981-09-28',1250,1400,30);
INSERT INTO emp values(7698,'BLAKE','MANAGER',7839,'1981-05-01',2850,NULL,30);
INSERT INTO emp values(7782,'CLARK','MANAGER',7839,'1981-06-09',2450,NULL,10);
INSERT INTO emp values(7788,'SCOTT','ANALYST',7566,'1987-04-19',3000,NULL,20);
INSERT INTO emp values(7839,'KING','PRESIDENT',NULL,'1981-11-17',5000,NULL,10);
INSERT INTO emp values(7844,'TURNER','SALESMAN',7698,'1981-09-08',1500,0,30);
INSERT INTO emp values(7876,'ADAMS','CLERK',7788,'1987-05-23',1100,NULL,20);
INSERT INTO emp values(7900,'JAMES','CLERK',7698,'1981-12-03',950,NULL,30);
INSERT INTO emp values(7902,'FORD','ANALYST',7566,'1981-12-03',3000,NULL,20);
INSERT INTO emp values(7934,'MILLER','CLERK',7782,'1982-01-23',1300,NULL,10);

dept table

Field Name Field Type Description strong>
deptno int partial encoding
dname varchar(50) partial name
loc varchar(50) partial location
CREATE TABLE dept(
dept no INT,
dname varchar(14),
loc varchar(13)
);
INSERT INTO dept values(10, 'ACCOUNTING', 'NEW YORK');
INSERT INTO dept values(20, 'RESEARCH', 'DALLAS');
INSERT INTO dept values(30, 'SALES', 'CHICAGO');
INSERT INTO dept values(40, 'OPERATIONS', 'BOSTON');

Basic query

  1. query all columns

    SELECT * FROM stu;
    
  2. Query the specified column

    SELECT sid, sname, age FROM stu;
    

Conditional query

Conditional query is to give the WHERE clause when querying. The following operators and keywords can be used in the WHERE clause:

=, !=, <>, <, <=, >, >=, BETWEEN…AND, IN(set), IS NULL, AND, OR, NOT, XOR (exclusive OR)

  1. Query records whose gender is female and whose age is less than 50

    SELECT * FROM stu
    WHERE gender='female' AND age<50;
    
  2. Query the record with the student number S_1001 or the name liSi

    SELECT * FROM stu WHERE sid ='S_1001' OR sname='liSi';
    
  3. Query records with student numbers S_1001, S_1002, and S_1003

    SELECT * FROM stu WHERE sid IN ('S_1001','S_1002','S_1003');
    

Fuzzy query

Query according to fuzzy conditions, you can use LIKE conditions, or REGEXP.

like

like is used after the where clause to indicate a partial match. After like, there are usually two kinds of wildcards:

_ => means match any one character.

% => means match any character.

# Query all students whose names start with s
select * from student where sname like 's%'
# Query all students whose names start with s and whose length is 5
select * from student where sname like 's____'

regexp

Use regular expressions to match strings.

pattern description
^ Matches the beginning of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position after ‘\
‘ or ‘\r’.
$ Matches the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before ‘\
‘ or ‘\r’.
. Match any single character except “\
“.
[…] A set of characters. Matches any one of the contained characters. For example, ‘[abc]’ would match ‘a’ in “plain”.
[^…] Negative character set. Matches any character not contained. For example, ‘[^abc]’ would match the ‘p’ in “plain”.
\d [0-9], matches all numbers.
p1|p2|p3 Match p1 or p2 or p3. For example, ‘z|food’ would match either “z” or “food”. ‘(z|f)ood’ matches “zood” or “food”.
* Match the preceding subexpression zero or more times. For example, zo* would match “z” as well as “zoo”. * is equivalent to {0,}.
+ Matches the preceding subexpression one or more times. For example, ‘zo + ‘ would match “zo” as well as “zoo”, but not “z”. + is equivalent to {1,}.
{n} is a non-negative integer. Matches exactly n times. For example, ‘o{2}’ would not match the ‘o’ in “Bob”, but would match both o’s in “food”.
{n,m} m and n are both non-negative integers, where n <= m. Matches at least n times and at most m times.
# query names start with l and end with i
select * from stu where name regexp '^l|i$'

SELECT 'hello' REGEXP '^he' Result: 1 means match
SELECT 'hello' REGEXP '^hh' Result: 0 means no match

Field control

Delete duplicate records

Remove duplicate records (the data on the series in two or more rows of records are the same), for example, there are identical records in the sal field in the emp table. When only the sal field of the emp table is queried, there will be duplicate records. If you want to remove duplicate records, you need to use DISTINCT:

SELECT DISTINCT sal FROM emp;

Calculations between columns

View the sum of the employee’s monthly salary and commission. Since the sal and comm columns are both numeric types, they can be added. If there is a field in sal or comm that is not a numeric type, an error will occur.

SELECT *,sal + comm FROM emp;

The value of many records in the comm column is NULL, because the result of adding anything to NULL is still NULL, so the settlement result may appear NULL. The function IFNULL that converts NULL to the value 0 is used below

SELECT *,sal + IFNULL(comm,0) FROM emp;

Add an alias to the column name

In the above query, the column name is sal + IFNULL(comm,0), which is very unsightly. Now we give this column an alias, which is total:

SELECT *, sal + IFNULL(comm,0) AS total FROM emp;

When aliasing a column, the AS keyword can be omitted:

SELECT *,sal + IFNULL(comm,0) total FROM emp;

Sort results

  1. Query all student records, sorted by age in ascending order

    SELECT * FROM stu ORDER BY sage ASC;
    # or
    SELECT * FROM stu ORDER BY sage;
    
  2. Query all student records, sorted by age in descending order

    SELECT * FROM stu ORDER BY age DESC;
    
  3. Query all employees, sort by monthly salary in descending order, if the monthly salary is the same, sort by number in ascending order

    SELECT * FROM emp ORDER BY sal DESC,empno ASC;
    

Aggregation function

An aggregate function is a function that acts on a column of data and operates on a column of data. Contains: max, min, sum, count, avg and other common functions.

  • max(): Calculate the maximum value of the specified column data
  • min(): Calculate the minimum value of the specified column data
  • count(): Calculate the number of data in the specified column that is not NULL
  • sum(): Calculate the sum of the values of the specified column, if the type of the calculated column is not a numeric type, the calculation result is 0
  • avg(): Calculate the average value of the values of the specified column, if the type of the calculated column is not a numeric type, the calculated result is 0

The method of use is as follows:

max

-- used to calculate the maximum value of the specified column
-- Calculate the highest salary
select max(sal) from emp;

min

-- used to calculate the minimum value of the specified column
-- Calculate the minimum wage
select min(sal) from emp;

count

-- Used to count the number of data in the specified column, note that NULL will not be counted
-- 1. Calculate how many people in the emp table have salary sal
select count(sal) from emp;
-- 2. Calculate how many rows of data there are in the emp table
select count(*) from emp;

-- count(*) : Used to count row records, as long as there is this row, it will be counted, even if all the field values of this row are NULL, it is still a valid row

sum

-- Used to count the sum of the data in the specified column, note that NULL will not be counted
-- Calculate the sum of salaries in the emp table
select sum(sal) from emp;

avg

-- Used to count the average value of the data in the specified column, note that NULL will not be counted
-- Calculate the average salary in the emp table
select avg(sal) from emp;

Notice:

In the above requirements, we need to count the average salary of employees. However, in some rows of data, the value corresponding to salary (sal) is NULL.

For example: There are 20 rows of data in the table, and 2 rows of data are NULL. Then when the average is calculated, everyone’s salary will be added together, and this sum will be divided by 18 instead of 20. Because aggregate functions do not count NULL values.

If the requirement needs to distribute this sum to everyone, including NULL rows, then this SQL statement needs to be modified:

select avg(ifnull(sal, 0)) from emp;

Group query

When querying, you can group by one or more fields. Rows with the same group field value are considered a group. In general, the meaning of grouping is to perform aggregated statistics on the data of each group, such as counting the number and maximum value of each group.

Note: The query fields can only contain grouping fields and aggregate functions

group by

-- Query the number of each department and the maximum salary of this department (sal)
select deptno, max(sal) from emp group by deptno;

-- Query the name of each job and the number of people in this job
select job, count(*) from emp group by job;

-- Query the number of people in each department and each job
select deptno, job, count(*) from emp group by deptno, job;

having

having is a control condition for data filtering, similar to where, but different from where:

  1. Having is applied to the data after grouping, and where is applied to the data before grouping. The data filtered out by where does not participate in the grouping.

    Writing method reflects: having needs to be written after group by, where needs to be written before group by.

  2. Aggregate functions can be used after having, but aggregate functions cannot be used where.

-- Query the department numbers and average wages whose average salary is higher than 3000
select deptno, avg(sal) from emp group by deptno having avg(sal) > 3000;

imit

The select query statement will query all the data that meets the conditions in a table. The limit keyword can limit the number of rows in the query result.

-- Query the data of row 0 and row 5 in the emp table.
select * from emp limit 0, 5;

-- Query the data of row 7 starting from row 10 in the emp table.
select * from emp limit 10, 7;

Flexible use of limit can achieve the effect of pagination query.

-- For example: I need to display the data in the database on a page, but the size of the page is limited, and I need to display 20 pieces of data on each page.
-- Data for the first page:
select * from news limit 0, 20;
-- Data on the second page:
select * from news limit 20, 20;
-- Data on the third page:
select * from news limit 40, 20;

-- For the content of each page in the future, you only need to control the starting point of each limit.

Query Summary

Query statement writing order

select – from – where – group by – having – order by – limit

Query statement execution order

from-where-group by-having-select-order by-limit