Mysql row and column transformation “Everything you want is available”

Directory

      • introduce
      • 1. Prepare test data
      • 2. Change one line to multiple lines according to the delimiter
      • 3. Change one column to multiple columns and simply divide the columns into columns
      • 4. One column becomes multiple columns, data pivot
      • 5. Change multiple columns into one column

Introduction

In interviews and practical problems, we often encounter various row-column conversions, such as data changing from one row to multiple rows according to delimiters, data from one column to multiple columns, and multiple columns to one column. Here we will introduce how to handle these scenarios in Mysql.

1. Prepare test data

create table dwd.fact_suject_data
(
    student_id int null comment 'number',
    subject_level varchar(255) null comment 'subject level',
    subject_level_json varchar(255) null comment 'Subject level json data'
);
insert into dwd.fact_suject_data(student_id, subject_level,subject_level_json) values (12,'China E,English D,Math E','{"China": "E","English": "D","Math": " E"}');
insert into dwd.fact_suject_data(student_id, subject_level,subject_level_json) values (2,'China B,English B','{"China": "B","English": "B"}');
insert into dwd.fact_suject_data(student_id, subject_level,subject_level_json) values (3,'English A,Math C','{"English": "A","Math": "C"}');
insert into dwd.fact_suject_data(student_id, subject_level,subject_level_json) values (4,'China C,Math A','{"China": "C","Math": "A"}');
insert into dwd.fact_suject_data(student_id, subject_level,subject_level_json) values (5,'China D,English A,Math C','{"China": "D","English": "A","Math": " C"}');
insert into dwd.fact_suject_data(student_id, subject_level,subject_level_json) values (6,'China C,English A,Math D','{"China": "C","English": "A","Math": " D"}');
insert into dwd.fact_suject_data(student_id, subject_level,subject_level_json) values (7,'China A,English E,Math B','{"China": "A","English": "E","Math": " B"}');
insert into dwd.fact_suject_data(student_id, subject_level,subject_level_json) values (8,'China D,English E,Math E','{"China": "D","English": "E","Math": " E"}');
insert into dwd.fact_suject_data(student_id, subject_level,subject_level_json) values (9,'China C,English E,Math C','{"China": "C","English": "E","Math": " C"}');

2. One line becomes multiple lines according to the delimiter

Requirement: I want two columns: student number and subject level. The subject level only displays a single subject level.

Method 1: Use the table that comes with mysql to split strings. Note that help_topic_id starts from 0

select t1.student_id,
       -- Split the string according to subject_level according to help_topic_id
       substring_index(substring_index(t1.subject_level, ',', t2.help_topic_id + 1), ',', - 1) subject_level
from dwd.fact_suject_data t1
         left join mysql.help_topic t2
                   on t2.help_topic_id <= length(t1.subject_level) - length(replace(t1.subject_level, ',', ''))
;

Method 2: Use recursion to construct an auto-increasing id column yourself (starting from 1 or 0), which is used to split strings

with recursive cte(n) as
                   (select 1
                    union all
                    select n+1
                    from cte
                    where n < 100) -- Recurse out the required auto-incrementing id table
select t1.student_id,
       -- Split the string according to n on subject_level
       substring_index(substring_index(t1.subject_level, ',', t2.n), ',', -1) subject_level
from dwd.fact_suject_data t1
         left join cte t2
                   on t2.n <= (length(t1.subject_level) - length(replace(t1.subject_level, ',', '')) + 1)
;

The scripts of the two methods are based on the total number of separators, and sequentially construct numbers that increase to the total number, which are used as substring_index() parameters for cutting. In addition to the above two methods, if there are other tables with such increments and unique The id can also be borrowed.

3. One column becomes multiple columns, simple column arrangement

Requirements: I want three columns: student number, subject and grade;

Based on the results of step 2, simply use substring_index() to split;

select t1.student_id,
       -- Split the string according to subject_level according to help_topic_id
       substring_index(substring_index(t1.subject_level, ',', t2.help_topic_id + 1), ',', - 1) subject_level,
       substring_index(substring_index(substring_index(t1.subject_level, ',', t2.help_topic_id + 1), ',', - 1),' ',1) subject,
       substring_index(substring_index(substring_index(t1.subject_level, ',', t2.help_topic_id + 1), ',', - 1),' ',-1) level
from dwd.fact_suject_data t1
         left join mysql.help_topic t2
                   on t2.help_topic_id <= length(t1.subject_level) - length(replace(t1.subject_level, ',', ''))
;

Requirement: I want the student number and the data corresponding to each subject, and I need to use the subject_level_json field;

subject_level_json is a common json data type, so we can use functions related to json. Below are the commonly used ones. If you want to know more, you can click on the Mysql official website to view and expand;

select json_object('key1', 1, 'key2','abc') json_data,
       json_extract('{"key1": 1, "key2": "abc"}','$.key1') key1,
       json_extract('{"key1": 1, "key2": "abc"}','$.key2') key2;

Next, quote directly. If the data type is wrong, you can cast() to convert it. If the value has multiple double quotes, you can replace it with replace().

select student_id,
       subject_level_json,
       replace(cast(json_extract(cast(subject_level_json as json),'$.China') as char),'"','') china,
       replace(cast(json_extract(cast(subject_level_json as json),'$.English') as char),'"','') english,
       replace(cast(json_extract(cast(subject_level_json as json),'$.Math') as char),'"','') math
from fact_suject_dat;

4. One column becomes multiple columns, data pivot

Requirement: I want the student number and the data corresponding to each subject, but the subject_level_json field cannot be used;

After sorting according to step 2, further use case when to convert directly

select student_id,
       max(case when subject = 'China' then level else null end) china,
       max(case when subject = 'English' then level else null end) english,
       max(case when subject = 'Math' then level else null end) math
from (select t1.student_id,
       -- Split the string according to subject_level according to help_topic_id
       substring_index(substring_index(t1.subject_level, ',', t2.help_topic_id + 1), ',', - 1) subject_level,
       substring_index(substring_index(substring_index(t1.subject_level, ',', t2.help_topic_id + 1), ',', - 1),' ',1) subject,
       substring_index(substring_index(substring_index(t1.subject_level, ',', t2.help_topic_id + 1), ',', - 1),' ',-1) level
from dwd.fact_suject_data t1
         left join mysql.help_topic t2
                   on t2.help_topic_id <= length(t1.subject_level) - length(replace(t1.subject_level, ',', ''))) t
group by student_id
;

You can also further convert the subject_level field by combining mid() or substring() and case when. If you don’t know about string functions, click here;

select student_id,
       max(case when subject_level like '%China%'
           then mid(subject_level ,(instr(subject_level,'China') + length('China') + 1), 1) else null end) china,
       max(case when subject_level like '%English%'
            then mid(subject_level,(instr(subject_level,'English') + length('English') + 1), 1) else null end) english,
       max(case when subject_level like '%Math%'
            then mid(subject_level ,(instr(subject_level,'Math') + length('Math') + 1), 1) else null end) math
from fact_suject_data
group by student_i;

5. Change multiple columns into one column

Requirement: I want to generate three columns of data: student number, subject and grade based on the results of step 4;

Just use union all for splicing. If it is Oracle, you can use thePIOVT function.

select student_id,
       concat('China ',china) subject_level
from
(select student_id,
       max(case when subject_level like '%China%'
           then mid(subject_level ,(instr(subject_level,'China') + length('China') + 1), 1) else null end) china,
       max(case when subject_level like '%English%'
            then mid(subject_level,(instr(subject_level,'English') + length('English') + 1), 1) else null end) english,
       max(case when subject_level like '%Math%'
            then mid(subject_level ,(instr(subject_level,'Math') + length('Math') + 1), 1) else null end) math
from fact_suject_data
group by student_id) t
where china is not null
union all
select student_id,
       concat('English ',english) subject_level
from
(select student_id,
       max(case when subject_level like '%China%'
           then mid(subject_level ,(instr(subject_level,'China') + length('China') + 1), 1) else null end) china,
       max(case when subject_level like '%English%'
            then mid(subject_level,(instr(subject_level,'English') + length('English') + 1), 1) else null end) english,
       max(case when subject_level like '%Math%'
            then mid(subject_level ,(instr(subject_level,'Math') + length('Math') + 1), 1) else null end) math
from fact_suject_data
group by student_id) t
where english is not null
union all
select student_id,
       concat('Math ',math) subject_level
from
(select student_id,
       max(case when subject_level like '%China%'
           then mid(subject_level ,(instr(subject_level,'China') + length('China') + 1), 1) else null end) china,
       max(case when subject_level like '%English%'
            then mid(subject_level,(instr(subject_level,'English') + length('English') + 1), 1) else null end) english,
       max(case when subject_level like '%Math%'
            then mid(subject_level ,(instr(subject_level,'Math') + length('Math') + 1), 1) else null end) math
from fact_suject_data
group by student_id) t
where math is not null;

Need: I want…