The practice of implementing auto-increment primary keys in MySQL-to-OB Oracle migration scenarios

This article verifies and summarizes two methods of implementing auto-incrementing primary keys in OB Oracle.

Author: Zhao Liming, a member of the MySQL DBA team at Aikesheng. He is familiar with Oracle, MySQL, and other databases, and specializes in diagnosing database performance problems and analyzing transaction and lock issues. He handles day-to-day operation and maintenance issues for customers' MySQL instances and the company's self-developed DMP platform, and is very interested in open-source database technologies.

Produced by the Aikesheng open source community. Original content may not be used without authorization. To reprint, please contact the editor and cite the source.

This article is about 3,200 words long and should take about 10 minutes to read.

Background

In scenarios where MySQL is migrated to OB Oracle, you usually need to consider how to implement the auto-increment primary key on the OB Oracle side. Articles in the community have already offered some solutions.

This article verifies and summarizes a relatively feasible implementation plan from the perspective of solving practical problems.

Plan 1

We will implement auto-incrementing primary keys by creating a custom sequence.

Create a test table on the MySQL side

[email protected] [zlm]> desc t;
+-------+-------------+------+-----+---------+----------------+
| Field | Type        | Null | Key | Default | Extra          |
+-------+-------------+------+-----+---------+----------------+
| id    | bigint(20)  | NO   | PRI | NULL    | auto_increment |
| name  | varchar(10) | YES  |     | NULL    |                |
+-------+-------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)

[email protected] [zlm]> select * from t;
+----+------+
| id | name |
+----+------+
|  1 | a    |
|  2 | b    |
|  3 | c    |
+----+------+
3 rows in set (0.00 sec)

Use DBCAT to export table structure

DBCAT is a command-line tool provided by OceanBase. In heterogeneous database migration scenarios, it is mainly used to export and convert DDL, including non-table objects such as sequences, functions, stored procedures, packages, triggers, and views in Oracle.

cd /opt/oceanbase_package/tools/dbcat-1.9.1-RELEASE/bin
./dbcat convert -H 10.186.60.68 -P 3332 --user=zlm --password=zlm --database=zlm --no-schema --no-quote --from mysql57 --to oboracle32x --table t --file=/tmp
Parsed args:
[--no-quote] true
[--no-schema] true
[--table] [t]
[--host] 10.186.60.68
[--port] 3332
[--user] zlm
[--password]******
[--database] zlm
[--file] /tmp
[--from] mysql57
[--to] oboracle32x
2023-08-16 14:41:58 INFO Init convert config finished.
2023-08-16 14:41:58 INFO {dataSource-1} inited
2023-08-16 14:41:58 INFO Init source druid connection pool finished.
2023-08-16 14:41:58 INFO Register c.o.o.d.m.c.m.MySql56ObOracle22xColumnConverter
2023-08-16 14:41:58 INFO Register c.o.o.d.m.c.m.MySql56ObOracle22xIndexConverter
2023-08-16 14:41:58 INFO Register c.o.o.d.m.c.m.MySql56ObOracle22xPrimaryKeyConverter
2023-08-16 14:41:58 INFO Register c.o.o.d.m.c.m.MySql56ObOracle22xUniqueKeyConverter
2023-08-16 14:41:58 INFO Register c.o.o.d.m.c.m.MySql56ObOracle22xPartitionConverter
2023-08-16 14:41:59 INFO Load meta/mysql/mysql56.xml, meta/mysql/mysql57.xml successful
2023-08-16 14:42:09 INFO Query 0 dependencies elapsed 17.35 ms
2023-08-16 14:42:09 INFO Query table: "t" attr finished. Remain: 0
2023-08-16 14:42:09 INFO Query 1 tables elapsed 69.71 ms
2023-08-16 14:42:09 WARN Include types is empty. Ignore schema: ZLM
2023-08-16 14:42:09 WARN Skip to compare/convert sequences as SEQUENCE is unsupported
2023-08-16 14:42:09 INFO Starting to convert schema to path: "/tmp/dbcat-2023-08-16-144209/ZLM"
2023-08-16 14:42:09 INFO Successed to generate report in the path: "/tmp/dbcat-2023-08-16-144209/ZLM-conversion.html"
2023-08-16 14:42:09 INFO {dataSource-1} closing ...
2023-08-16 14:42:09 INFO {dataSource-1} closed
cd /tmp/dbcat-2023-08-16-144209/ZLM
cat TABLE-schema.sql
CREATE TABLE t (
    id NUMBER(19,0),
    name VARCHAR2(30 BYTE),
    CONSTRAINT PRIMARY PRIMARY KEY (id)
);

-- CREATE SEQUENCE xxx START WITH 1 INCREMENT BY 1 ... for t

DBCAT converts the table structure of the target table into Oracle-compatible syntax and writes a commented pseudo-SQL line for creating a sequence into the exported DDL. Evidently, this tool also recommends creating a sequence to handle MySQL auto-increment columns.

Create the sequence on the OB target side

  • For each table that contains an auto-increment column, a corresponding sequence needs to be created.
  • When creating a sequence, it is recommended to name it in the format SEQ_<table_name>_<column_name>.
  • When the CYCLE and ORDER attributes of the sequence are not specified, they default to N, i.e., no cycling and no ordering.
  • When the CACHE attribute is not specified, 20 sequence values are cached by default.
  • The MIN_VALUE field corresponds to the MINVALUE attribute specified when the sequence was created.
  • The LAST_NUMBER field corresponds to the START WITH attribute specified when the sequence was created.
ZLM[ZLM]> CREATE SEQUENCE SEQ_T_ID MINVALUE 1 MAXVALUE 999999 INCREMENT BY 1 START WITH 1;
Query OK, 0 rows affected (0.03 sec)

ZLM[ZLM]> SELECT SEQUENCE_NAME,MIN_VALUE,LAST_NUMBER,CYCLE_FLAG,ORDER_FLAG,CACHE_SIZE FROM DBA_SEQUENCES WHERE SEQUENCE_OWNER='ZLM';
+---------------+-----------+-------------+------------+------------+------------+
| SEQUENCE_NAME | MIN_VALUE | LAST_NUMBER | CYCLE_FLAG | ORDER_FLAG | CACHE_SIZE |
+---------------+-----------+-------------+------------+------------+------------+
| SEQ_T_ID      |         1 |           1 | N          | N          |         20 |
+---------------+-----------+-------------+------------+------------+------------+

1 row in set (0.01 sec)

ZLM[ZLM]> drop sequence SEQ_T_ID;
Query OK, 0 rows affected (0.03 sec)

ZLM[ZLM]> CREATE SEQUENCE SEQ_T_ID MINVALUE 1 MAXVALUE 999999 INCREMENT BY 1 START WITH 10;
Query OK, 0 rows affected (0.03 sec)

ZLM[ZLM]> SELECT SEQUENCE_NAME,MIN_VALUE,LAST_NUMBER,CYCLE_FLAG,ORDER_FLAG,CACHE_SIZE FROM DBA_SEQUENCES WHERE SEQUENCE_OWNER='ZLM';
+---------------+-----------+-------------+------------+------------+------------+
| SEQUENCE_NAME | MIN_VALUE | LAST_NUMBER | CYCLE_FLAG | ORDER_FLAG | CACHE_SIZE |
+---------------+-----------+-------------+------------+------------+------------+
| SEQ_T_ID      |         1 |          10 | N          | N          |         20 |
+---------------+-----------+-------------+------------+------------+------------+

1 row in set (0.03 sec)

Create the table on the OB target side

Create the table based on the DDL obtained in step 1 and the sequence created in step 2.

ZLM[ZLM]> CREATE TABLE "ZLM"."T" (
  -> "ID" NUMBER(19,0) DEFAULT SEQ_T_ID.NEXTVAL,
  -> "NAME" VARCHAR2(30 BYTE),
  -> CONSTRAINT "PRIMARY" PRIMARY KEY ("ID"));
Query OK, 0 rows affected (0.15 sec)

Usually the table structure and data are migrated with OMS; it is rare to manually create tables on the target side with the DDL generated by DBCAT, except in special scenarios such as this one, where a DEFAULT attribute referencing a sequence must be added to a column.

When creating the table, please note:

1. Convert table and column names to uppercase: Oracle folds unquoted identifiers to uppercase, and quoted identifiers are case-sensitive, so quoting lowercase names would create objects that ordinary unquoted SQL cannot find.

2. Set the DEFAULT value of the ID column to the next value of the sequence created in step 2, i.e., SEQ_T_ID.NEXTVAL.

Use DataX to migrate data

DataX is Alibaba's open-source offline data synchronization tool that supports a variety of heterogeneous data sources. Data migration between OB and heterogeneous databases can be achieved through OB's Reader and Writer plugins.

-- Create the DataX configuration file (stored in DataX's ./job directory)
cat t.json
{
  "job": {
    "setting": {
      "speed": {
        "channel": 4
      },
      "errorLimit": {
        "record": 0,
        "percentage": 0.1
      }
    },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "zlm",
            "password": "zlm",
            "column": [
              "*"
            ],
            "connection": [
              {
                "table": [
                  "t"
                ],
                "jdbcUrl": ["jdbc:mysql://10.186.60.68:3332/zlm?useUnicode=true&characterEncoding=utf8"]
              }
            ]
          }
        },
        "writer": {
          "name": "oceanbasev10writer",
          "parameter": {
            "obWriteMode": "insert",
            "column": [
              "*"
            ],
            "preSql": [
              "truncate table T"
            ],
            "connection": [
              {
                "jdbcUrl": "||_dsc_ob10_dsc_||jingbo_ob:ob_oracle||_dsc_ob10_dsc_||jdbc:oceanbase://10.186.65.22:2883/ZLM?useLocalSessionState=true&allowBatch=true&allowMultiQueries=true&rewriteBatchedStatements=true",
                "table": [
                  "T"
                ]
              }
            ],
            "username": "ZLM",
            "password": "zlm",
            "writerThreadCount": 10,
            "batchSize": 1000,
            "memstoreThreshold": "0.9"
          }
        }
      }
    ]
  }
}
    
-- Perform the data migration
./bin/datax.py job/t.json
DataX (20220610-external), From Alibaba!
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
full db is not specified.
schema sync is not specified.
java -server -Xms4g -Xmx16g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/admin/datax3/log -DENGINE_VERSION=20220610-external -Xms4g -Xmx16g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/admin/datax3/log -Dloglevel=info -Dproject.name=di-service -Dfile.encoding=UTF-8 -Dlogback.statusListenerClass=ch.qos.logback.core.status.NopStatusListener -Djava.security.egd=file:///dev/urandom -Ddatax.home=/home/admin/datax3 -Dlogback.configurationFile=/home/admin/datax3/conf/logback.xml -classpath /home/admin/datax3/lib/*:. -Dlog.file.name=in_datax3_job_t_json com.alibaba.datax.core.Engine -mode standalone -jobid -1 -job /home/admin/datax3/job/t.json -fulldb false -schema false
2023-08-16 14:58:41.088 [main] INFO Engine - running job from /home/admin/datax3/job/t.json
2023-08-16 14:58:41.374 [main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2023-08-16 14:58:41.382 [main] INFO Engine - the machine info =>
... (output omitted)
 2. record average count and max count task info:
PHASE | AVERAGE RECORDS | AVERAGE BYTES | MAX RECORDS | MAX RECORD`S BYTES | MAX TASK ID | MAX TASK INFO
READ_TASK_DATA | 3 | 6B | 3 | 6B | 0-0-0 | t,jdbcUrl:[jdbc:mysql://10.186.60.68:3332/zlm]
2023-08-16 14:58:45.189 [job-0] INFO MetricReportUtil - reportJobMetric is turn off
2023-08-16 14:58:45.189 [job-0] INFO StandAloneJobContainerCommunicator - Total 3 records, 6 bytes | Speed 3B/s, 1 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2023-08-16 14:58:45.190 [job-0] INFO LogReportUtil - report datax log is turn off
2023-08-16 14:58:45.190 [job-0] INFO JobContainer -
Task start time: 2023-08-16 14:58:41
Task end time: 2023-08-16 14:58:45
Total task duration: 3s
Average task traffic: 3B/s
Record write speed: 1rec/s
Total records read: 3
Total read/write failures: 0
2023-08-16 14:58:45.190 [job-0] INFO PerfTrace - reset PerfTrace.

Verify the effect

Verify that the primary key column auto-increments.

SYS[ZLM]> select * from t;
+----+------+
| ID | NAME |
+----+------+
|  1 | a    |
|  2 | b    |
|  3 | c    |
+----+------+
3 rows in set (0.01 sec)

SYS[ZLM]> insert into t(name) values('d');
Query OK, 1 row affected (0.02 sec)

SYS[ZLM]> select * from t;
+----+------+
| ID | NAME |
+----+------+
|  1 | a    |
|  2 | b    |
|  3 | c    |
|  4 | d    |
+----+------+
4 rows in set (0.00 sec)

Each newly inserted row first obtains the sequence's NEXTVAL for the ID column, which fulfills the auto-increment primary key requirement.

With the custom sequence's NEXTVAL as the DEFAULT value of the primary key column, you do not need to care about the maximum auto-increment value recorded on the source table: after the table is migrated, directly inserting new data does not conflict with the existing rows.
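
The same effect can be achieved by referencing the sequence explicitly in the INSERT statement instead of relying on the column DEFAULT. A minimal sketch (the value 'e' is only an example, not from the test above):

-- Explicitly drawing the next sequence value; equivalent to omitting the ID
-- and letting DEFAULT SEQ_T_ID.NEXTVAL fill it in
INSERT INTO t(id, name) VALUES (SEQ_T_ID.NEXTVAL, 'e');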

Plan 2

Is a sequence generated with the GENERATED BY DEFAULT AS IDENTITY attribute just as easy to use? Let's look at a test first.

-- Drop and rebuild the test table
ZLM[ZLM]> DROP TABLE T;
Query OK, 0 rows affected (0.10 sec)
ZLM[ZLM]> CREATE TABLE "ZLM"."T" (
  -> "ID" NUMBER(19,0) GENERATED BY DEFAULT AS IDENTITY MINVALUE 1 MAXVALUE 999999,
  -> "NAME" VARCHAR2(30 BYTE),
  -> CONSTRAINT "PRIMARY" PRIMARY KEY ("ID"));
Query OK, 0 rows affected (0.15 sec)

-- View sequences
ZLM[ZLM]> SELECT SEQUENCE_NAME,MIN_VALUE,LAST_NUMBER,CYCLE_FLAG,ORDER_FLAG,CACHE_SIZE FROM DBA_SEQUENCES WHERE SEQUENCE_OWNER='ZLM';
+-----------------+-----------+-------------+------------+------------+------------+
| SEQUENCE_NAME   | MIN_VALUE | LAST_NUMBER | CYCLE_FLAG | ORDER_FLAG | CACHE_SIZE |
+-----------------+-----------+-------------+------------+------------+------------+
| SEQ_T_ID        |         1 |          21 | N          | N          |         20 |
| ISEQ$$_50034_16 |         1 |           1 | N          | N          |         20 |
+-----------------+-----------+-------------+------------+------------+------------+
2 rows in set (0.00 sec)

# At this point, the system automatically created a sequence named ISEQ$$_50034_16; its other defaults are consistent with a custom-created sequence.

-- View the table structure
ZLM[ZLM]> desc t;
+-------+--------------+------+------+------------------+-------+
| FIELD | TYPE         | NULL | KEY  | DEFAULT          | EXTRA |
+-------+--------------+------+------+------------------+-------+
| ID    | NUMBER(19)   | NO   | PRI  | SEQUENCE.NEXTVAL | NULL  |
| NAME  | VARCHAR2(30) | YES  | NULL | NULL             | NULL  |
+-------+--------------+------+------+------------------+-------+
2 rows in set (0.02 sec)

# Note that the DEFAULT value of the ID column shows as SEQUENCE.NEXTVAL, not ISEQ$$_50034_16.NEXTVAL

-- Re-import the data
./bin/datax.py job/t.json
... (output omitted)

-- Insert data
ZLM[ZLM]> insert into t(name) values('d');
ORA-00001: unique constraint '1' for key 'PRIMARY' violated

ZLM[ZLM]> insert into t(name) values('d');
ORA-00001: unique constraint '2' for key 'PRIMARY' violated

ZLM[ZLM]> insert into t(name) values('d');
ORA-00001: unique constraint '3' for key 'PRIMARY' violated

ZLM[ZLM]> insert into t(name) values('d');
Query OK, 1 row affected (0.01 sec)

ZLM[ZLM]> select "ISEQ$$_50034_16".CURRVAL from dual;
+---------+
| CURRVAL |
+---------+
|       4 |
+---------+

1 row in set (0.00 sec)

The table already contains 3 rows, but since START WITH was not specified, the identity sequence actually starts from the default value 1.

Each insert draws the sequence's NEXTVAL for the ID, so the insert only succeeds on the fourth attempt, once the generated value no longer conflicts with existing rows in the table.

For this scenario there are two workarounds, though both are relatively cumbersome:

1. Before inserting records, advance the sequence by fetching its NEXTVAL repeatedly (the number of executions equals the number of records in the source table), as the script example later in this article does.
2. When creating the sequence, specify the START WITH attribute based on the maximum value of the auto-increment column on the source table (see the sketch after this list).
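
A sketch of workaround 2, assuming the source table's maximum ID is 3 as in this test (the statement follows the identity syntax used earlier in this article; verify it against your OB Oracle version):

-- Hypothetical rebuild: start the system-generated sequence past the migrated
-- data (source MAX(id) = 3, so START WITH 4)
CREATE TABLE "ZLM"."T" (
  "ID" NUMBER(19,0) GENERATED BY DEFAULT AS IDENTITY START WITH 4 MINVALUE 1 MAXVALUE 999999,
  "NAME" VARCHAR2(30 BYTE),
  CONSTRAINT "PRIMARY" PRIMARY KEY ("ID"));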

Notice:

A sequence generated by the GENERATED BY DEFAULT AS IDENTITY attribute cannot be dropped directly; attempting to do so reports an error:

ORA-32794: cannot drop a system-generated sequence
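
For example (an illustrative statement using the sequence name generated above):

-- Illustrative: system-generated identity sequences cannot be dropped directly
DROP SEQUENCE "ISEQ$$_50034_16";
-- ORA-32794: cannot drop a system-generated sequence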

Get the maximum value of the auto-increment column in the table

Method 1: MAX function

[email protected] [zlm]> SELECT MAX(id) + 1 as AUTO_INCREMENT FROM t;
+----------------+
| AUTO_INCREMENT |
+----------------+
|              4 |
+----------------+

1 row in set (0.00 sec)

Method 2: System view

[email protected] [zlm]> select AUTO_INCREMENT from information_schema.tables where table_name='t';
+----------------+
| AUTO_INCREMENT |
+----------------+
|              4 |
+----------------+

1 row in set (0.00 sec)

Method 3: show create table command

[email protected] [zlm]> show create table t\G
*************************** 1. row ***************************
       Table: t
Create Table: CREATE TABLE `t` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `name` varchar(10) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
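
Whichever method is used, the value obtained (here 4) is what START WITH should be set to so that newly generated values continue past the migrated rows. A minimal sketch, reusing this article's sequence name:

-- Continue numbering after the migrated rows (MAX(id) + 1 = 4)
CREATE SEQUENCE SEQ_T_ID MINVALUE 1 MAXVALUE 999999 INCREMENT BY 1 START WITH 4;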

An example of initializing the sequence with a script

-- Drop and rebuild the table
ZLM[ZLM]> drop table t;
Query OK, 0 rows affected (0.02 sec)

ZLM[ZLM]> CREATE TABLE "ZLM"."T" (
    -> "ID" NUMBER(19,0) GENERATED BY DEFAULT AS IDENTITY MINVALUE 1 MAXVALUE 999999,
    -> "NAME" VARCHAR2(30 BYTE),
    -> CONSTRAINT "PRIMARY" PRIMARY KEY ("ID"));
Query OK, 0 rows affected (0.04 sec)

-- Import the data
./bin/datax.py job/t.json
... (output omitted)

-- Execute the script and confirm that it returns normally
[root@10-186-65-73 ~]# cat init_sequence.sh
#!/bin/bash

## Get the maximum value of the auto-increment column in the current table
i=$(obclient -h10.186.60.68 -P3332 -uzlm -pzlm -Nse "SELECT MAX(id) + 1 FROM zlm.t;" 2>/dev/null | head -1)

## Loop, executing SQL to advance (initialize) the sequence value
for ((j=1; j<=$i; j++))
do
  obclient -h10.186.65.43 -P2883 -uZLM@ob_oracle#bobo_ob:1675327512 -pzlm -A -c -DZLM -Nse "select ISEQ\$\$_50037_16.nextval from dual;" 1>/dev/null 2>&1
done
[root@10-186-65-73 ~]# sh init_sequence.sh
[root@10-186-65-73 ~]# echo $?
0

-- Execute an insert
ZLM[ZLM]> insert into t(name) values('d');
Query OK, 1 row affected (0.01 sec)
    

Once the sequence has been initialized this way, inserting new records right after the data import completes no longer triggers a uniqueness-conflict error. Alternatively, first obtain the maximum value of the auto-increment column with a script and then specify a matching START WITH when creating the sequence; this also solves the problem and will not be expanded on here.

In general, the sequence generated by the GENERATED BY DEFAULT AS IDENTITY attribute (Plan 2) is not as easy to use as a custom sequence (Plan 1).

Summary

This article has verified and illustrated two methods of implementing auto-increment primary keys in OB Oracle: creating a custom sequence, and generating a sequence through the GENERATED BY DEFAULT AS IDENTITY attribute.

Plan 1

With a custom sequence, a separate sequence must be created for every table that has an auto-increment column, and it is recommended to tie the sequence name to the table name. There is no need to pay attention to the START WITH value: when a new record is inserted, the next available sequence value is obtained automatically.
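
As a recap, Plan 1 in full for the table used in this article (combining the statements shown earlier):

-- One sequence per auto-increment table, named after the table and column
CREATE SEQUENCE SEQ_T_ID MINVALUE 1 MAXVALUE 999999 INCREMENT BY 1 START WITH 1;
CREATE TABLE "ZLM"."T" (
  "ID" NUMBER(19,0) DEFAULT SEQ_T_ID.NEXTVAL,
  "NAME" VARCHAR2(30 BYTE),
  CONSTRAINT "PRIMARY" PRIMARY KEY ("ID"));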

Plan 2

Generating sequences through the GENERATED BY DEFAULT AS IDENTITY attribute has some limitations:

  • Because the sequence is automatically created and managed by the system, you have to query the system views to obtain its name, which does not correspond to the business table name (see the query sketch after this list).
  • When creating the sequence, you need to specify START WITH based on the maximum value of the auto-increment column, which is cumbersome when a large number of tables have to be processed.
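
For the first limitation, the system-generated sequence can at least be located through the system view, as in the DBA_SEQUENCES query used earlier. A sketch (the ISEQ$$ prefix matches the naming observed in this test):

-- Find the identity sequences owned by the schema; system-generated names
-- start with ISEQ$$, and LAST_NUMBER shows how far each has advanced
SELECT SEQUENCE_NAME, LAST_NUMBER
  FROM DBA_SEQUENCES
 WHERE SEQUENCE_OWNER = 'ZLM'
   AND SEQUENCE_NAME LIKE 'ISEQ$$%';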

The names of sequences generated by the GENERATED BY DEFAULT AS IDENTITY attribute carry an internal counter that increases monotonically; even after a sequence is deleted, its name is not reused. When the table is dropped, its system-generated sequence is cleaned up automatically (dropping the sequence directly reports an error), without affecting any previously created custom sequences.

When using the GENERATED BY DEFAULT AS IDENTITY attribute, you must additionally account for the current maximum value of the auto-increment column in the source table being migrated, which undoubtedly increases the complexity of the migration.

In summary, using a custom sequence to implement an auto-increment primary key is the more recommended approach.

For more technical articles, please visit: https://opensource.actionsky.com/

About SQLE

SQLE, from the Aikesheng open source community, is a SQL audit tool for database users and managers that supports multi-scenario audits, standardized online processes, native MySQL audit support, and extensible database types.

Get SQLE

Type                                      Address
Repository                                https://github.com/actiontech/sqle
Documentation                             https://actiontech.github.io/sqle-docs/
Release information                       https://github.com/actiontech/sqle/releases
Data audit plugin development docs        https://actiontech.github.io/sqle-docs/docs/dev-manual/plugins/howtouse