Inserting 100 million rows into MySQL in batches: which method is fastest?

Using Java to insert 100 million rows into MySQL – an efficiency evaluation

While researching query optimization in MySQL over the past few days, I found that my test data set was too small (on the order of 100,000 rows). Comparing different SQL statements with EXPLAIN produced no decisive numbers; most of the results were ambiguous, and I did not dare draw conclusions from them.

So I generated a large volume of person records, each with a random name, age, sex, phone number, email, and address, and inserted them into a MySQL database, making it easy to test SQL optimizations against a large data set. During the generation process I found that different insertion methods vary enormously in efficiency.

1. First, a look at the MySQL database and the randomly generated person records. The fields are ID, name, sex, age, email, phone number, and address.

The screenshot below shows the table with a total of 33 million rows:

When the table is at the 100-million-row level, do not click the button below it in Navicat: it will keep trying to load all 100 million rows and can crash the machine. ~If you think your machine's configuration is good enough, feel free to try it, and you may be pleasantly surprised
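If you just want to eyeball the data without loading the whole table, a bounded query is safer. Below is a minimal sketch, assuming the person table and the url/user/password connection settings that appear in the code later in this article:

//Hedged sketch: peek at a few rows instead of loading the whole table.
//Uses the url, user, and password fields defined in the test code below.
try (Connection c = DriverManager.getConnection(url, user, password);
     PreparedStatement ps = c.prepareStatement(
             "SELECT id, name FROM person ORDER BY id LIMIT 10");
     ResultSet rs = ps.executeQuery()) {
    while (rs.next()) {
        System.out.println(rs.getLong("id") + " " + rs.getString("name"));
    }
}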

2. This evaluation uses three strategies, covering five scenarios in total, to run large-scale insert tests.

The strategies are:

  • MyBatis lightweight framework insert (no transaction)
  • Direct JDBC processing (with transaction, without transaction)
  • JDBC batch processing (with transaction, without transaction)

Test results (slowest to fastest):

MyBatis insert -> direct JDBC processing -> JDBC batch processing.

JDBC batch processing is the most efficient.

Testing the first strategy:

2.1 MyBatis lightweight framework insert (no transaction)

MyBatis is a lightweight framework; it is lighter and faster than Hibernate.

However, when processing large numbers of inserts it still performs an ORM conversion for every row. In this test an entity instance is reused and no transaction is opened, which leaves MyBatis with only mediocre efficiency.
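For context, here is a minimal sketch of what the PersonMapper interface might look like. The article never shows it, so the annotation-based mapping below is an assumption based on how pMapper.insert(person) is called in the test code:

//Hypothetical mapper interface; the real one is not shown in the article.
//Assumes a person table with columns matching the Person properties.
public interface PersonMapper {
    @org.apache.ibatis.annotations.Insert(
        "INSERT INTO person (id, name, sex, age, email, tel, address) " +
        "VALUES (#{id}, #{name}, #{sex}, #{age}, #{email}, #{tel}, #{address})")
    int insert(Person person);
}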

The experiment consists of:

  • Using the Spring framework to obtain the mapper instance and creating a Person instance object
  • Looping, changing the properties of that instance, and inserting it each time.
//No transaction is opened in this code

private long begin = 33112001;//starting id
private long end = begin + 100000;//number of rows inserted per loop
private String url = "jdbc:mysql://localhost:3306/bigdata?useServerPrepStmts=false&rewriteBatchedStatements=true&useUnicode=true&characterEncoding=UTF-8";
private String user = "root";
private String password = "0203";


@org.junit.Test
public void insertBigData2()
{
    //Load Spring and get the PersonMapper instance. The time spent creating it here has no great impact on the final result
    ApplicationContext context = new ClassPathXmlApplicationContext("applicationContext.xml");
    PersonMapper pMapper = (PersonMapper) context.getBean("personMapper");
    //Create a Person instance
    Person person = new Person();
    //record the start time
    long bTime = System.currentTimeMillis();
    //Start the loop; the intended number of iterations is 500W
    for (int i = 0; i < 5000000; i++)
    {
        //Assign values to person
        person.setId(begin);
        person.setName(RandomValue.getChineseName());
        person.setSex(RandomValue.name_sex);
        person.setAge(RandomValue.getNum(1, 100));
        person.setEmail(RandomValue.getEmail(4, 15));
        person.setTel(RandomValue.getTel());
        person.setAddress(RandomValue.getRoad());
        //execute the insert statement
        pMapper.insert(person);
        begin++;
    }
    //record the end time
    long eTime = System.currentTimeMillis();
    System.out.println("Time to insert 500W rows: " + (eTime - bTime));
}
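The RandomValue helper class is never shown in the article. Below is a minimal sketch of what it might look like, inferred from the calls above; every signature here, including the name_sex field, is an assumption:

import java.util.Random;

//Hypothetical stand-in for the article's RandomValue utility; all
//signatures are inferred from how they are used in the test code.
public class RandomValue {
    private static final Random rand = new Random();
    //last generated sex; set as a side effect of getChineseName()
    public static String name_sex = "";

    //random integer in [start, end)
    public static int getNum(int start, int end) {
        return start + rand.nextInt(end - start);
    }

    //random lowercase string used as a building block
    private static String randomLetters(int length) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + rand.nextInt(26)));
        }
        return sb.toString();
    }

    //random email whose local part is between lMin and lMax characters
    public static String getEmail(int lMin, int lMax) {
        return randomLetters(getNum(lMin, lMax)) + "@example.com";
    }

    //random 11-digit phone number (Chinese mobile format)
    public static String getTel() {
        return "1" + (3 + rand.nextInt(6)) + String.format("%09d", rand.nextInt(1000000000));
    }

    //random name; also records a random sex in name_sex
    public static String getChineseName() {
        name_sex = rand.nextBoolean() ? "male" : "female";
        return "Name" + getNum(1, 100000);
    }

    //random address / road name
    public static String getRoad() {
        return "Road " + getNum(1, 1000);
    }
}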

I wanted to test inserting 5 million rows, but the actual run was too slow and the program had to be terminated partway through. In the end 52W rows were inserted, which took roughly as long as a couple of songs (7~9 minutes). So instead, I used MyBatis to insert just 10,000 rows into MySQL.

The result is as follows:

Inserting 10,000 rows with MyBatis took 28,613 ms, i.e. about 28.6 seconds.

Testing the second strategy:

2.2 Direct JDBC processing (with and without a transaction)

With the direct-JDBC strategy, the experiment covers two cases: transaction opened and transaction not opened. The process is:

  • Precompile the SQL with PreparedStatement
  • Loop, bind the values for each row, and execute the insert

How much impact does the transaction have on insert speed? See the experimental results below.

//This version opens a transaction

private long begin = 33112001;//starting id
private long end = begin + 100000;//number of rows inserted per loop
private String url = "jdbc:mysql://localhost:3306/bigdata?useServerPrepStmts=false&rewriteBatchedStatements=true&useUnicode=true&characterEncoding=UTF-8";
private String user = "root";
private String password = "0203";


@org.junit.Test
public void insertBigData3() {
    //Declare the connection and statement objects
    Connection conn = null;
    PreparedStatement pstm = null;
    try {
        //load the jdbc driver
        Class.forName("com.mysql.jdbc.Driver");
        //connect to mysql
        conn = DriverManager.getConnection(url, user, password);
        //turn autocommit off (this opens the transaction)
        conn.setAutoCommit(false);
        //the sql
        String sql = "INSERT INTO person VALUES (?,?,?,?,?,?,?)";
        //precompile the sql
        pstm = conn.prepareStatement(sql);
        //start the overall timer
        long bTime1 = System.currentTimeMillis();

        //Loop 10 times, 10,000 rows each time, 100,000 in total
        for (int i = 0; i < 10; i++) {
            //start the split timer for this batch of 1W rows
            long bTime = System.currentTimeMillis();
            //inner loop
            while (begin < end) {
                //bind values
                pstm.setLong(1, begin);
                pstm.setString(2, RandomValue.getChineseName());
                pstm.setString(3, RandomValue.name_sex);
                pstm.setInt(4, RandomValue.getNum(1, 100));
                pstm.setString(5, RandomValue.getEmail(4, 15));
                pstm.setString(6, RandomValue.getTel());
                pstm.setString(7, RandomValue.getRoad());
                //execute the sql
                pstm.execute();
                begin++;
            }
            //commit the transaction
            conn.commit();
            //raise the boundary by 1W
            end += 10000;
            //stop the split timer
            long eTime = System.currentTimeMillis();
            //output
            System.out.println("Time to insert 1W rows: " + (eTime - bTime));
        }
        //stop the overall timer
        long eTime1 = System.currentTimeMillis();
        //output
        System.out.println("Total time to insert 10W rows: " + (eTime1 - bTime1));
    } catch (SQLException e) {
        e.printStackTrace();
    } catch (ClassNotFoundException e1) {
        e1.printStackTrace();
    }
}

We first use the code above, with the setAutoCommit(false) and conn.commit() lines removed, to test how long it takes to insert 10W rows without a transaction.

The results:

Time to insert 1W rows: 21603
Time to insert 1W rows: 20537
Time to insert 1W rows: 20470
Time to insert 1W rows: 21160
Time to insert 1W rows: 23270
Time to insert 1W rows: 21230
Time to insert 1W rows: 20372
Time to insert 1W rows: 22608
Time to insert 1W rows: 20361
Time to insert 1W rows: 20494
Total time to insert 10W rows: 212106

The conclusion:

Without a transaction, inserting 10,000 rows takes about 21.2 seconds on average.

Next we test how long it takes to insert 100,000 rows with the transaction opened. The results:

Time to insert 1W rows: 4938
Time to insert 1W rows: 3518
Time to insert 1W rows: 3713
Time to insert 1W rows: 3883
Time to insert 1W rows: 3872
Time to insert 1W rows: 3873
Time to insert 1W rows: 3863
Time to insert 1W rows: 3819
Time to insert 1W rows: 3933
Time to insert 1W rows: 3811
Total time to insert 10W rows: 39255

The conclusion:

With the transaction opened, inserting 10,000 rows takes about 3.9 seconds on average.

Testing the third strategy:

2.3 JDBC batch processing (with and without a transaction)

When using JDBC batch processing, note the following points:

1. Batch rewriting must be enabled in the connection URL. rewriteBatchedStatements=true lets the MySQL driver rewrite a batch of single-row INSERTs into multi-row INSERT statements; without it, executeBatch() still sends the statements one by one:

String url = "jdbc:mysql://localhost:3306/User?rewriteBatchedStatements=true&useServerPrepStmts=false";

2. The prepareStatement call must be placed outside the loop, so the SQL is compiled once and only the parameter values change on each iteration.

The code is as follows:

private long begin = 33112001;//starting id
private long end = begin + 100000;//number of rows inserted per loop
private String url = "jdbc:mysql://localhost:3306/bigdata?useServerPrepStmts=false&rewriteBatchedStatements=true&useUnicode=true&characterEncoding=UTF-8";
private String user = "root";
private String password = "0203";


@org.junit.Test
public void insertBigData() {
    //Declare the connection and statement objects
    Connection conn = null;
    PreparedStatement pstm = null;
    try {
        //load the jdbc driver
        Class.forName("com.mysql.jdbc.Driver");
        //connect to mysql
        conn = DriverManager.getConnection(url, user, password);
        //turn autocommit off (uncomment to open the transaction)
        //conn.setAutoCommit(false);
        //the sql
        String sql = "INSERT INTO person VALUES (?,?,?,?,?,?,?)";
        //precompile the sql (outside the loop)
        pstm = conn.prepareStatement(sql);
        //start the overall timer
        long bTime1 = System.currentTimeMillis();

        //Loop 10 times, 100,000 rows each time, 1 million (100W) in total
        for (int i = 0; i < 10; i++) {
            //start the split timer for this batch of 10W rows
            long bTime = System.currentTimeMillis();
            //inner loop
            while (begin < end) {
                //bind values
                pstm.setLong(1, begin);
                pstm.setString(2, RandomValue.getChineseName());
                pstm.setString(3, RandomValue.name_sex);
                pstm.setInt(4, RandomValue.getNum(1, 100));
                pstm.setString(5, RandomValue.getEmail(4, 15));
                pstm.setString(6, RandomValue.getTel());
                pstm.setString(7, RandomValue.getRoad());
                //add to the current batch
                pstm.addBatch();
                begin++;
            }
            //execute the batch
            pstm.executeBatch();
            //commit the transaction (uncomment together with setAutoCommit(false))
            //conn.commit();
            //raise the boundary by 10W
            end += 100000;
            //stop the split timer
            long eTime = System.currentTimeMillis();
            //output
            System.out.println("Time to insert 10W rows: " + (eTime - bTime));
        }
        //stop the overall timer
        long eTime1 = System.currentTimeMillis();
        //output
        System.out.println("Total time to insert 100W rows: " + (eTime1 - bTime1));
    } catch (SQLException e) {
        e.printStackTrace();
    } catch (ClassNotFoundException e1) {
        e1.printStackTrace();
    }
}

First test: no transaction, 10W rows per loop iteration, 10 iterations, 100W rows in total.

The results:

Time to insert 10W rows: 3832
Time to insert 10W rows: 1770
Time to insert 10W rows: 2628
Time to insert 10W rows: 2140
Time to insert 10W rows: 2148
Time to insert 10W rows: 1757
Time to insert 10W rows: 1767
Time to insert 10W rows: 1832
Time to insert 10W rows: 1830
Time to insert 10W rows: 2031
Total time to insert 100W rows: 21737

Result:

With JDBC batch processing and no transaction, inserting 100,000 rows takes about 2.1 seconds on average.

Continuing the test, now with the transaction opened: 10W rows per loop iteration, 10 iterations, 100W rows in total.

The results:

Time to insert 10W rows: 3482
Time to insert 10W rows: 1776
Time to insert 10W rows: 1979
Time to insert 10W rows: 1730
Time to insert 10W rows: 1643
Time to insert 10W rows: 1665
Time to insert 10W rows: 1622
Time to insert 10W rows: 1624
Time to insert 10W rows: 1779
Time to insert 10W rows: 1698
Total time to insert 100W rows: 19003

Result:

With JDBC batch processing and the transaction opened, inserting 100,000 rows takes about 1.9 seconds on average.

3 Summary

Clearly, both direct JDBC processing and JDBC batch processing take less time when the transaction is opened.

  • MyBatis framework insert: MyBatis got beaten badly in my experiment, haha. With a transaction actually opened, the gap would not be as large as the roughly 10x seen here. Anyone interested can test this further; a sketch of one way to do it follows this list.
  • Direct JDBC processing: in this experiment the gap between opening and not opening a transaction is about 5x, and the multiple grows with the data volume. Without a transaction, inserting 10,000 rows means 10,000 separate commits, so every single row costs its own database operation.
  • JDBC batch processing: here the gap between opening and not opening a transaction is very small (the supplement below enlarges the test to probe this), but opening the transaction still gives a visible speedup.
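For the curious, here is a minimal sketch of how the MyBatis test could be rerun with batching and a manual transaction, using MyBatis's ExecutorType.BATCH session mode. The sqlSessionFactory wiring is an assumption; the article's own Spring configuration is not shown:

//Hypothetical rework of the MyBatis test with batching and a manual
//transaction; sqlSessionFactory setup is assumed, not from the article.
SqlSession session = sqlSessionFactory.openSession(ExecutorType.BATCH, false);
try {
    PersonMapper pMapper = session.getMapper(PersonMapper.class);
    Person person = new Person();
    for (int i = 0; i < 10000; i++) {
        person.setId(begin + i);
        person.setName(RandomValue.getChineseName());
        person.setSex(RandomValue.name_sex);
        person.setAge(RandomValue.getNum(1, 100));
        person.setEmail(RandomValue.getEmail(4, 15));
        person.setTel(RandomValue.getTel());
        person.setAddress(RandomValue.getRoad());
        pMapper.insert(person);
    }
    session.flushStatements(); //send the buffered batch to MySQL
    session.commit();          //commit the transaction
} finally {
    session.close();
}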

Conclusion: when inserting large volumes of individual rows, combining JDBC batch processing with a transaction is the fastest approach.

Measured time to insert 100 million rows using the batch-processing + transaction mix: 174756 ms.
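As a practical note, holding 100 million rows in a single batch or a single transaction is not realistic, so a run like that needs to be chunked. Below is a hedged sketch of the pattern, reusing the conn and pstm setup from the batch-processing code above; TOTAL_ROWS and the chunk size of 100,000 are assumptions, not figures from the article:

//Hedged sketch: batch + transaction in fixed-size chunks, so neither the
//client-side batch nor the server-side transaction grows unboundedly.
//TOTAL_ROWS and CHUNK are assumed values, not taken from the article.
final long TOTAL_ROWS = 100_000_000L;
final int CHUNK = 100_000;
conn.setAutoCommit(false);
long id = 33112001L;
int inBatch = 0;
for (long n = 0; n < TOTAL_ROWS; n++) {
    pstm.setLong(1, id++);
    pstm.setString(2, RandomValue.getChineseName());
    pstm.setString(3, RandomValue.name_sex);
    pstm.setInt(4, RandomValue.getNum(1, 100));
    pstm.setString(5, RandomValue.getEmail(4, 15));
    pstm.setString(6, RandomValue.getTel());
    pstm.setString(7, RandomValue.getRoad());
    pstm.addBatch();
    if (++inBatch == CHUNK) {
        pstm.executeBatch(); //flush this chunk to MySQL
        conn.commit();       //commit it so the transaction stays bounded
        inBatch = 0;
    }
}
if (inBatch > 0) { //flush any remainder
    pstm.executeBatch();
    conn.commit();
}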

4 Supplement

JDBC batch processing with the transaction opened vs. closed, evaluated over 20 iterations of 50W rows each, 10 million (1000W) rows in total:

1. Transaction opened (the full log is too long to post):

Total time to insert 1000W rows: 197654

2. Transaction closed (the full log is too long to post):

Total time to insert 1000W rows: 200540

Still not much difference~

Borrowing a summary from another write-up, the four combinations are:

  • No batch processing and no transaction;
  • Batch processing only, without a transaction;
  • A transaction only, without batch processing;
  • Both a transaction and batch processing. (Obviously this is the fastest, so using batch processing together with a transaction is recommended when handling large volumes of data.)

Source: https://zhuanlan.zhihu.com/p/517588313