An efficient approach: inserting 300,000 rows into MySQL in just 13 seconds


This article walks through inserting a large volume of data with MyBatis and JDBC, comparing the approaches and their timings.

We insert 300,000 rows into the database to verify each approach.

  • Entity class, mapper and configuration file definition

    • User entity

    • mapper interface

    • mapper.xml file

    • jdbc.properties

    • sqlMapConfig.xml

  • Going all-in: one giant insert, no batching

  • Inserting one by one in a loop

  • Inserting 300,000 rows with MyBatis in batches

  • Inserting 300,000 rows with JDBC

  • Summary

The verified database table structure is as follows:

CREATE TABLE `t_user` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'user id',
  `username` varchar(64) DEFAULT NULL COMMENT 'username',
  `age` int(4) DEFAULT NULL COMMENT 'age',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='User Information Table';

Without further ado, let’s start!

Entity class, mapper and configuration file definition

User entity

/**
 * <p>user entity</p>
 *
 * @Author zjq
 */
@Data
public class User {

    private int id;
    private String username;
    private int age;

}

mapper interface

public interface UserMapper {

    /**
     * Batch insert users
     * @param userList
     */
    void batchInsertUser(@Param("list") List<User> userList);


}

mapper.xml file

<!-- Batch insert user information -->
<insert id="batchInsertUser" parameterType="java.util.List">
    insert into t_user(username, age) values
    <foreach collection="list" item="item" index="index" separator=",">
        (
        #{item.username},
        #{item.age}
        )
    </foreach>
</insert>
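For reference, with three users the foreach above expands into a single multi-row statement along these lines (the values are illustrative):

```sql
INSERT INTO t_user (username, age) VALUES
    ('user1', 25),
    ('user2', 31),
    ('user3', 47);
```

One statement per batch is exactly what makes this faster than one statement per row.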

jdbc.properties

jdbc.driver=com.mysql.jdbc.Driver
jdbc.url=jdbc:mysql://localhost:3306/test
jdbc.username=root
jdbc.password=root

sqlMapConfig.xml

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE configuration PUBLIC "-//mybatis.org//DTD Config 3.0//EN" "http://mybatis.org/dtd/mybatis-3-config.dtd">
<configuration>

    <!--Load external properties file through properties tag-->
    <properties resource="jdbc.properties"></properties>


    <!--Custom Alias-->
    <typeAliases>
        <typeAlias type="com.zjq.domain.User" alias="user"></typeAlias>
    </typeAliases>


    <!--Data source environment-->
    <environments default="development">
        <environment id="development">
            <transactionManager type="JDBC"></transactionManager>
            <dataSource type="POOLED">
                <property name="driver" value="${jdbc.driver}"/>
                <property name="url" value="${jdbc.url}"/>
                <property name="username" value="${jdbc.username}"/>
                <property name="password" value="${jdbc.password}"/>
            </dataSource>
        </environment>
    </environments>


    <!--Load mapping file-->
    <mappers>
        <mapper resource="com/zjq/mapper/UserMapper.xml"></mapper>
    </mappers>


</configuration>

Going all-in: one giant insert, no batching

First, let's try having MyBatis insert all 300,000 records in one statement, no batching at all:

@Test
public void testBatchInsertUser() throws IOException {
    InputStream resourceAsStream =
            Resources.getResourceAsStream("sqlMapConfig.xml");
    SqlSessionFactory sqlSessionFactory = new SqlSessionFactoryBuilder().build(resourceAsStream);
    SqlSession session = sqlSessionFactory.openSession();
    System.out.println("===== start inserting data =====");
    long startTime = System.currentTimeMillis();
    try {
        List<User> userList = new ArrayList<>();
        for (int i = 1; i <= 300000; i++) {
            User user = new User();
            user.setId(i);
            user.setUsername("Share a drink without " + i);
            user.setAge((int) (Math.random() * 100));
            userList.add(user);
        }
        // one shot: all 300,000 rows in a single insert
        session.insert("batchInsertUser", userList);
        session.commit();

        long spendTime = System.currentTimeMillis() - startTime;
        System.out.println("Successfully inserted 300,000 rows, took " + spendTime + " ms");
    } finally {
        session.close();
    }
}

You can see the console output:

Cause: com.mysql.jdbc.PacketTooBigException: Packet for query is too large (27759038 > 4194304). You can change this value on the server by setting the max_allowed_packet' variable.


The insert exceeds MySQL's maximum packet size. You could raise the max_allowed_packet limit so the server accepts larger payloads, but 300,000 rows overshoot the default by a wide margin, so simply raising it is not a real fix. Going all-in clearly won't work.
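If you do want to raise the limit, it can be checked and adjusted on the server like this (64 MB below is just an example value; set it in my.cnf as well to persist across restarts):

```sql
-- Check the current limit (the default is often 4 MB)
SHOW VARIABLES LIKE 'max_allowed_packet';

-- Raise it to 64 MB for the running server
SET GLOBAL max_allowed_packet = 64 * 1024 * 1024;
```

Note that SET GLOBAL affects new connections only, so reconnect before retrying.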

Since all-in doesn't work, how about inserting the rows one at a time in a loop?

Inserting one by one in a loop

Add a single-row insert to the mapper interface and mapper XML:

/**
 * Add a single user
 * @param user
 */
void insertUser(User user);

<!-- Add user information -->
<insert id="insertUser" parameterType="user">
    insert into t_user(username, age) values
        (
        #{username},
        #{age}
        )
</insert>

Adjust the execution code as follows:

@Test
public void testCirculateInsertUser() throws IOException {
    InputStream resourceAsStream =
            Resources.getResourceAsStream("sqlMapConfig.xml");
    SqlSessionFactory sqlSessionFactory = new SqlSessionFactoryBuilder().build(resourceAsStream);
    SqlSession session = sqlSessionFactory.openSession();
    System.out.println("===== start inserting data =====");
    long startTime = System.currentTimeMillis();
    try {
        for (int i = 1; i <= 300000; i++) {
            User user = new User();
            user.setId(i);
            user.setUsername("Share a drink without " + i);
            user.setAge((int) (Math.random() * 100));
            // insert and commit one row at a time
            session.insert("insertUser", user);
            session.commit();
        }

        long spendTime = System.currentTimeMillis() - startTime;
        System.out.println("Successfully inserted 300,000 rows, took " + spendTime + " ms");
    } finally {
        session.close();
    }
}

During execution, disk I/O usage spikes and stays high.


Waiting... and waiting... it's taking forever.


It's far too slow, so let's leave it running, go do something else, and check the result later.

two thousand years later…

The console output shows a total of 14,909,367 milliseconds, which works out to about 4 hours and 8 minutes. Far too slow.


Let's go back and optimize the batch approach.

Inserting 300,000 rows with MyBatis in batches

First clean up the table data, and then optimize the batch execution insert:

-- Clear user table
TRUNCATE table t_user;

Here is the batched MyBatis implementation:

/**
 * Batch insert, committing in chunks
 * @throws IOException
 */
@Test
public void testBatchInsertUser() throws IOException {
    InputStream resourceAsStream =
            Resources.getResourceAsStream("sqlMapConfig.xml");
    SqlSessionFactory sqlSessionFactory = new SqlSessionFactoryBuilder().build(resourceAsStream);
    SqlSession session = sqlSessionFactory.openSession();
    System.out.println("===== start inserting data =====");
    long startTime = System.currentTimeMillis();
    int waitTime = 10;
    try {
        List<User> userList = new ArrayList<>();
        for (int i = 1; i <= 300000; i++) {
            User user = new User();
            user.setId(i);
            user.setUsername("Share a drink without " + i);
            user.setAge((int) (Math.random() * 100));
            userList.add(user);
            if (i % 1000 == 0) {
                session.insert("batchInsertUser", userList);
                // commit a transaction every 1000 rows
                session.commit();
                userList.clear();

                // wait a while between batches
                Thread.sleep(waitTime * 1000);
            }
        }
        // insert any remaining rows
        if (!CollectionUtils.isEmpty(userList)) {
            session.insert("batchInsertUser", userList);
            session.commit();
        }

        long spendTime = System.currentTimeMillis() - startTime;
        System.out.println("Successfully inserted 300,000 rows, took " + spendTime + " ms");
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        session.close();
    }
}

Using MyBatis batch operations, inserting every 1,000 rows as one batch, noticeably improves insert speed. Pay attention to the batch size and any wait time between batches to keep memory usage under control, and configure the connection pool and database parameters sensibly for better performance.


In the example above, we commit a batch every 1,000 rows and then sleep for 10 seconds. This keeps memory usage under control and lets the insert proceed smoothly.


The run completed in about fifty minutes, with most of the time spent waiting between batches.

If the job runs during off-peak hours and there is enough CPU and disk headroom, the batches can run back-to-back with no wait:

/**
 * Batch insert, committing in chunks
 * @throws IOException
 */
@Test
public void testBatchInsertUser() throws IOException {
    InputStream resourceAsStream =
            Resources.getResourceAsStream("sqlMapConfig.xml");
    SqlSessionFactory sqlSessionFactory = new SqlSessionFactoryBuilder().build(resourceAsStream);
    SqlSession session = sqlSessionFactory.openSession();
    System.out.println("===== start inserting data =====");
    long startTime = System.currentTimeMillis();
    try {
        List<User> userList = new ArrayList<>();
        for (int i = 1; i <= 300000; i++) {
            User user = new User();
            user.setId(i);
            user.setUsername("Share a drink without " + i);
            user.setAge((int) (Math.random() * 100));
            userList.add(user);
            if (i % 1000 == 0) {
                session.insert("batchInsertUser", userList);
                // commit a transaction every 1000 rows
                session.commit();
                userList.clear();
            }
        }
        // insert any remaining rows
        if (!CollectionUtils.isEmpty(userList)) {
            session.insert("batchInsertUser", userList);
            session.commit();
        }

        long spendTime = System.currentTimeMillis() - startTime;
        System.out.println("Successfully inserted 300,000 rows, took " + spendTime + " ms");
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        session.close();
    }
}

The insert then completes in 24 seconds:


CPU and disk usage spike briefly during the run.

Increase the batch size to 5,000 and run again:


All 300,000 records were inserted in just 13 seconds. Liftoff!

Inserting 300,000 rows with JDBC

Looping over single-row inserts in JDBC behaves much like the MyBatis one-by-one approach above, so it is not repeated here.

Below is sample Java code that inserts the 300,000 records using JDBC batching. It illustrates the idea; adapt the details to your environment.

/**
 * JDBC batch insert, committing in chunks
 */
@Test
public void testJDBCBatchInsertUser() {
    Connection connection = null;
    PreparedStatement preparedStatement = null;

    String databaseURL = "jdbc:mysql://localhost:3306/test";
    String user = "root";
    String password = "root";

    try {
        connection = DriverManager.getConnection(databaseURL, user, password);
        // turn off auto-commit and commit manually instead
        connection.setAutoCommit(false);
        System.out.println("===== start inserting data =====");
        long startTime = System.currentTimeMillis();
        String sqlInsert = "INSERT INTO t_user (username, age) VALUES (?, ?)";
        preparedStatement = connection.prepareStatement(sqlInsert);

        Random random = new Random();
        for (int i = 1; i <= 300000; i++) {
            preparedStatement.setString(1, "Share a drink without " + i);
            preparedStatement.setInt(2, random.nextInt(100));
            // add the row to the batch
            preparedStatement.addBatch();

            if (i % 1000 == 0) {
                // execute and commit every 1000 rows
                preparedStatement.executeBatch();
                connection.commit();
                System.out.println("Inserted " + i + " rows so far");
            }
        }
        // flush the remaining rows
        preparedStatement.executeBatch();
        connection.commit();
        long spendTime = System.currentTimeMillis() - startTime;
        System.out.println("Successfully inserted 300,000 rows, took " + spendTime + " ms");
    } catch (SQLException e) {
        System.out.println("Error: " + e.getMessage());
    } finally {
        if (preparedStatement != null) {
            try {
                preparedStatement.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }

        if (connection != null) {
            try {
                connection.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
    }
}


In the sample above, we connect to MySQL through JDBC and insert the data with batch operations. The steps are:

  • Obtain a database connection and disable auto-commit.

  • Precompile the INSERT statement into a PreparedStatement.

  • Bind the parameters for each row and add it to the batch.

  • Execute the batch and commit the transaction every 1,000 rows.

  • Execute and commit the remaining rows.

  • Close the PreparedStatement and Connection in the finally block.

setAutoCommit(false) disables automatic commits, and the transaction is committed manually after each batch. A single PreparedStatement is reused for every row, so the SQL is parsed only once. Inside the loop, executeBatch() sends the accumulated rows every 1,000 inserts.

In addition, tune the connection pool and database configuration for your workload to avoid problems such as connection timeouts.
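One Connector/J setting worth knowing here: by default the MySQL JDBC driver still sends each statement in an executeBatch() call individually. Setting rewriteBatchedStatements=true in the JDBC URL lets the driver rewrite batched inserts into multi-value INSERT statements, which can speed batches up considerably. A minimal sketch of the URL (host, port, and database are placeholders):

```java
public class JdbcUrlExample {

    /** Build a JDBC URL with batch rewriting enabled; host/db are placeholders. */
    public static String buildUrl() {
        return "jdbc:mysql://localhost:3306/test"
                + "?rewriteBatchedStatements=true"; // batched inserts become multi-value INSERTs
    }

    public static void main(String[] args) {
        System.out.println(buildUrl());
    }
}
```

Passing this URL to DriverManager.getConnection in the example above requires no other code changes.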

Summary

Efficient large-scale inserts come from combining several optimization strategies:

1. Batch processing: submitting SQL in batches reduces network round trips and per-statement overhead. In Java, use the addBatch() method of Statement or PreparedStatement to queue statements, then call executeBatch() to send them in one go.

  • Use an appropriate wait time and batch size when inserting in a loop, to avoid problems such as excessive memory usage:

    • Set an appropriate batch size: the batch size is the number of rows sent per insert operation. Too small, and the overhead of frequent inserts dominates; too large, and memory usage climbs. A batch size of 1,000-5,000 rows is a reasonable starting point.

    • Use an appropriate wait time: the wait time is the pause between batch operations. Too short a pause can keep memory pressure high; too long a pause slows the overall load. Anywhere from a few seconds to tens of seconds is typical, depending on how much load the server can absorb.

    • Consider memory optimizations, such as an in-memory staging store or cursor-based inserts, to reduce memory usage.

  • In short, a well-chosen batch size and wait time keep the insert running smoothly without exhausting memory.
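The commit-every-N pattern used in the examples above can also be factored into a small helper that splits the row list into fixed-size chunks up front (the class and method names here are our own, not from the original code):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchUtils {

    /** Split a list into consecutive chunks of at most batchSize elements. */
    public static <T> List<List<T>> partition(List<T> source, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < source.size(); start += batchSize) {
            int end = Math.min(start + batchSize, source.size());
            batches.add(new ArrayList<>(source.subList(start, end)));
        }
        return batches;
    }
}
```

Each chunk can then be passed to batchInsertUser and committed in turn, instead of tracking `i % 1000` by hand, and the trailing partial batch falls out naturally as the last chunk.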

2. Indexes: temporarily drop secondary indexes before a bulk insert and recreate them afterwards. This avoids maintaining the indexes on every write and can greatly reduce load time.
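For example, if t_user had a secondary index on username (hypothetical here; the table above only has the primary key), the bulk load could be bracketed like this:

```sql
-- Drop the secondary index before the bulk load (index name is hypothetical)
ALTER TABLE t_user DROP INDEX idx_username;

-- ... perform the batched inserts ...

-- Recreate the index once the data is in place
ALTER TABLE t_user ADD INDEX idx_username (username);
```

Building the index once over the finished table is usually much cheaper than updating it 300,000 times.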

3. Connection pooling: a connection pool avoids the overhead of repeatedly opening and closing database connections. If you are not using a pool, remember to close the connection in a finally block.

4. Database parameter tuning: increase the MySQL buffer sizes, and use high-performance disks and I/O configuration, etc.

Source: blog.csdn.net/qq_35427589/article/details/129665307

