How to insert hundreds of thousands of rows in Java (300,000 rows into MySQL in just 13 seconds)

This article walks through inserting a large volume of data with MyBatis and JDBC, comparing the approaches and their results.

We verify each approach by inserting 300,000 rows into the database.

    • Entity class, mapper, and configuration file definitions
      • User entity
      • mapper interface
      • mapper.xml file
      • jdbc.properties
      • sqlMapConfig.xml
    • Going all in, no batching
    • Insert one by one in a loop
    • MyBatis batch insert of 300,000 rows
    • JDBC batch insert of 300,000 rows
    • Summary

The table structure used for verification is as follows:

CREATE TABLE `t_user` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'user id',
  `username` varchar(64) DEFAULT NULL COMMENT 'username',
  `age` int(4) DEFAULT NULL COMMENT 'age',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='User information table';

Without further ado, let’s start!

Entity class, mapper, and configuration file definitions

User entity

/**
 * <p>user entity</p>
 *
 * @Author zjq
 * @Date 2021/8/3
 */
@Data
public class User {

    private int id;
    private String username;
    private int age;

}

mapper interface

public interface UserMapper {

    /**
     * Batch insert users
     * @param userList
     */
    void batchInsertUser(@Param("list") List<User> userList);


}

mapper.xml file

 <!-- Batch insert user information -->
    <insert id="batchInsertUser" parameterType="java.util.List">
        insert into t_user(username, age) values
        <foreach collection="list" item="item" index="index" separator=",">
            (
            #{item.username},
            #{item.age}
            )
        </foreach>
    </insert>
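For a sense of what this mapper produces: the foreach expands the whole list into a single multi-row INSERT statement, roughly of the following shape (the values are illustrative). The entire statement travels to the server as one packet, which will matter shortly.

insert into t_user (username, age) values
('Share a drink without 1', 42),
('Share a drink without 2', 7),
-- ... one tuple per list element, all in the same statement ...
('Share a drink without 300000', 88);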

jdbc.properties

jdbc.driver=com.mysql.jdbc.Driver
jdbc.url=jdbc:mysql://localhost:3306/test
jdbc.username=root
jdbc.password=root

sqlMapConfig.xml

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE configuration PUBLIC "-//mybatis.org//DTD Config 3.0//EN" "http://mybatis.org/dtd/mybatis-3-config.dtd">
<configuration>

    <!--Load external properties file through properties tag-->
    <properties resource="jdbc.properties"></properties>


    <!--Custom Alias-->
    <typeAliases>
        <typeAlias type="com.zjq.domain.User" alias="user"></typeAlias>
    </typeAliases>


    <!--Data source environment-->
    <environments default="development">
        <environment id="development">
            <transactionManager type="JDBC"></transactionManager>
            <dataSource type="POOLED">
                <property name="driver" value="${jdbc.driver}"/>
                <property name="url" value="${jdbc.url}"/>
                <property name="username" value="${jdbc.username}"/>
                <property name="password" value="${jdbc.password}"/>
            </dataSource>
        </environment>
    </environments>


    <!--Load mapping file-->
    <mappers>
        <mapper resource="com/zjq/mapper/UserMapper.xml"></mapper>
    </mappers>


</configuration>

Going all in, no batching

First, let's have MyBatis insert all 300,000 records at once with a single batch statement. The code is as follows:

 @Test
    public void testBatchInsertUser() throws IOException {
        InputStream resourceAsStream =
                Resources.getResourceAsStream("sqlMapConfig.xml");
        SqlSessionFactory sqlSessionFactory = new SqlSessionFactoryBuilder().build(resourceAsStream);
        SqlSession session = sqlSessionFactory.openSession();
        System.out.println("===== start inserting data =====");
        long startTime = System.currentTimeMillis();
        try {
            List<User> userList = new ArrayList<>();
            for (int i = 1; i <= 300000; i++) {
                User user = new User();
                user.setId(i);
                user.setUsername("Share a drink without " + i);
                user.setAge((int) (Math.random() * 100));
                userList.add(user);
            }
            // insert all 300,000 rows with one multi-row statement
            session.insert("batchInsertUser", userList);
            session.commit();

            long spendTime = System.currentTimeMillis() - startTime;
            System.out.println("Successfully inserted 300,000 rows, time taken: " + spendTime + " ms");
        } finally {
            session.close();
        }
    }

You can see the console output:

Cause: com.mysql.jdbc.PacketTooBigException: Packet for query is too large (27759038 > 4194304). You can change this value on the server by setting the 'max_allowed_packet' variable.
The statement exceeds MySQL's maximum packet size. You could raise the max_allowed_packet limit to allow a larger payload, but our single statement is almost 27 MB against the default 4 MB, far too much to fix that way: at roughly 93 bytes per row (27,759,038 bytes for 300,000 rows), the default packet size caps a single multi-row INSERT at around 45,000 rows.
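If you do decide to raise the limit, check the current value first. A minimal sketch (our addition, assuming the same local test database used throughout this article; the 64 MB value is purely illustrative):

 // Check (and optionally raise) the server's packet limit before attempting
 // very large multi-row INSERTs. Raising it requires sufficient privileges
 // and only affects connections opened afterwards.
 try (Connection connection = DriverManager.getConnection(
             "jdbc:mysql://localhost:3306/test", "root", "root");
      Statement statement = connection.createStatement();
      ResultSet rs = statement.executeQuery("SHOW VARIABLES LIKE 'max_allowed_packet'")) {
     if (rs.next()) {
         System.out.println("max_allowed_packet = " + rs.getString("Value") + " bytes");
     }
     // statement.execute("SET GLOBAL max_allowed_packet = 64 * 1024 * 1024");
 }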
So going all in won't work here. Shall we insert the rows one at a time in a loop instead?
Insert one by one in a loop
A single-user insert is added to the mapper interface and the mapper file as follows:
 /**
     * Insert a single user
     * @param user
     */
    void insertUser(User user);
 <!-- Insert a single user -->
    <insert id="insertUser" parameterType="user">
        insert into t_user(username, age) values (#{username}, #{age})
    </insert>

Adjust the execution code as follows:

 @Test
    public void testCirculateInsertUser() throws IOException {
        InputStream resourceAsStream =
                Resources.getResourceAsStream("sqlMapConfig.xml");
        SqlSessionFactory sqlSessionFactory = new SqlSessionFactoryBuilder().build(resourceAsStream);
        SqlSession session = sqlSessionFactory.openSession();
        System.out.println("===== start inserting data =====");
        long startTime = System.currentTimeMillis();
        try {
            for (int i = 1; i <= 300000; i++) {
                User user = new User();
                user.setId(i);
                user.setUsername("Share a drink without " + i);
                user.setAge((int) (Math.random() * 100));
                // insert and commit one row at a time
                session.insert("insertUser", user);
                session.commit();
            }

            long spendTime = System.currentTimeMillis() - startTime;
            System.out.println("Successfully inserted 300,000 rows, time taken: " + spendTime + " ms");
        } finally {
            session.close();
        }
    }

Once it is running, you can see the disk I/O share spike and stay high.

And so we wait... and wait... it takes a very long time.

Let's leave it running, go do something else, and come back for the result.
Two thousand years later…
The console output is as follows:

The run took 14,909,367 milliseconds in total, which works out to 4 hours and 8 minutes. Far too slow. Every iteration pays a full network round trip plus its own transaction commit, which is exactly the overhead that batching removes.
Let's go back and optimize the batch approach.

MyBatis batch insert of 300,000 rows

First clear out the table data, then optimize the batch insert by splitting it into chunks:

-- Clear user table
TRUNCATE table t_user;

The following code inserts the 300,000 rows through MyBatis in batches:

 /**
     * Batch insert, in chunks
     * @throws IOException
     */
    @Test
    public void testBatchInsertUser() throws IOException {
        InputStream resourceAsStream =
                Resources.getResourceAsStream("sqlMapConfig.xml");
        SqlSessionFactory sqlSessionFactory = new SqlSessionFactoryBuilder().build(resourceAsStream);
        SqlSession session = sqlSessionFactory.openSession();
        System.out.println("===== start inserting data =====");
        long startTime = System.currentTimeMillis();
        int waitTime = 10;
        try {
            List<User> userList = new ArrayList<>();
            for (int i = 1; i <= 300000; i++) {
                User user = new User();
                user.setId(i);
                user.setUsername("Share a drink without " + i);
                user.setAge((int) (Math.random() * 100));
                userList.add(user);
                if (i % 1000 == 0) {
                    session.insert("batchInsertUser", userList);
                    // commit a transaction every 1000 rows
                    session.commit();
                    userList.clear();

                    // wait a while between batches
                    Thread.sleep(waitTime * 1000);
                }
            }
            // insert any remaining rows
            if (!CollectionUtils.isEmpty(userList)) {
                session.insert("batchInsertUser", userList);
                session.commit();
            }

            long spendTime = System.currentTimeMillis() - startTime;
            System.out.println("Successfully inserted 300,000 rows, time taken: " + spendTime + " ms");
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            session.close();
        }
    }

Batching with MyBatis, committing every 1000 rows as a single multi-row insert, improves the insert speed substantially. Pay attention to the batch size and waiting time when looping, so memory usage stays under control, and set reasonable connection pool and database parameters in the configuration for the best performance.
In the example above, we commit a batch every 1000 inserted rows and then wait 10 seconds. This keeps memory usage in check and lets the insert proceed smoothly.

(screenshot: CPU and disk usage during the run)
The run completed in about fifty minutes, with most of that time spent waiting.

If you run this during off-peak hours and the CPU and disk can take it, you can drop the waiting entirely and batch straight through:

 /**
     * Batch insert, in chunks (no waiting)
     * @throws IOException
     */
    @Test
    public void testBatchInsertUser() throws IOException {
        InputStream resourceAsStream =
                Resources.getResourceAsStream("sqlMapConfig.xml");
        SqlSessionFactory sqlSessionFactory = new SqlSessionFactoryBuilder().build(resourceAsStream);
        SqlSession session = sqlSessionFactory.openSession();
        System.out.println("===== start inserting data =====");
        long startTime = System.currentTimeMillis();
        try {
            List<User> userList = new ArrayList<>();
            for (int i = 1; i <= 300000; i++) {
                User user = new User();
                user.setId(i);
                user.setUsername("Share a drink without " + i);
                user.setAge((int) (Math.random() * 100));
                userList.add(user);
                if (i % 1000 == 0) {
                    session.insert("batchInsertUser", userList);
                    // commit a transaction every 1000 rows
                    session.commit();
                    userList.clear();
                }
            }
            // insert any remaining rows
            if (!CollectionUtils.isEmpty(userList)) {
                session.insert("batchInsertUser", userList);
                session.commit();
            }

            long spendTime = System.currentTimeMillis() - startTime;
            System.out.println("Successfully inserted 300,000 rows, time taken: " + spendTime + " ms");
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            session.close();
        }
    }

This time the insert completes in 24 seconds:

As you can see, CPU and disk usage spike sharply for the short duration of the run.
Now increase the batch size to 5000 and execute again:

All 300,000 records went in within 13 seconds. At roughly 93 bytes per row, a 5000-row batch is only about 465 KB per statement, still far below the 4 MB packet limit. Now we're really flying!
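As an aside that the timings above don't cover: MyBatis also ships a dedicated batch executor, which buffers repeated single-row inserts and hands them to the JDBC driver as batches. A minimal sketch of that approach, reusing the insertUser statement from earlier (our addition, not one of the measured variants):

 // Sketch: MyBatis batch executor (org.apache.ibatis.session.ExecutorType.BATCH).
 // The session buffers the single-row "insertUser" calls and flushes them to
 // the driver as JDBC batches.
 SqlSession session = sqlSessionFactory.openSession(ExecutorType.BATCH);
 try {
     for (int i = 1; i <= 300000; i++) {
         User user = new User();
         user.setUsername("Share a drink without " + i);
         user.setAge((int) (Math.random() * 100));
         session.insert("insertUser", user);
         if (i % 1000 == 0) {
             session.flushStatements(); // push the buffered batch to MySQL
             session.commit();
         }
     }
     session.flushStatements();
     session.commit();
 } finally {
     session.close();
 }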

JDBC batch insert of 300,000 rows

Looping single-row inserts over JDBC behaves much like the MyBatis one-by-one approach above, so we won't repeat it.
Below is sample Java code that inserts the 300,000 records using JDBC batching. Note that this code only illustrates the idea; adapt the details to your actual situation.

 /**
     * JDBC batch insert, in chunks
     * @throws IOException
     */
    @Test
    public void testJDBCBatchInsertUser() throws IOException {
        Connection connection = null;
        PreparedStatement preparedStatement = null;

        String databaseURL = "jdbc:mysql://localhost:3306/test";
        String user = "root";
        String password = "root";

        try {
            connection = DriverManager.getConnection(databaseURL, user, password);
            // turn off auto-commit and commit manually instead
            connection.setAutoCommit(false);
            System.out.println("===== start inserting data =====");
            long startTime = System.currentTimeMillis();
            String sqlInsert = "INSERT INTO t_user (username, age) VALUES (?, ?)";
            preparedStatement = connection.prepareStatement(sqlInsert);

            Random random = new Random();
            for (int i = 1; i <= 300000; i++) {
                preparedStatement.setString(1, "Share a drink without " + i);
                preparedStatement.setInt(2, random.nextInt(100));
                // add to the current batch
                preparedStatement.addBatch();

                if (i % 1000 == 0) {
                    // execute and commit every 1000 rows
                    preparedStatement.executeBatch();
                    connection.commit();
                    System.out.println("Inserted " + i + " rows so far");
                }

            }
            // execute and commit the remaining rows
            preparedStatement.executeBatch();
            connection.commit();
            long spendTime = System.currentTimeMillis() - startTime;
            System.out.println("Successfully inserted 300,000 rows, time taken: " + spendTime + " ms");
        } catch (SQLException e) {
            System.out.println("Error: " + e.getMessage());
        } finally {
            if (preparedStatement != null) {
                try {
                    preparedStatement.close();
                } catch (SQLException e) {
                    e.printStackTrace();
                }
            }

            if (connection != null) {
                try {
                    connection.close();
                } catch (SQLException e) {
                    e.printStackTrace();
                }
            }
        }
    }


In the sample code above, we connect to the MySQL database through JDBC and insert the data with batch operations. The steps are:

  1. Get a database connection.
  2. Disable auto-commit on the connection.
  3. Define the SQL statement, precompile it into a PreparedStatement, and set the parameters for each row.
  4. Add each row to the batch and execute the batch periodically.
  5. Execute and commit the remaining rows.
  6. Close the PreparedStatement and Connection objects.

setAutoCommit(false) disables automatic commits, so the transaction is committed manually after each batch. A single PreparedStatement is precompiled once and reused for every row, avoiding repeated SQL parsing. Inside the loop, executeBatch() sends the accumulated rows to the server every 1000 rows.
You should also tune the connection pool and database configuration for your environment to avoid problems such as connection timeouts.
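One more point worth noting, though it was not used in any of the timed runs above: MySQL Connector/J can rewrite a JDBC batch into multi-row INSERT statements on the wire when rewriteBatchedStatements=true is set on the connection URL, which often speeds up exactly this kind of workload. Below is a minimal sketch combining that flag with a HikariCP connection pool (HikariCP is our choice purely for illustration; any pool will do):

 // Sketch only: connection pool + driver-side batch rewriting (not part of
 // the measured runs above).
 HikariConfig config = new HikariConfig();
 config.setJdbcUrl("jdbc:mysql://localhost:3306/test?rewriteBatchedStatements=true");
 config.setUsername("root");
 config.setPassword("root");
 config.setMaximumPoolSize(10);
 try (HikariDataSource dataSource = new HikariDataSource(config);
      Connection connection = dataSource.getConnection()) {
     connection.setAutoCommit(false);
     // ... use the connection exactly as in the JDBC batch example above ...
 }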

Summary

Efficient large-scale data insertion comes from combining the following optimization strategies:

  1. Batch processing: submitting SQL in batches reduces network transfers and processing overhead by cutting the number of round trips to the database. In Java, queue multiple statements with addBatch() on a Statement or PreparedStatement, then submit them all at once with executeBatch(). Use a suitable batch size and waiting time when inserting in a loop, so that memory usage stays under control:
     • Set an appropriate batch size: the batch size is the number of rows inserted per operation. Too small, and the per-operation overhead dominates; too large, and memory usage climbs. As a rule of thumb, 1000 to 5000 rows per batch keeps both the operation frequency and the memory footprint reasonable.
     • Use an appropriate wait time: the wait time is the pause between batch operations. Too short, and memory pressure can build; too long, and the overall insert slows down. A few seconds to a few tens of seconds is usually enough to keep the run smooth without stalling it.
     • You can also consider memory-optimization techniques, such as using an in-memory database or a cursor, to reduce memory usage.
     In general, an appropriate batch size and wait time let the insert run smoothly without excessive memory usage.
  2. Indexes: temporarily drop secondary indexes before inserting a large volume of data and re-create them at the end; this avoids paying the index-maintenance cost on every write (see the sketch after this list).
  3. Connection pool: a connection pool avoids the overhead of repeatedly opening and closing database connections. If you are not using one, remember to close connections in finally.
  4. Database parameters: increase the MySQL buffer sizes, use high-performance disks and I/O, and so on.
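For point 2, the t_user table in this article only has a primary key, so there is no secondary index to drop; the sketch below invents a hypothetical idx_username purely to illustrate the drop-load-rebuild pattern:

 // Hypothetical sketch: idx_username does not exist in this article's schema.
 // The pattern is: drop the secondary index, run the bulk load, rebuild once.
 try (Connection connection = DriverManager.getConnection(
             "jdbc:mysql://localhost:3306/test", "root", "root");
      Statement statement = connection.createStatement()) {
     statement.execute("ALTER TABLE t_user DROP INDEX idx_username");
     // ... run the JDBC batch insert from the previous section here ...
     statement.execute("ALTER TABLE t_user ADD INDEX idx_username (username)");
 }

For point 4, the usual server-side knobs are innodb_buffer_pool_size and, where relaxed durability during the load is acceptable, innodb_flush_log_at_trx_commit.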

That's all for this article.
If you got something out of it, please like, favorite, and follow; your encouragement is my biggest motivation.
If anything here is wrong or unclear, you are welcome to point it out.
Homepage: the blog index of "Share a drink".

Keep loving and go to the next mountain and sea.