This article walks through inserting a large volume of data, 300,000 rows, into MySQL via MyBatis and JDBC, and compares the approaches and their results.
Contents:

- Entity class, mapper, and configuration file definitions
  - User entity
  - mapper interface
  - mapper.xml file
  - jdbc.properties
  - sqlMapConfig.xml
- Going all-in without batching
- Inserting one row at a time in a loop
- MyBatis inserts 300,000 rows in batches
- JDBC inserts 300,000 rows in batches
- Summary
The verified database table structure is as follows:
```sql
CREATE TABLE `t_user` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'user id',
  `username` varchar(64) DEFAULT NULL COMMENT 'username',
  `age` int(4) DEFAULT NULL COMMENT 'age',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='User Information Table';
```
Without further ado, let’s start!
Entity class, mapper, and configuration file definitions
User entity
```java
/**
 * <p>User entity</p>
 *
 * @author zjq
 */
@Data
public class User {
    private int id;
    private String username;
    private int age;
}
```
mapper interface
```java
public interface UserMapper {
    /**
     * Batch insert users
     * @param userList
     */
    void batchInsertUser(@Param("list") List<User> userList);
}
```
mapper.xml file
```xml
<!-- Batch insert user information -->
<insert id="batchInsertUser" parameterType="java.util.List">
    insert into t_user (username, age) values
    <foreach collection="list" item="item" index="index" separator=",">
        (#{item.username}, #{item.age})
    </foreach>
</insert>
```
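It may help to see what that foreach renders to: a single INSERT statement whose VALUES list has one (?, ?) group per row. A minimal sketch of the shape of the generated SQL (ForeachSql and renderedSql are illustrative names, not part of MyBatis):

```java
import java.util.Collections;

public class ForeachSql {
    /** The shape of the SQL MyBatis renders from the foreach: one INSERT, n "(?, ?)" groups. */
    public static String renderedSql(int rows) {
        return "insert into t_user (username, age) values "
                + String.join(", ", Collections.nCopies(rows, "(?, ?)"));
    }
}
```

With 300,000 rows, this becomes one enormous statement, which is exactly what gets the first attempt below into trouble.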
jdbc.properties
```properties
jdbc.driver=com.mysql.jdbc.Driver
jdbc.url=jdbc:mysql://localhost:3306/test
jdbc.username=root
jdbc.password=root
```
sqlMapConfig.xml
```xml
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE configuration PUBLIC "-//mybatis.org//DTD Config 3.0//EN"
        "http://mybatis.org/dtd/mybatis-3-config.dtd">
<configuration>
    <!-- Load the external properties file -->
    <properties resource="jdbc.properties"></properties>
    <!-- Custom alias -->
    <typeAliases>
        <typeAlias type="com.zjq.domain.User" alias="user"></typeAlias>
    </typeAliases>
    <!-- Data source environment -->
    <environments default="development">
        <environment id="development">
            <transactionManager type="JDBC"></transactionManager>
            <dataSource type="POOLED">
                <property name="driver" value="${jdbc.driver}"/>
                <property name="url" value="${jdbc.url}"/>
                <property name="username" value="${jdbc.username}"/>
                <property name="password" value="${jdbc.password}"/>
            </dataSource>
        </environment>
    </environments>
    <!-- Load mapping files -->
    <mappers>
        <mapper resource="com/zjq/mapper/UserMapper.xml"></mapper>
    </mappers>
</configuration>
```
Going all-in without batching

First attempt: have MyBatis insert all 300,000 rows in a single statement. The code is as follows:
```java
@Test
public void testBatchInsertUser() throws IOException {
    InputStream resourceAsStream = Resources.getResourceAsStream("sqlMapConfig.xml");
    SqlSessionFactory sqlSessionFactory = new SqlSessionFactoryBuilder().build(resourceAsStream);
    SqlSession session = sqlSessionFactory.openSession();
    System.out.println("===== start inserting data =====");
    long startTime = System.currentTimeMillis();
    try {
        List<User> userList = new ArrayList<>();
        for (int i = 1; i <= 300000; i++) {
            User user = new User();
            user.setId(i);
            user.setUsername("Share a drink without " + i);
            user.setAge((int) (Math.random() * 100));
            userList.add(user);
        }
        // insert all 300,000 rows in one statement
        session.insert("batchInsertUser", userList);
        session.commit();
        long spendTime = System.currentTimeMillis() - startTime;
        System.out.println("Successfully inserted 300,000 rows, time taken: " + spendTime + " ms");
    } finally {
        session.close();
    }
}
```
You can see the console output:
```
Cause: com.mysql.jdbc.PacketTooBigException: Packet for query is too large (27759038 > 4194304).
You can change this value on the server by setting the max_allowed_packet' variable.
```
The insert exceeds the maximum allowed packet size. You could raise max_allowed_packet so the server accepts larger payloads, but 300,000 rows overshoot the default limit by far too much for that to be a sensible fix. So much for going all-in.
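The numbers in the error message make the problem easy to quantify: 27,759,038 bytes of SQL for 300,000 rows is roughly 92 bytes of rendered SQL per row, against a 4 MB (4,194,304-byte) default max_allowed_packet. A quick sanity check (PacketEstimate is an illustrative class, not from the article):

```java
public class PacketEstimate {
    /** MySQL's default max_allowed_packet: 4 MB. */
    public static final long MAX_ALLOWED_PACKET = 4L * 1024 * 1024; // 4194304 bytes

    /** Average bytes of SQL text per row, given the packet size from the error message. */
    public static long bytesPerRow(long packetBytes, long rows) {
        return packetBytes / rows;
    }
}
```

At ~92 bytes per row, even the default limit only accommodates around 45,000 rows per statement, so the single-statement approach cannot scale to 300,000.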
Since the all-in approach doesn't work, how about inserting the rows one at a time in a loop?
Inserting one row at a time in a loop
Add a single-row insert to the mapper interface and mapper file:
```java
/**
 * Insert a single user
 * @param user
 */
void insertUser(User user);
```
```xml
<!-- Add user information -->
<insert id="insertUser" parameterType="user">
    insert into t_user (username, age) values (#{username}, #{age})
</insert>
```
Adjust the execution code as follows:
```java
@Test
public void testCirculateInsertUser() throws IOException {
    InputStream resourceAsStream = Resources.getResourceAsStream("sqlMapConfig.xml");
    SqlSessionFactory sqlSessionFactory = new SqlSessionFactoryBuilder().build(resourceAsStream);
    SqlSession session = sqlSessionFactory.openSession();
    System.out.println("===== start inserting data =====");
    long startTime = System.currentTimeMillis();
    try {
        for (int i = 1; i <= 300000; i++) {
            User user = new User();
            user.setId(i);
            user.setUsername("Share a drink without " + i);
            user.setAge((int) (Math.random() * 100));
            // insert and commit one row at a time
            session.insert("insertUser", user);
            session.commit();
        }
        long spendTime = System.currentTimeMillis() - startTime;
        System.out.println("Successfully inserted 300,000 rows, time taken: " + spendTime + " ms");
    } finally {
        session.close();
    }
}
```
While this runs, disk I/O usage spikes and stays pegged at a high level.
We wait. And wait. It's taking far too long, so let's go do something else and check the result later.

Two thousand years later…
The console output is as follows:
It took 14,909,367 milliseconds in total, about 4 hours and 8 minutes. Far too slow.
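As a quick check on that conversion, 14,909,367 ms does work out to 4 hours and 8 minutes (a small illustrative helper, not code from the article):

```java
public class ElapsedTime {
    /** Convert milliseconds to an "Hh Mm" string; leftover seconds are discarded. */
    public static String toHoursMinutes(long millis) {
        long totalMinutes = millis / 1000 / 60;
        return (totalMinutes / 60) + "h " + (totalMinutes % 60) + "m";
    }
}
```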
So let's go back and optimize the batch processing scheme.
MyBatis inserts 300,000 rows in batches
First clear out the table data, then optimize the batch insert:
```sql
-- Clear the user table
TRUNCATE TABLE t_user;
```
The following code inserts the 300,000 rows through MyBatis in batches:
```java
/**
 * Batch insertion in batches
 * @throws IOException
 */
@Test
public void testBatchInsertUser() throws IOException {
    InputStream resourceAsStream = Resources.getResourceAsStream("sqlMapConfig.xml");
    SqlSessionFactory sqlSessionFactory = new SqlSessionFactoryBuilder().build(resourceAsStream);
    SqlSession session = sqlSessionFactory.openSession();
    System.out.println("===== start inserting data =====");
    long startTime = System.currentTimeMillis();
    int waitTime = 10;
    try {
        List<User> userList = new ArrayList<>();
        for (int i = 1; i <= 300000; i++) {
            User user = new User();
            user.setId(i);
            user.setUsername("Share a drink without " + i);
            user.setAge((int) (Math.random() * 100));
            userList.add(user);
            if (i % 1000 == 0) {
                session.insert("batchInsertUser", userList);
                // commit a transaction every 1,000 rows
                session.commit();
                userList.clear();
                // wait a while before the next batch
                Thread.sleep(waitTime * 1000);
            }
        }
        // finally insert the remaining data
        if (!CollectionUtils.isEmpty(userList)) {
            session.insert("batchInsertUser", userList);
            session.commit();
        }
        long spendTime = System.currentTimeMillis() - startTime;
        System.out.println("Successfully inserted 300,000 rows, time taken: " + spendTime + " ms");
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        session.close();
    }
}
```
Using MyBatis batch operations and inserting 1,000 rows per batch effectively improves insertion speed. Pick the wait time and batch size carefully to avoid problems such as excessive memory usage, and set reasonable connection pool and database parameters in the configuration for better performance.
In the example above, a transaction is committed every 1,000 rows, followed by a 10-second pause. This keeps memory usage under control and lets the insert proceed smoothly.
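The commit-every-1,000-rows pattern can also be factored into a standalone chunking helper; this is a sketch with a hypothetical partition method, not code from the article:

```java
import java.util.ArrayList;
import java.util.List;

public class Batching {
    /** Split a list into consecutive sublists of at most batchSize elements each. */
    public static <T> List<List<T>> partition(List<T> source, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < source.size(); from += batchSize) {
            int to = Math.min(from + batchSize, source.size());
            batches.add(new ArrayList<>(source.subList(from, to)));
        }
        return batches;
    }
}
```

Each sublist would then be passed to session.insert("batchInsertUser", batch) followed by session.commit(), which keeps every generated INSERT statement well under the max_allowed_packet limit.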
The run completed in about fifty minutes, with most of the time spent waiting between batches.
If you run it during off-peak hours and CPU and disk have headroom, you can drop the wait entirely and batch straight through:
```java
/**
 * Batch insertion in batches, without waiting between batches
 * @throws IOException
 */
@Test
public void testBatchInsertUser() throws IOException {
    InputStream resourceAsStream = Resources.getResourceAsStream("sqlMapConfig.xml");
    SqlSessionFactory sqlSessionFactory = new SqlSessionFactoryBuilder().build(resourceAsStream);
    SqlSession session = sqlSessionFactory.openSession();
    System.out.println("===== start inserting data =====");
    long startTime = System.currentTimeMillis();
    try {
        List<User> userList = new ArrayList<>();
        for (int i = 1; i <= 300000; i++) {
            User user = new User();
            user.setId(i);
            user.setUsername("Share a drink without " + i);
            user.setAge((int) (Math.random() * 100));
            userList.add(user);
            if (i % 1000 == 0) {
                session.insert("batchInsertUser", userList);
                // commit a transaction every 1,000 rows
                session.commit();
                userList.clear();
            }
        }
        // finally insert the remaining data
        if (!CollectionUtils.isEmpty(userList)) {
            session.insert("batchInsertUser", userList);
            session.commit();
        }
        long spendTime = System.currentTimeMillis() - startTime;
        System.out.println("Successfully inserted 300,000 rows, time taken: " + spendTime + " ms");
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        session.close();
    }
}
```
The insert then completes in 24 seconds:
You can see CPU and disk usage spike for a short period.
Increase the batch size to 5,000 and run it again:
All 300,000 rows were inserted in just 13 seconds. Now we're flying!
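The speedup is mostly about doing fewer commits and round trips. Using the figures above (BatchMath is an illustrative helper, not code from the article): a batch size of 1,000 means 300 commits, 5,000 means 60, and 300,000 rows in 13 seconds is roughly 23,000 rows per second.

```java
public class BatchMath {
    /** Number of commits needed to insert totalRows in batches of batchSize (ceiling division). */
    public static long commitCount(long totalRows, long batchSize) {
        return (totalRows + batchSize - 1) / batchSize;
    }

    /** Rough insert throughput given a total wall-clock time in seconds. */
    public static long rowsPerSecond(long totalRows, long seconds) {
        return totalRows / seconds;
    }
}
```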
JDBC inserts 300,000 rows in batches
Inserting rows one by one in a loop with JDBC is similar to the MyBatis version above, so it is not repeated here.
The following is sample code for inserting 300,000 rows with JDBC batching. Note that it illustrates the idea; adapt the details to your own setup.
```java
/**
 * JDBC batch insert in batches
 * @throws IOException
 */
@Test
public void testJDBCBatchInsertUser() throws IOException {
    Connection connection = null;
    PreparedStatement preparedStatement = null;
    String databaseURL = "jdbc:mysql://localhost:3306/test";
    String user = "root";
    String password = "root";
    try {
        connection = DriverManager.getConnection(databaseURL, user, password);
        // turn off auto-commit and commit manually instead
        connection.setAutoCommit(false);
        System.out.println("===== start inserting data =====");
        long startTime = System.currentTimeMillis();
        String sqlInsert = "INSERT INTO t_user (username, age) VALUES (?, ?)";
        preparedStatement = connection.prepareStatement(sqlInsert);
        Random random = new Random();
        for (int i = 1; i <= 300000; i++) {
            preparedStatement.setString(1, "Share a drink without " + i);
            preparedStatement.setInt(2, random.nextInt(100));
            // add the row to the batch
            preparedStatement.addBatch();
            if (i % 1000 == 0) {
                // execute and commit every 1,000 rows
                preparedStatement.executeBatch();
                connection.commit();
                System.out.println("Inserted " + i + " rows so far");
            }
        }
        // flush and commit the remaining rows
        preparedStatement.executeBatch();
        connection.commit();
        long spendTime = System.currentTimeMillis() - startTime;
        System.out.println("Successfully inserted 300,000 rows, time taken: " + spendTime + " ms");
    } catch (SQLException e) {
        System.out.println("Error: " + e.getMessage());
    } finally {
        if (preparedStatement != null) {
            try {
                preparedStatement.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
        if (connection != null) {
            try {
                connection.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
    }
}
```
In the sample code above, we connect to MySQL through JDBC and insert the data with batch operations. The steps are as follows:
1. Get a database connection.
2. Turn off auto-commit with setAutoCommit(false) and commit transactions manually; this avoids a commit round trip per row.
3. Define the SQL and precompile it into a PreparedStatement; only its parameters change from row to row.
4. Call addBatch() for each row, and every 1,000 rows call executeBatch() and commit the transaction.
5. Execute and commit the remaining rows after the loop.
6. Close the PreparedStatement and Connection objects.

Note that a single precompiled PreparedStatement is reused for all 300,000 rows; creating a new statement per row would throw away the benefit of precompilation.
In addition, tune the connection pool and database configuration for your environment to avoid problems such as connection timeouts.
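One MySQL-specific setting worth checking alongside that configuration: by default, MySQL Connector/J sends each batched statement individually even when you use addBatch(); setting rewriteBatchedStatements=true on the JDBC URL lets the driver rewrite the batch into multi-row INSERTs, which typically speeds up batch inserts substantially. A sketch of the adjusted URL (host and database taken from the article's jdbc.properties):

```java
public class JdbcUrl {
    /** The jdbc.url from jdbc.properties, extended with the Connector/J batch-rewrite flag. */
    public static final String URL =
            "jdbc:mysql://localhost:3306/test?rewriteBatchedStatements=true";
}
```

This URL would simply replace the databaseURL string in the JDBC example above; no other code changes are needed.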
Summary
Efficient large-scale insertion comes from combining the following optimization strategies:

1. Batch processing: submitting SQL in batches reduces network transfer and processing overhead and cuts the number of round trips to the database. In Java, use addBatch() on a Statement or PreparedStatement to queue multiple statements, then call executeBatch() once to send the whole batch.
Use an appropriate wait time and batch size when inserting in a loop, to avoid problems such as excessive memory usage:

- Batch size: the number of rows written per insert operation. Too small, and the insert frequency becomes high; too large, and memory usage climbs. A batch size of 1,000 to 5,000 rows is generally a good balance.
- Wait time: the pause between batch operations. Too short, and memory usage can stay high; too long, and the overall insert slows down. A pause of a few seconds to a few tens of seconds generally keeps the load smooth.
- Memory-optimization techniques, such as an in-memory database or cursor-based insertion, can reduce memory usage further.

In general, a suitable batch size and wait time let the insert run smoothly without excessive memory usage.
2. Indexes: temporarily drop indexes before inserting a large amount of data and rebuild them afterwards; this greatly reduces the index-maintenance cost during the write.
3. Database connection pool: using a connection pool reduces the overhead of opening and closing database connections and improves performance. If you are not using a pool, remember to close the connection in a finally block.
4. Database parameter tuning: increase the MySQL buffer sizes, use high-performance disks and I/O, and so on.
Source: blog.csdn.net/qq_35427589/article/details/129665307