Using Java to insert 100 million rows into MySQL – an efficiency evaluation
While researching query efficiency for MySQL optimization over the past few days, I found that my test data set was too small (around 100,000 rows). Comparing different SQL statements with EXPLAIN did not yield meaningful measurements; most results were ambiguous, and I did not dare draw conclusions from them.
Therefore, I inserted a large amount of data into the MySQL database by randomly generating names, ages, genders, phone numbers, emails, and addresses, making it convenient to test SQL optimization against a large data set. During generation, I found that efficiency varies greatly depending on the method used.
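The tests below rely on RandomValue helpers (getChineseName, getNum, getEmail, getTel, getRoad and the name_sex field) whose implementation the article never shows. Here is a minimal sketch of what such a class might look like; every name pool and output format is my own assumption, not the author's code:

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of the RandomValue helpers used in the tests below.
// The article does not show their implementation, so the name pools and
// formats here are illustrative assumptions only.
public class RandomValue {
    private static final String[] SURNAMES = {"Zhao", "Qian", "Sun", "Li", "Zhou", "Wu"};
    private static final String[] GIVEN = {"Wei", "Fang", "Na", "Min", "Jing", "Lei"};
    private static final String[] ROADS = {"Zhongshan Road", "Jiefang Road", "Renmin Road"};
    private static final String CHARS = "abcdefghijklmnopqrstuvwxyz0123456789";

    // Set as a side effect of getChineseName(), mirroring how the original
    // test code reads RandomValue.name_sex right after generating a name.
    public static String name_sex;

    // Random integer in [min, max), matching ThreadLocalRandom semantics
    public static int getNum(int min, int max) {
        return ThreadLocalRandom.current().nextInt(min, max);
    }

    public static String getChineseName() {
        name_sex = getNum(0, 2) == 0 ? "M" : "F";
        return SURNAMES[getNum(0, SURNAMES.length)] + " " + GIVEN[getNum(0, GIVEN.length)];
    }

    // Random 11-digit phone number starting with 1
    public static String getTel() {
        StringBuilder sb = new StringBuilder("1");
        for (int i = 0; i < 10; i++) sb.append(getNum(0, 10));
        return sb.toString();
    }

    // Random email whose local part is between min and max characters long
    public static String getEmail(int min, int max) {
        int len = getNum(min, max + 1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < len; i++) sb.append(CHARS.charAt(getNum(0, CHARS.length())));
        return sb.append("@example.com").toString();
    }

    // Random street address from a small pool
    public static String getRoad() {
        return ROADS[getNum(0, ROADS.length)] + " No." + getNum(1, 1000);
    }
}
```

Any similar generator works; the only contract the test code relies on is the method names and that getChineseName() also sets name_sex.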
1. First, go to the MySQL database and look at the randomly generated personnel data. The columns are ID, name, gender, age, email, phone number, and address.

The figure below shows a total of 33 million rows:

When the row count reaches the hundreds of millions, do not click the button below: it makes Navicat keep loading the hundred-million-row result set and can crash the machine. ~If you think your machine is well configured, feel free to try it; you may be pleasantly surprised.
2. This evaluation uses three strategies covering five scenarios to run large-scale insertion tests.
The strategies are:

- MyBatis lightweight framework insertion (no transaction)
- Direct JDBC processing (with and without a transaction)
- JDBC batch processing (with and without a transaction)

Test results, from slowest to fastest:

MyBatis insertion -> direct JDBC processing -> JDBC batch processing.

JDBC batch processing is the most efficient.
The first strategy test:

2.1 MyBatis lightweight framework insertion (no transaction)

MyBatis is a lightweight framework, lighter and more efficient than Hibernate. However, when performing a large number of inserts, it still carries out an ORM mapping for every row. In this test an object is mapped for each insert and no transaction is opened, which is why MyBatis performs only moderately here.
The experiment here consists of:

- Using the Spring framework to obtain a mapper instance and create a Person object
- Looping, updating the object's fields, and inserting it each time
```java
// No transaction in this code
private long begin = 33112001;      // start id
private long end = begin + 100000;  // rows inserted per loop
private String url = "jdbc:mysql://localhost:3306/bigdata"
        + "?useServerPrepStmts=false&rewriteBatchedStatements=true"
        + "&useUnicode=true&characterEncoding=UTF-8";
private String user = "root";
private String password = "0203";

@org.junit.Test
public void insertBigData2() {
    // Load Spring and get the PersonMapper instance.
    // The creation cost here has little impact on the final result.
    ApplicationContext context = new ClassPathXmlApplicationContext("applicationContext.xml");
    PersonMapper pMapper = (PersonMapper) context.getBean("personMapper");
    // Create a Person instance
    Person person = new Person();
    // Record the start time
    long bTime = System.currentTimeMillis();
    // Loop 5,000,000 times
    for (int i = 0; i < 5000000; i++) {
        // Assign values to person
        person.setId(i);
        person.setName(RandomValue.getChineseName());
        person.setSex(RandomValue.name_sex);
        person.setAge(RandomValue.getNum(1, 100));
        person.setEmail(RandomValue.getEmail(4, 15));
        person.setTel(RandomValue.getTel());
        person.setAddress(RandomValue.getRoad());
        // Execute the insert
        pMapper.insert(person);
        begin++;
    }
    // Record the end time
    long eTime = System.currentTimeMillis();
    System.out.println("Time to insert 5M rows: " + (eTime - bTime));
}
```
I wanted to test inserting 5 million rows, but the run was too slow and the program had to be terminated partway through. In the end 520,000 rows had been inserted, taking about 7–9 minutes. I then used MyBatis to insert 10,000 rows into MySQL instead.
The result is as follows:
Inserting 10,000 rows with MyBatis took 28,613 ms, about 28.6 seconds.
The second strategy test:

2.2 Direct JDBC processing (with and without a transaction)
With the direct JDBC strategy, the experiment covers two cases: transaction enabled and transaction disabled. The procedure is:

- Precompile the statement with PreparedStatement
- Loop, bind each row's values, and execute the insert

How much impact does a transaction have on insert performance? See the experimental results below:
```java
// This version opens a transaction
private long begin = 33112001;     // start id
private long end = begin + 10000;  // rows per loop; the original listing said begin + 100000,
                                   // which contradicts the 10,000-row split timings below,
                                   // so it is corrected here
private String url = "jdbc:mysql://localhost:3306/bigdata"
        + "?useServerPrepStmts=false&rewriteBatchedStatements=true"
        + "&useUnicode=true&characterEncoding=UTF-8";
private String user = "root";
private String password = "0203";

@org.junit.Test
public void insertBigData3() {
    // Define connection and statement objects
    Connection conn = null;
    PreparedStatement pstm = null;
    try {
        // Load the JDBC driver
        Class.forName("com.mysql.jdbc.Driver");
        // Connect to MySQL
        conn = DriverManager.getConnection(url, user, password);
        // Turn autocommit off
        conn.setAutoCommit(false);
        String sql = "INSERT INTO person VALUES (?,?,?,?,?,?,?)";
        // Precompile the SQL
        pstm = conn.prepareStatement(sql);
        // Start the total timer
        long bTime1 = System.currentTimeMillis();
        // Loop 10 times, 10,000 rows each time, 100,000 in total
        for (int i = 0; i < 10; i++) {
            // Split timer: measure each batch of 10,000 rows
            long bTime = System.currentTimeMillis();
            while (begin < end) {
                pstm.setLong(1, begin);
                pstm.setString(2, RandomValue.getChineseName());
                pstm.setString(3, RandomValue.name_sex);
                pstm.setInt(4, RandomValue.getNum(1, 100));
                pstm.setString(5, RandomValue.getEmail(4, 15));
                pstm.setString(6, RandomValue.getTel());
                pstm.setString(7, RandomValue.getRoad());
                // Execute the insert
                pstm.execute();
                begin++;
            }
            // Commit the transaction
            conn.commit();
            // Advance the boundary by another 10,000
            end += 10000;
            long eTime = System.currentTimeMillis();
            System.out.println("Time to insert 10K rows: " + (eTime - bTime));
        }
        long eTime1 = System.currentTimeMillis();
        System.out.println("Total time to insert 100K rows: " + (eTime1 - bTime1));
    } catch (SQLException e) {
        e.printStackTrace();
    } catch (ClassNotFoundException e1) {
        e1.printStackTrace();
    }
}
```
1. First, use the code above to measure how long inserting 100,000 rows takes without a transaction (i.e., with the setAutoCommit and commit lines removed).

The output:
```
Time to insert 10K rows: 21603
Time to insert 10K rows: 20537
Time to insert 10K rows: 20470
Time to insert 10K rows: 21160
Time to insert 10K rows: 23270
Time to insert 10K rows: 21230
Time to insert 10K rows: 20372
Time to insert 10K rows: 22608
Time to insert 10K rows: 20361
Time to insert 10K rows: 20494
Total time to insert 100K rows: 212106
```
Conclusion: without a transaction, inserting 10,000 rows takes 21.2 seconds on average.
Next, test how long inserting 100,000 rows takes with the transaction enabled:
```
Time to insert 10K rows: 4938
Time to insert 10K rows: 3518
Time to insert 10K rows: 3713
Time to insert 10K rows: 3883
Time to insert 10K rows: 3872
Time to insert 10K rows: 3873
Time to insert 10K rows: 3863
Time to insert 10K rows: 3819
Time to insert 10K rows: 3933
Time to insert 10K rows: 3811
Total time to insert 100K rows: 39255
```
Conclusion: with a transaction enabled, inserting 10,000 rows takes 3.9 seconds on average.
The third strategy test:

2.3 JDBC batch processing (with and without a transaction)
When using JDBC batch processing, note the following points:
1. Batched-statement rewriting must be enabled in the connection URL (with server-side prepared statements disabled so the driver can rewrite the batch into multi-row INSERTs):

```java
String url = "jdbc:mysql://localhost:3306/User?rewriteBatchedStatements=true&useServerPrepStmts=false";
```
2. The PreparedStatement that precompiles the SQL must be created outside the loop.

The code is as follows:
```java
private long begin = 33112001;      // start id
private long end = begin + 100000;  // rows inserted per loop
private String url = "jdbc:mysql://localhost:3306/bigdata"
        + "?useServerPrepStmts=false&rewriteBatchedStatements=true"
        + "&useUnicode=true&characterEncoding=UTF-8";
private String user = "root";
private String password = "0203";

@org.junit.Test
public void insertBigData() {
    // Define connection and statement objects
    Connection conn = null;
    PreparedStatement pstm = null;
    try {
        // Load the JDBC driver
        Class.forName("com.mysql.jdbc.Driver");
        // Connect to MySQL
        conn = DriverManager.getConnection(url, user, password);
        // Turn autocommit off (uncomment for the with-transaction test)
        // conn.setAutoCommit(false);
        String sql = "INSERT INTO person VALUES (?,?,?,?,?,?,?)";
        // Precompile the SQL (outside the loop)
        pstm = conn.prepareStatement(sql);
        // Start the total timer
        long bTime1 = System.currentTimeMillis();
        // Loop 10 times, 100,000 rows each time, 1,000,000 in total
        for (int i = 0; i < 10; i++) {
            // Split timer: measure each batch of 100,000 rows
            long bTime = System.currentTimeMillis();
            while (begin < end) {
                pstm.setLong(1, begin);
                pstm.setString(2, RandomValue.getChineseName());
                pstm.setString(3, RandomValue.name_sex);
                pstm.setInt(4, RandomValue.getNum(1, 100));
                pstm.setString(5, RandomValue.getEmail(4, 15));
                pstm.setString(6, RandomValue.getTel());
                pstm.setString(7, RandomValue.getRoad());
                // Add to the current batch
                pstm.addBatch();
                begin++;
            }
            // Execute the batch
            pstm.executeBatch();
            // Commit the transaction (uncomment for the with-transaction test)
            // conn.commit();
            // Advance the boundary by another 100,000
            end += 100000;
            long eTime = System.currentTimeMillis();
            System.out.println("Time to insert 100K rows: " + (eTime - bTime));
        }
        long eTime1 = System.currentTimeMillis();
        System.out.println("Total time to insert 1M rows: " + (eTime1 - bTime1));
    } catch (SQLException e) {
        e.printStackTrace();
    } catch (ClassNotFoundException e1) {
        e1.printStackTrace();
    }
}
```
First test: no transaction, inserting 100,000 rows per loop, 10 loops, 1,000,000 rows in total.

The results:
```
Time to insert 100K rows: 3832
Time to insert 100K rows: 1770
Time to insert 100K rows: 2628
Time to insert 100K rows: 2140
Time to insert 100K rows: 2148
Time to insert 100K rows: 1757
Time to insert 100K rows: 1767
Time to insert 100K rows: 1832
Time to insert 100K rows: 1830
Time to insert 100K rows: 2031
Total time to insert 1M rows: 21737
```
Result: with JDBC batch processing and no transaction, inserting 100,000 rows takes about 2.2 seconds on average (21,737 ms / 10).
Continuing the test with the transaction enabled: insert 100,000 rows per loop, 10 loops, 1,000,000 rows in total.

The results:
```
Time to insert 100K rows: 3482
Time to insert 100K rows: 1776
Time to insert 100K rows: 1979
Time to insert 100K rows: 1730
Time to insert 100K rows: 1643
Time to insert 100K rows: 1665
Time to insert 100K rows: 1622
Time to insert 100K rows: 1624
Time to insert 100K rows: 1779
Time to insert 100K rows: 1698
Total time to insert 1M rows: 19003
```
Result: with JDBC batch processing and a transaction, inserting 100,000 rows takes about 1.9 seconds on average.
3 Summary
As the numbers show, both direct JDBC processing and JDBC batch processing take less time when a transaction is enabled.
- MyBatis framework insertion: MyBatis took a beating in my experiment, haha. With a transaction actually enabled, the gap (about 10x here) would not be this large; anyone interested can test this further.
- Direct JDBC processing: in this experiment, the difference between transaction on and off is about 5x, and this factor grows with the data volume, because without a transaction every one of the 10,000 inserted rows becomes its own database round trip and commit.
- JDBC batch processing: in this experiment, the difference between transaction on and off is very small (we scale up the test later to probe this gap), but enabling the transaction still gives a measurable speedup.
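To make the comparison concrete, the rows-per-second implied by each measured total can be computed. This is a small standalone sketch using only the totals reported above; the class and method names are mine:

```java
// Approximate throughput (rows per second) implied by the measured totals
// reported in this article. Class and method names are illustrative.
public class Throughput {
    public static long perSecond(long rows, long millis) {
        return Math.round(rows * 1000.0 / millis);
    }

    public static void main(String[] args) {
        System.out.println("MyBatis, no tx:    " + perSecond(10_000, 28_613));     // ~349 rows/s
        System.out.println("JDBC, no tx:       " + perSecond(100_000, 212_106));   // ~471 rows/s
        System.out.println("JDBC, tx:          " + perSecond(100_000, 39_255));    // ~2,547 rows/s
        System.out.println("JDBC batch, no tx: " + perSecond(1_000_000, 21_737));  // ~46,000 rows/s
        System.out.println("JDBC batch, tx:    " + perSecond(1_000_000, 19_003));  // ~52,600 rows/s
    }
}
```

Seen this way, batching dominates the result (roughly a 100x gain over direct inserts without a transaction), and the transaction adds a further modest improvement on top.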
Conclusion: for inserting large numbers of individual rows, combining JDBC batch processing with a transaction is the fastest approach.

Measured time to insert 100 million rows using batch processing combined with a transaction: 174,756 ms.
4 Supplement

JDBC batch processing, with and without a transaction, evaluated over 20 iterations of 500,000 rows each, 10,000,000 rows in total:

1. Transaction enabled (the full output is too long to post):

Total time to insert 10M rows: 197654

2. Transaction disabled (the full output is too long to post):

Total time to insert 10M rows: 200540

Still not much of a difference~
Borrowed from another write-up, the four combinations are:

- No batch processing, no transaction;
- Batch processing only, no transaction;
- Transaction only, no batch processing;
- Both transaction and batch processing (clearly the fastest, so using batch processing together with a transaction is recommended when handling large amounts of data).
Source: https://zhuanlan.zhihu.com/p/517588313