How to solve the problem of repeated insertion and update of multi-threaded database

Article directory

  • basic concept
  • Cause Analysis
  • solution
    • Multi-threaded insertion solution:
    • Multi-threaded update solution

Basic concepts

Idempotency: In programming, an idempotent operation is characterized by the fact that the impact of any number of executions is the same as that of one execution.
To put it simply: idempotence is an operation, no matter how many times it is executed, the effect and the returned result are the same.
Idempotent operations:
1. Query operation: Query once and multiple times, and the query results are the same when the data remains unchanged. select is a natural idempotent operation;

2. Delete operation: The delete operation is also idempotent, and the data is deleted once or multiple times. (Note that the returned results may be different. The deleted data does not exist, and 0 is returned. There are multiple deleted data, and multiple returned results);
3. Insert operation: the default primary key is unique in the case of insert, so inserting the same data multiple times is not idempotent
4. Update operation: There are two cases here:

1. update t set money=100 where id=1
2. update t set money=money + 100 where id=1

The first is idempotent, the second is not idempotent

Idempotency has nothing to do with whether you are distributed and highly concurrency or JavaEE. The key is whether your operation is idempotent.
To achieve idempotence, it is sufficient not to design any non-idempotent operations in terms of interface design. For example, the requirement is: when the user clicks to agree, the number of approvals of the answer + 1. Change to: When the user clicks to approve, ensure that there is a record in the answer approval table, user, answer. The number of approvals is counted by the answer approval table. When designing a system, it is the primary consideration, especially in systems like Alipay, banks, and Internet finance companies that involve money. It must be efficient and the data must be accurate, so there should be no excessive deductions or payment. And other problems, it will be difficult to deal with, and the user experience is not good.

Here is another article of mine:
8 solutions to idempotent problems to solve repeated submissions

Cause Analysis

It is common to do something like this:

if(user does not exist)
    store user in database
    Repeat push without taking any action

This operation has not been completed yet, the second thread with the same data has entered and passed the if test, causing the database to store two identical data. This is the multi-threaded concurrency that causes the program’s judgment logic to fail.


In stand-alone mode, simply use sync. Here we discuss the problem of avoiding repeated insertion in distributed scenarios
Here we mainly talk about the database to solve multi-threading, and other solutions can refer to the above mentioned:

8 solutions to idempotent problems to solve repeated submissions

Multi-threaded insertion solution:

1. Insert with where condition (similar to index)
1.1, Insert a single record

Ordinary INSERT INTO inserts:
INSERT INTO card(cardno, cardnum) VALUES('1111', '100');

For ordinary INSERT insertion, if we want to ensure that no duplicate records are inserted, we only have to create a unique constraint on a certain field (for example: cardno card number cannot be repeated);
If you want to ensure that multiple fields will not be repeated, you can consider a joint unique index!
Program processing after index creation:

if (the cardno exists in the database table) {
} else {
    try {
         // Violation of the unique constraint will report an exception: InvocationTargetException
         } catch (InvocationTargetException e) {
         //If there is already data for repeated insertion, update it

There is still a problem here, that is, if a unique index is used when the record logic is deleted in the table, there will be a BUG

The focus is here
So is there a solution that does not create a unique constraint and only implements a statement through INSERT INTO?

Answer: Yes, the specific syntax of INSERT INTO IF EXISTS is as follows:
INSERT INTO table(field1, field2, fieldn) SELECT 'field1', 'field2', 'fieldn' FROM DUAL WHERE NOT EXISTS(SELECT field FROM table WHERE field = ?)
Among them, DUAL is a temporary table that does not need to be physically created, so it can be used.

The transformation of the above card example is as follows:
INSERT INTO card(cardno, cardnum) SELECT '111', '100' FROM DUAL WHERE NOT EXISTS(SELECT cardno FROM card WHERE cardno = '111')
1.2. Insert multiple records

INSERT INTO user (id, no, add_time, remark)
select * from (
SELECT 1 id, 1 no, NOW() add_time,'1,2,3,1,2' remark FROM DUAL
SELECT 1 no, 2 no, NOW() add_time,'1,2,3,1,2' remark FROM DUAL
SELECT 1 no, 3 no, NOW() add_time,'1,2,3,1,2' remark FROM DUAL
) a where not exists (select no from user b where =

The above is to realize that the no field of the user table is not repeated, and three records are inserted.
In addition, attach the implementation statement that mybatis batch writes no fields without duplication.

INSERT INTO user (id, no, add_time, result)
select * from (
<foreach collection="list" item="obj" separator=" UNION ALL ">
SELECT #{} id, #{} no, #{obj.addTime} add_time, #{obj.result} result FROM DUAL
) a where not exists (select no from user b where =

Multi-thread update solution

1. Update uses the optimistic lock version method
?For example, if two users purchase a product at the same time, there is only 1 item in stock! The actual operation at the database level should be to reduce the inventory by 2. However, due to high concurrency, the first user completes the purchase, reads the current inventory and performs the operation of reducing 1. Since this operation is not fully executed, this will happen. The item is oversold!

select goods_num,version from goods where goods_name = "small book";
update goods set goods_num = goods_num -1, version = the queried version value is self-increased where goods_name = "small book" and version = the queried version;

Why is it enough to add a version field? Because of the help of the characteristics of the database itself, when the update statement is executed, if the update statement does not use the index when updating, the table will be locked to ensure that only one thread can enter the update at a time, etc. The next update operation will be performed after the lock is released for this update. If the version number of the submitted data is greater than the current version number of the database table, it will be updated. Otherwise, it will be considered as expired data, which can ensure the security of the program.
This kind of efficiency depends on database hardware capabilities under large data volume and high concurrency, and can be used for non-core business
2.Use select … for update pessimistic lock
This is the same as synchronized locking, checking first and then inserting or updating, but to avoid deadlocks, the efficiency is also poor, and it is recommended to use for single requests that do not have much concurrency

When acquiring data, lock and acquire: select * from table_xxx where id='xxx' for update;
Note: the id field must be a primary key or a unique index, otherwise it will lock the table, which will kill people; pessimistic locks are generally used together with transactions, and the data locking time may be very long, so choose according to the actual situation;
This kind of efficiency depends on database hardware capabilities under large data volume and high concurrency, and can be used for non-core business

Reference article:

Idempotent solution: