How to query the encrypted mobile phone number in a fuzzy manner?

Foreword

A few days ago, a friend from Knowledge Planet asked me a question: How to fuzzy query an encrypted mobile phone number?

We all know that when designing a system, considering the security of the system, it is necessary to encrypt some of the user’s personal privacy information, such as: login password, ID number, bank card number, mobile phone number, etc., to prevent the user from Information was leaked.

A long time ago, CSDN suffered from SQL injection, which resulted in the leakage of more than 6 million pieces of user information stored in clear text.

Therefore, when we design the system, we must consider encrypting and storing the user’s private information.

Common symmetric encryption algorithms include AES, SM4, ChaCha20, 3DES, DES, Blowfish, IDEA, RC5, RC6, Camellia, etc.

At present, the internationally mainstream symmetric encryption algorithm is AES, and the domestically recommended one is SM4.

No matter which algorithm is used, the difference between the string before encryption and the string after encryption is still relatively large.

For example, the string before encryption: Su Sanshuo Technology, using the key: 123, the generated string after encryption is: U2FsdGVkX1 + q7g9npbydGL1HXzaZZ6uYYtXyug83jHA= .

How to perform fuzzy query on encrypted string?

For example: Suppose the 苏三 keyword is queried, the encrypted string is: U2FsdGVkX19eCv + xt2WkQb5auYo0ckyw.

The difference between the two encrypted strings generated above seems relatively large, and there is no way to directly fuzzy query through the like keyword in the SQL statement.

So how do we implement the fuzzy query function of encrypted mobile phone numbers?

1 loaded into memory at once

To implement this function, the first way we think of may be to load the personal privacy data into the memory at once and cache it, then decrypt it in the memory, and then implement the fuzzy search function in the code.

889c327bb70a08dd37ac02585b0f9f48.pngThe advantage of this is: it is relatively simple to implement and the cost is very low.

But the problem is: if there is a lot of personal privacy data, the memory of the application server may not be enough, and OOM problems may occur.

There is another problem: data consistency problem.

If the user changes the mobile phone number and the database update is successful, the cache in the memory needs to be updated synchronously, otherwise the results of the user query may be inconsistent with the actual situation.

For example: the database update is successful, but the cache update in memory fails.

Or if your application deploys multiple server nodes, some of the memory caches are successfully updated, while the other part happens to be restarting, causing the update to fail.

This solution may not only cause OOM problems in the application server, but may also increase the complexity of the system. Overall, the gain outweighs the gain.

2 Using database functions

Since the encrypted string is stored in the database, another solution is to use the database function to decrypt.

We can use MySQL’s DES_ENCRYPT function to encrypt and use the DES_DECRYPT function to decrypt:

SELECT
DES_DECRYPT('U2FsdGVkX1 + q7g9npbydGL1HXzaZZ6uYYtXyug83jHA=', '123');

The encryption and decryption of all user privacy information in the application system are implemented at the MySQL layer, and there is no inconsistency in encryption and decryption.

When saving data in this solution, only the data of a single user is operated. The amount of data is relatively small and the performance is good.

However, when fuzzy querying data, each time you need to use the DES_DECRYPT function to decrypt all the data in a user’s private information field in the database, and then use the decrypted data to perform fuzzy query.

If the amount of data in this field is very large, the performance of each query will be very poor.

3 Save in segments

We can split a complete string into multiple smaller strings.

Take the mobile phone number as an example: 18200256007. Split every 3 digits into a group. The split string is: 182,820,200,002,025,256,560,600,007, these 9 groups of data.

Then create a table:

CREATE TABLE `encrypt_value_mapping` (
  `id` bigint NOT NULL COMMENT 'system number',
  `ref_id` bigint NOT NULL COMMENT 'Associated system number',
  `encrypt_value` varchar(255) NOT NULL COMMENT 'Encrypted string'
) ENGINE=InnoDB CHARSET=utf8mb4 COLLATE=utf8mb4_bin COMMENT='Segmented encryption mapping table'

This table has three fields:

  • id: system number.

  • ref_id: The system number of the main business table, such as the system number of the user table.

  • encrypt_value: split encrypted string.

When the user writes the mobile phone number, the split mobile phone number group data is also written together to ensure data consistency in the same transaction.

If you want to fuzzy query the mobile phone number, you can directly fuzzy query the ref_id of the user table through the encrypt_value of encrypt_value_mapping, and then query the user information through ref_id.

The specific sql is as follows:

select s2.id,s2.name,s2.phone
from encrypt_value_mapping s1
inner join `user` s2 on s1.ref_id=s2.id
where s1.encrypt_value = 'U2FsdGVkX19Se8cEpSLVGTkLw/yiNhcB'
limit 0,20;

In this way, we can easily search for the mobile phone number we want through fuzzy query.

Note that the encrypt_value here uses the equal sign. Since it is an equal value query, it is more efficient.

Note: The mobile phone number queried through the SQL statement here is encrypted. Before the interface is returned to the front end, it needs to be decrypted in the code.

For the sake of security, you can also add some interference items to the encrypted plaintext password using * to prevent the mobile phone number from being leaked. The final content displayed to the user can be displayed like this: 182***07.

4 Other fuzzy queries

What should we do if in addition to the user’s mobile phone number, there are other user privacy fields that require fuzzy query?

We can expand the encrypt_value_mapping table and add a type field.

This field indicates the type of data, such as: 1. Mobile phone number 2. ID card 3. Bank card number, etc.

In this way, if there is a business scenario of querying the ID card and bank card number modules, we can distinguish them through the type field, or we can use this solution to write the data to the encrypt_value_mapping table, and finally query different grouped data according to different types.

If the amount of data in the business table is small, this solution can meet the needs.

However, if the amount of data in the business table is large, a mobile phone number needs to save 9 pieces of data, and an ID card or bank card number also needs to save many pieces of data. This will cause the data in the encrypt_value_mapping table to increase sharply, which may cause this table to Very big.

The final consequence is that it greatly affects query performance.

So, what should we do in this situation?

5 Add fuzzy query field

If the amount of data is large, grouping all user privacy information fields into one table will really affect the performance of the query.

So, how to optimize?

Answer: We can add fuzzy query fields.

Let’s take fuzzy query on mobile phone as an example.

We can add an encrypt_phone field next to the mobile phone number in the user table.

CREATE TABLE `user` (
  `id` int NOT NULL,
  `code` varchar(20) NOT NULL,
  `age` int NOT NULL DEFAULT '0',
  `name` varchar(30) NOT NULL,
  `height` int NOT NULL DEFAULT '0',
  `address` varchar(30) DEFAULT NULL,
  `phone` varchar(11) DEFAULT NULL,
  `encrypt_phone` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB CHARSET=utf8mb4 COLLATE=utf8mb4_bin COMMENT='User table'

Then when we save the data, we splice the grouped data together.

Or take the mobile phone number as an example:

18200256007, split into groups of 3 digits, the split string is: 182,820,200,002,025,256,560,600,007, these 9 groups of data.

After grouping, encrypting, split by commas and then spliced into the following data:,U2FsdGVkX19Se8cEpSLVGTkLw/yiNhcB,U2FsdGVkX1 + qysCDyVMm/aYXMRpCEmBD,U2FsdGVkX19oXuv8m4ZAjz + AGhfXlsQk,U2FsdGVkX19VFs60R2 6BLFzv5nDZX40U,U2FsdGVkX19XPO0by9pVw4GKnGI3Z5Zs,U2FsdGVkX1/FIIaYpHlIlrngIYEnuwlM,U2FsdGVkX19s6WTtqngdAM9sgo5xKvld,U2FsdGVkX19PmLyjtuOpsMYKe2pmf + XW,U2FsdGVkX1 + cJ/qussMgdPQq3WGdp16Q .

In the future, you can directly query the field encrypt_phone through sql fuzzy query:

select id,name,phone
from user where encrypt_phone like '%U2FsdGVkX19Se8cEpSLVGTkLw/yiNhcB%'
limit 0,20;

Note that the encrypt_value here uses like.

Why are we separated by commas here?

Answer: It is to prevent direct string splicing. In extreme cases, the two grouped data originally do not meet the fuzzy search conditions, but when spliced together, some of them meet the conditions.

Of course, you can also change the comma to other special characters according to the actual situation.

In addition, for other user privacy fields, if you want to implement fuzzy query functions, you can also use a similar solution.

Finally, although this article introduces a variety of solutions for encrypting mobile phone numbers to implement fuzzy query functions, we have to choose based on actual business scenarios. There is no best solution, only the most appropriate one.

The knowledge points of the article match the official knowledge files, and you can further learn related knowledge. Java Skill TreeHomepageOverview 138826 people are learning the system