Can encrypted sensitive fields still be used for fuzzy query? How to achieve this?

Foreword

There is a question that everyone has thought about? Sensitive field data is encrypted and stored in the database table. If you need to perform fuzzy query on these sensitive fields, it will definitely not work to use the original fuzzy query method of like in the where clause of SQL. So how should it be implemented? This article will solve this problem.

Scenario analysis

Suppose there is a scenario like this: there is a personnel management function. The main fields of the personnel information list include name, gender, user account, mobile phone number, ID number, home address, registration date, etc. You can add any piece of data. , delete, modify, and check. The name, ID number, and mobile phone number fields must support fuzzy query.

A simple analysis of a scenario shows that: mobile phone number, ID number, and home address field data are sensitive data. The data in these fields must be encrypted and stored in the database, and need to be desensitized when displayed on the page.

If the user wants to query the information of all persons whose real names include “Zhang San”, they can enter a keyword on the page, such as “Zhang San”. After clicking to start the query, this parameter will be passed to the background, and the background will execute a sql, such as “select * from sys_person where real_name like '%张三%'“, the execution result includes all data records of all users whose real names contain “张三”, such as “张三” “, “Zhang Sanfeng” and so on.

If the user wants to query users whose mobile phone numbers end in “0537”, the background executes SQL similar to name fuzzy query, “select * from sys_person where phone like ' 37'“, You will definitely not get the correct result, because the data in the mobile phone number field in the database is the encrypted result, and ‘0537’ is plain text. Other sensitive fields such as ID number and home address also have similar problems when performing fuzzy queries. This is also the pain point of fuzzy queries for sensitive fields, that is, the fuzzy query keywords are inconsistent with the actual stored data.

Implementation plan

Here are some solutions:

The first method is to decrypt first and then query

Query out all the data in the target table, decrypt the encrypted data of the sensitive fields to be fuzzy query in the memory, then traverse the decrypted data, compare it with the fuzzy query keywords, and filter out the data containing the fuzzy query keywords. data row.

This method is the easiest to think of, but one obvious problem is that the fuzzy query process is performed in memory. If the amount of data is particularly large, it can easily lead to memory overflow, so it is not recommended to use this method in production. method;

Second type, plaintext mapping table

Create a new mapping table to store the mapping table between the decrypted data of the sensitive field and the primary key of the target table. When fuzzy query is needed, first perform a fuzzy query on the plaintext mapping table to obtain the primary key of the target data that meets the conditions, and then return it based on the primary key. Query target table;

This method is actually a bit deceptive. The main reason for encrypting and storing sensitive fields is to consider security. Using a plaintext mapping table to store decrypted sensitive fields is actually equivalent to unencrypted storage of sensitive fields, which is the same as the most commonly used fields. Encrypting sensitive fields defeats the purpose of encrypting sensitive fields, so it is not recommended to use this method in production;

The third method is to conduct decryption query at the database level

When executing query sql, the background decrypts sensitive fields first, and then executes like. Taking the fuzzy query of the personnel management list above as an example, the sql is transformed into: “select * from sys_person where AES_DECRYPT(phone,' key') like ' 37'”;

The advantage of this method is that the cost is relatively small and easy to implement, but the disadvantage is obvious. This field cannot be optimized through database indexing. In addition, some databases cannot guarantee that the encryption and decryption algorithm of the database is consistent with the encryption and decryption algorithm of the program, which may cause problems. This leads to the problem that it can be encrypted in the program but cannot be decrypted in the database, or that it can be encrypted in the database but not decrypted in the program. Therefore, this method is not recommended for use in production;

The fourth type, word segmentation ciphertext mapping table

This method is an extension and optimization based on the second idea, and it is also the mainstream method. Create a new word segmentation ciphertext mapping table. After the sensitive field data is added or modified, segment the sensitive fields into word segments. For example, the segmentation combinations for “15503770537” include “155”, “0377”, “0537”, etc., and then segment the sensitive fields. Each word segmentation is encrypted to establish an association between the word segmentation ciphertext of the sensitive field and the primary key of the target data row; when processing fuzzy queries, the fuzzy query keywords are encrypted, and the encrypted fuzzy query keywords are used to encrypt the word segmentation Perform a like query on the text mapping table to obtain the primary key of the target data row, and then return to the target table for precise query based on the primary key of the target data row.

Picture

Picture 1: Before group combination encryption

Picture

Picture 2: After group combination encryption

Taobao, Alibaba, Pinduo, JD.com and other major manufacturers support fuzzy query after encrypting sensitive user data. The following is a description of the fuzzy query scheme for sensitive fields of several major manufacturers. If you are interested, you can learn more:

Taobao ciphertext field retrieval solution

  • https://open.taobao.com/docV3.htm?docId=106213 &docType=1

Alibaba text field search solution

  • https://jaq-doc.alibaba.com/docs/doc.htm?treeId=1 & amp;articleId=106213 & amp;docType=1

Pinduoduo ciphertext field retrieval solution

  • https://open.pinduoduo.com/application/document/browse?idStr=3407B605226E77F2

Jingdong cipher text field retrieval solution

  • https://jos.jd.com/commondoc?listId=345

The advantage of this method is that the principle is simple and the implementation is not complicated. However, it has certain limitations. It is a solution that compromises performance and business. In comparison, among the methods that can be thought of, this method is more recommended. This method, but special attention should be paid to the fact that the length of the keywords for fuzzy queries must be limited at the business level; taking mobile phone numbers as an example, you can require that the keywords for fuzzy queries be four or five digits. Specifically, you can Then make detailed divisions according to specific scenarios.

Why add such a limit? Because the length of the plaintext becomes variable after encryption, there are additional storage costs and query performance costs. The more word segmentation combinations, the greater the storage space required and the query performance cost consumed. And the shorter the word segmentation, the more likely it is to be hard cracked. The greater the risk, it will also lead to a reduction in safety to a certain extent;

Environment configuration

  • jdk version: 1.8 Development tool: Intellij iDEA 2020.1

  • springboot:2.3.9.RELEASE

  • mybatis-spring-boot-starter:2.1.4

Dependency configuration

The example mainly uses SpringAop. The encryption is symmetric encryption. It uses the encryption and decryption tool class in the hutool toolkit. You can also use your own encapsulated encryption and decryption tool class.

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-aop</artifactId>
</dependency>
<dependency>
    <groupId>cn.hutool</groupId>
    <artifactId>hutool-all</artifactId>
    <version>5.3.3</version>
</dependency>

Code implementation

1. Create a new word segmentation ciphertext mapping table;

If there are multiple fuzzy query fields, they can be shared in a word segmentation ciphertext mapping table to expand multiple fields. Taking the personnel management function in the example as an example, create a new sys_person_phone_encrypt table (person’s mobile phone number Word segmentation ciphertext mapping table), used to store the mapping relationship between person ID and word segmentation combined ciphertext

create table if not exists sys_person_phone_encrypt
(
   id bigint auto_increment comment 'primary key' primary key,
   person_id int not null comment 'Associated person information table primary key',
   phone_key varchar(500) not null comment 'Mobile phone number segmentation ciphertext'
)
comment 'Personal mobile phone number word segmentation ciphertext mapping table';

2. When the sensitive field data is saved into the database, the sensitive fields are segmented, combined and encrypted, and stored in the segmented ciphertext mapping table;

When registering personnel information, first take out the mobile phone number encrypted through AOP and decrypt it; after decrypting the mobile phone number, segment the mobile phone number into four consecutive digits, encrypt the segmentation of each mobile phone number, and finally encrypt all the mobile phone numbers. The encrypted mobile phone number is segmented and spliced into a string, which is saved together with the person’s ID to the person’s mobile phone number segmentation ciphertext mapping table;

public Person register(Person person) {
    this.personDao.insert(person);
    String phone = this.decrypt(person.getPhoneNumber());
    String phoneKeywords = this.phoneKeywords(phone);
    this.personDao.insertPhoneKeyworkds(person.getId(),phoneKeywords);
    return person;
}
private String phoneKeywords(String phone) {
    String keywords = this.keywords(phone, 4);
    System.out.println(keywords.length());
    return keywords;
}
 
//Word segmentation combination encryption
private String keywords(String word, int len) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < word.length(); i + + ) {
        int start = i;
        int end = i + len;
        String sub1 = word.substring(start, end);
        sb.append(this.encrypt(sub1));
        if (end == word.length()) {
            break;
        }
    }
    return sb.toString();
}
public String encrypt(String val) {
    //Pay special attention here. Symmetric encryption encrypts and decrypts based on the key. The encryption and decryption keys are the same. Once leaked, there is no secret at all.
    //"fanfu-csdn" is my customized key. It is only used for demonstration. In actual business, this key must be stored in a secure manner;
    byte[] key = SecureUtil.generateKey(SymmetricAlgorithm.DES.getValue(), "fanfu-csdn".getBytes()).getEncoded();
    SymmetricCrypto aes = new SymmetricCrypto(SymmetricAlgorithm.DES, key);
    String encryptValue = aes.encryptBase64(val);
    return encryptValue;
}
public String decrypt(String val) {
    //Pay special attention here. Symmetric encryption encrypts and decrypts based on the key. The encryption and decryption keys are the same. Once leaked, there is no secret at all.
    //"fanfu-csdn" is my customized key. It is only used for demonstration. In actual business, this key must be stored in a safe way;
    byte[] key = SecureUtil.generateKey(SymmetricAlgorithm.DES.getValue(), "fanfu-csdn".getBytes()).getEncoded();
    SymmetricCrypto aes = new SymmetricCrypto(SymmetricAlgorithm.DES, key);
    String encryptValue = aes.decryptStr(val);
    return encryptValue;
}

3. During fuzzy query, encrypt the fuzzy query keyword, use the encrypted keyword ciphertext as the query condition, query the ciphertext mapping table, obtain the id of the target data row, and then use the id of the target data row as the query condition. Query the target data table;

When performing a fuzzy query based on the four digits of the mobile phone number, query the sys_person_phone_encrypt table (person’s mobile phone number word segmentation ciphertext mapping table) based on the encrypted fuzzy query keyword as a condition to obtain the person information ID; Then use the personnel information id to query the personnel information table;

public List<Person> getPersonList(String phoneVal) {
    if (phoneVal != null) {
       return this.personDao.queryByPhoneEncrypt(this.encrypt(phoneVal));
    }
    return this.personDao.queryList(phoneVal);
}
<select id="queryByPhoneEncrypt" resultMap="personMap">
    select * from sys_person where id in
    (select person_id from sys_person_phone_encrypt
     where phone_key like concat('%',#{phoneVal},'%'))
</select>

Picture

picture

Example complete code:

  • https://gitcode.net/fox9916/fanfu-web.git