Hash and Encrypt

Hash is to convert the target text into an irreversible hash string (or message digest) of the same length, and Encrypt is to convert the target text into reversible ciphertext with different lengths.

Hash is to convert the target text into an irreversible hash string (or message digest) of the same length, and Encrypt is to convert the target text into reversible ciphertext with different lengths.

  • Hash algorithms are often designed to generate text of the same length, while encryption algorithms generate text of length related to the length of the plaintext itself.
  • Hash algorithms are irreversible, while encryption algorithms are reversible.

The HASH algorithm is a message digest algorithm, not an encryption algorithm, but because of its one-way operation, it has certain irreversibility and becomes a component of the encryption algorithm.

Hash algorithm of JDK’s String. code show as below:

public int hashCode() {
    int h = hash;
    if (h == 0 & amp; & amp; value. length > 0) {
        char val[] = value;
        for (int i = 0; i < value. length; i ++ ) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}

As can be seen from the JDK API, its algorithm equation is s[0]*31^(n-1) + s[1]*31^(n-2) + … + s[n-1] ?, where s[i] is the character whose index is i, and n is the length of the string.

When calculating the hash of HashMap, first calculate the hashCode(), and then perform the second hash. code show as below:

// Calculate the secondary Hash
int hash = hash(key.hashCode());

static int hash(int h) {
 h ^= (h >>> 20) ^ (h >>> 12);
 return h ^ (h >>> 7) ^ (h >>> 4);
}

It can be found that although the algorithms are different, after these shift operations, the same algorithm is used for the same value, and the calculated hash values must be the same.

So, why is hash irreversible?

If there are two passwords 3 and 4, my encryption algorithm is simply 3 + 4, the result is 7, but I can’t be sure that the two passwords are 3 and 4 through 7, there are many combinations, this is the simplest Irreversible, so you can only try one by one by brute force.

Part of the information in the original text is lost during the calculation. An MD5 can theoretically correspond to multiple original texts, because MD5 has a finite number of original texts and an infinite number of original texts.

Why is irreversible MD5 insecure?

Because the hash algorithm is fixed, the hash string calculated from the same string is fixed, so the following method can be used to crack.

  • Violent enumeration method: Simply and roughly enumerate all the original texts and calculate their hash values to see which hash value is consistent with the given information digest.
  • Dictionary method: Hackers use a huge dictionary to store as much original text and corresponding hash values as possible. Each time the dictionary is looked up with a given information digest, the result of the collision can be quickly found.
  • Rainbow table (rainbow) method: improved on the basis of the dictionary method, exchanging time for space. It is a common way to crack hashes now.

For a stand-alone computer, the time cost of the brute force enumeration method is very high (take the combined password of 14 letters and numbers as an example, there are 1.24×10^25 possibilities, even if the computer can perform 1 billion operations per second, it takes It takes 400 million years to crack), and the space cost of the dictionary method is very high (still taking the combined password of 14 letters and numbers as an example, the comparison table of the 32-bit hash string of the generated password will occupy 5.7×10^14 TB of storage space ). However, using distributed computing and distributed storage, the MD5 algorithm can still be effectively cracked. Therefore, these two methods are also widely used by hackers.

How to defend against the cracking of the rainbow table?

Although the rainbow table has such an amazing cracking efficiency, the security personnel of the website still have ways to defend against the rainbow table. The most effective method is to “add salt”, that is, to insert a specific string in a specific position of the password. This specific string is “Salt (Salt)”. The hash string before the salt is completely different, and the password obtained by the hacker with the rainbow table is not the real password at all. Even if the hacker knows the content of the “salt” and where to add the salt, he still needs to modify the H function and R function, and the rainbow table needs to be regenerated, so adding salt can greatly increase the difficulty of using the rainbow table attack.

For a website, if the encryption algorithm and salt are leaked, targeted attacks are still very insecure. Because the string encrypted with the same encryption algorithm and the same salt is still the same!

A more difficult encryption algorithm Bcrypt

BCrypt is a cryptographic hash function designed by Niels Provos and David Mazières. He is based on the Blowfish cipher and was proposed on USENIX in 1999.

In addition to adding salt to resist rainbow table attacks, a very important feature of bcrypt is adaptability, which can ensure that the encryption speed is within a specific range, even if the computing power of the computer is very high, it can be increased by increasing the number of iterations , making the encryption slower and thus resistant to brute-force search attacks.

Bcrypt can be simply understood as it implements random salt processing internally. With Bcrypt, the ciphertext after each encryption is different.

For a password, the hash generated by Bcrypt is different every time, so how does it verify it?

  • Although for the same password, the generated hash is different each time, but the hash contains salt (hash generation process: first randomly generate salt, salt and password are hashed);
  • In the next verification, the salt is taken out from the hash, and the salt is hashed with the password; the obtained result is compared with the hash stored in the DB.

The Bcrypt encryption algorithm is built in Spring Security, and the construction is also very simple. The code is as follows:

@Bean
public PasswordEncoder passwordEncoder(){
    return new BCryptPasswordEncoder();
}

The format of the generated encrypted string is as follows:

$2b$[cost]$[22 character salt][31 character hash]

for example:

$2a$10$N9qo8uLOickgx2ZMRZoMyeIjZAgcfl7p92ldGxad68LJZdL17lhWy
\__/\/\____________________/\______________________________/
 Alg Cost Salt Hash

In the above example, $2a$ represents the unique sign of the hash algorithm. The Bcrypt algorithm is represented here.

10 ? represents the cost factor, here is 2 to the 10th power, which is 1024 rounds.

N9qo8uLOickgx2ZMRZoMye ? is a 22-length character obtained by base64 encoding the 16-byte (128bits) salt.

The final IjZAgcfl7p92ldGxad68LJZdL17lhWy? is a 24-byte (192bits) hash, which is a 31-length character obtained through bash64 encoding.

PasswordEncoder interface

This interface is built into Spring Security, as follows:

public interface PasswordEncoder {
   String encode(CharSequence rawPassword);

   boolean matches(CharSequence rawPassword, String encodedPassword);

   default boolean upgradeEncoding(String encodedPassword) {
      return false;
   }
}

This interface has three methods:

  • The parameter accepted by the encode method is the original password string, and the return value is the encrypted hash value, which cannot be reversely decrypted. This method is usually used when adding users to the system, or when users register.
  • The matches method is used to check whether the user input password rawPassword matches the encrypted hash value encodedPassword. If it can match, it returns true, indicating that the password rawPassword entered by the user is correct, otherwise it returns fasle. That is to say, although the hash value cannot be reversely decrypted, it can be judged whether it matches the original password. This method usually checks the correctness of the password entered by the user when the user logs in.
  • The purpose of upgradeEncoding is to determine whether the current password needs to be upgraded. That is, does it need to be re-encrypted? Return true if needed, fasle if not. The default implementation is to return false.

For example, we can use the following sample code to encrypt and store user passwords during user registration

//Save the User to the database table, which contains the password column
user.setPassword(passwordEncoder.encode(user.getPassword()));

BCryptPasswordEncoder is the PasswordEncoder interface implementation class recommended by Spring Security

public class PasswordEncoderTest {
  @Test
  void bCryptPasswordTest(){
    PasswordEncoder passwordEncoder = new BCryptPasswordEncoder();
    String rawPassword = "123456"; //raw password
    String encodedPassword = passwordEncoder.encode(rawPassword); //Encrypted password

    System.out.println("raw password" + rawPassword);
    System.out.println("encrypted hash password:" + encodedPassword);

    System.out.println(rawPassword + "Whether it matches" + encodedPassword + ":" //Password verification: true
             + passwordEncoder. matches(rawPassword, encodedPassword));

    System.out.println("654321 matches " + encodedPassword + ":" //Define a wrong password for verification: false
             + passwordEncoder. matches("654321", encodedPassword));
  }
}

The result of the above test case execution is as follows. (Note: For the same original password, the hash password after each encryption is different. This is the power of BCryptPasswordEncoder. Not only can it not be cracked, but you can’t find a needle in a haystack through the common password comparison table.) , the output is as follows:

Original password 123456
Hash password after encryption: $2a$10$zt6dUMTjNSyzINTGyiAgluna3mPm7qdgl26vj4tFpsFO6WlK5lXNm
Does 123456 match $2a$10$zt6dUMTjNSyzINTGyiAgluna3mPm7qdgl26vj4tFpsFO6WlK5lXNm:true
Does 654321 match $2a$10$zt6dUMTjNSyzINTGyiAgluna3mPm7qdgl26vj4tFpsFO6WlK5lXNm:false

BCrypt generates random salt (the role of salt is that the taste of the dish is different every time). This is important because it means each encode will produce a different result.