How to crack compressed package passwords and handle compressed packages in CTF challenges

I. Introduction

We deal with compressed packages (archives) all the time, since they are used to compress files for storage and transmission. Handling them is an important part of CTF competitions for several reasons. First, compressed packages often contain key information: many CTF challenges hide clues inside an archive, and contestants must extract and inspect its contents to find them. Second, decrypting archives is a core CTF skill: contestants need to know the extraction methods and tools for each archive format, and how archives are encrypted and decrypted. Third, fluent archive handling speeds up problem solving: the faster a contestant can extract an archive and view its file list and contents, the sooner the key information turns up. Finally, for challenge authors, archives are a way to raise the difficulty: a challenge built from several nested archives or a complex encryption scheme tests both the contestants' skill and their patience.

II. Common compressed file formats

The compressed packages we usually encounter are mainly of these types: zip, rar, 7z, tar and gzip.
Zip and rar are the most common. 7z is cross-platform, while tar and gzip are mostly used on Linux. CTF challenges mostly involve zip and rar, so the following sections briefly introduce these two formats.

Zip compressed package:

Its typical feature is the ".zip" suffix, and its MIME type is application/zip. Zip is a lossless compression format: extracting the archive restores the original files exactly, with no data lost. The format supports several compression methods, such as Stored (no compression), Shrunk, Reduced, Imploded, Deflated (the most common) and Deflate64 (enhanced deflate), chosen to suit different needs. Zip also supports encryption.

Feature 1: data layout: compressed source file data area (local file headers and file data) + compressed source file directory area (the central directory) + compressed source file directory end record (end of central directory)
Each entry in the data area is: [local file header + file data + data descriptor (optional)]
Local file header: 50 4B 03 04, i.e. the signature 0x04034b50, which is why a zip file starts with "PK…"
Central directory file header (directory area): 50 4B 01 02
End of central directory record: 50 4B 05 06
For encryption, only the 2-byte general purpose bit flag matters. It is written as four hex digits (for example 09 00), and only the second digit is relevant, because it contains bit 0, the encryption bit:
When the second digit is odd -> encrypted
When the second digit is even -> not encrypted
① No encryption
The general purpose bit flag in the data area should be 00 00 (two bytes after 50 4B 03 04)
And the general purpose bit flag in the directory area should be 00 00 (four bytes after 50 4B 01 02)

② Fake encryption
The general purpose bit flag in the data area should be 00 00
And the general purpose bit flag in the directory area should be 09 00
Changing the directory area flag back to an even value (00 00) removes the fake encryption.

③ True encryption
The general purpose bit flag in the data area should be 09 00
And the general purpose bit flag in the directory area should be 09 00
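The flag rules above can be checked with a short script instead of a hex editor. This is a minimal sketch using only the Python standard library: it searches for the local (50 4B 03 04) and central (50 4B 01 02) signatures, reads the little-endian flag at offset 6 or 8, and reports bit 0. The function names are illustrative, and the plain byte search assumes the signatures do not also occur by chance inside file data.

```python
import struct

LOCAL_SIG = b"PK\x03\x04"    # local file header (compressed source file data area)
CENTRAL_SIG = b"PK\x01\x02"  # central directory header (compressed source file directory area)


def find_flag_offsets(data: bytes):
    """Yield (kind, offset) of the general purpose bit flag of every header found.

    The flag starts 6 bytes into a local header and 8 bytes into a central
    directory header. A plain byte search can, in rare cases, hit a signature
    that happens to occur inside file data.
    """
    for sig, kind, delta in ((LOCAL_SIG, "local", 6), (CENTRAL_SIG, "central", 8)):
        pos = data.find(sig)
        while pos != -1:
            yield kind, pos + delta
            pos = data.find(sig, pos + len(sig))


def report_flags(data: bytes):
    """Return (kind, flag, encrypted) for each header, in file order."""
    result = []
    for kind, off in sorted(find_flag_offsets(data), key=lambda item: item[1]):
        (flag,) = struct.unpack_from("<H", data, off)  # little-endian 16-bit flag
        result.append((kind, flag, bool(flag & 0x0001)))  # bit 0 = encrypted
    return result


def clear_encryption_bits(data: bytes) -> bytes:
    """Clear bit 0 of every flag (e.g. 09 00 -> 08 00). Only undoes fake encryption."""
    patched = bytearray(data)
    for _, off in find_flag_offsets(data):
        patched[off] &= 0xFE  # the first (low) byte of the flag holds bit 0
    return bytes(patched)
```

If report_flags shows local headers with bit 0 clear but central headers with bit 0 set, the archive matches case ②. Clearing the bit on a truly encrypted archive (case ③) does not decrypt anything; extraction will then fail or produce garbage.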

RAR archive

Its typical suffix is ".rar". A RAR file mainly consists of a marker block, an archive header block, file header blocks and an end block.
RAR 4.x signature (marker block): 52 61 72 21 1A 07 00 (RAR 5.0 archives start with 52 61 72 21 1A 07 01 00 instead)
RAR 4.x end of archive block: C4 3D 7B 00 40 07 00
Pseudo-encryption: RAR pseudo-encryption works on the same principle as ZIP pseudo-encryption: the key is a specific flag bit in a header.

PS: When a pseudo-encrypted RAR package is opened with WinRAR, it usually reports that the file header is damaged, because the edited header no longer matches its header CRC.

In a typical RAR 4.x file, the 24th byte (offset 0x17) holds the PASSWORD_ENCRYPTED flag of the first file header, shown as a ubyte field in the file structure view of 010 Editor. Setting this field to 1 produces RAR pseudo-encryption.
Alternatively, set the 11th byte (offset 0x0A), shown as the BLOCK_HEADERS_ENCRYPTED field in 010 Editor, to 1; this also produces RAR pseudo-encryption.
Conversely, setting the corresponding field back to 0 removes the pseudo-encryption. RAR appears less often in CTF, though; most challenges focus on zip archives.
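The byte edits above can also be scripted. This is a sketch under explicit assumptions: a RAR 4.x file whose 13-byte archive header directly follows the 7-byte marker, so the first file header starts at offset 20; and, to our understanding of the RAR 4.x technical note, the 010 Editor fields correspond to bit 0x80 of the archive header flags (BLOCK_HEADERS_ENCRYPTED) and bit 0x04 of the file header flags (PASSWORD_ENCRYPTED). The constants and function names are illustrative, and header CRCs are not recomputed.

```python
RAR4_SIGNATURE = b"Rar!\x1a\x07\x00"  # 52 61 72 21 1A 07 00

# Assumed layout: 7-byte marker block, 13-byte archive (main) header,
# then the first file header. Offsets below are 0-based.
MAIN_HEAD_TYPE_OFFSET = 9    # expected to hold 0x73 (archive header)
MAIN_FLAGS_OFFSET = 10       # 11th byte: low byte of the archive header flags
FILE_HEAD_TYPE_OFFSET = 22   # expected to hold 0x74 (file header)
FILE_FLAGS_OFFSET = 23       # 24th byte: low byte of the file header flags
BLOCK_HEADERS_ENCRYPTED = 0x80
PASSWORD_ENCRYPTED = 0x04


def _check_layout(data: bytes) -> None:
    """Raise if the data does not match the simple RAR 4.x layout assumed here."""
    if not data.startswith(RAR4_SIGNATURE):
        raise ValueError("not a RAR 4.x archive")
    if (len(data) <= FILE_FLAGS_OFFSET
            or data[MAIN_HEAD_TYPE_OFFSET] != 0x73
            or data[FILE_HEAD_TYPE_OFFSET] != 0x74):
        raise ValueError("unexpected block layout; inspect the file in a hex editor")


def rar_flags(data: bytes) -> dict:
    """Report the two flag bits used for RAR pseudo-encryption."""
    _check_layout(data)
    return {
        "block_headers_encrypted": bool(data[MAIN_FLAGS_OFFSET] & BLOCK_HEADERS_ENCRYPTED),
        "password_encrypted": bool(data[FILE_FLAGS_OFFSET] & PASSWORD_ENCRYPTED),
    }


def clear_rar_pseudo_encryption(data: bytes) -> bytes:
    """Clear both flag bits (set the fields back to 0). Header CRCs are left as-is."""
    _check_layout(data)
    patched = bytearray(data)
    patched[MAIN_FLAGS_OFFSET] &= 0xFF ^ BLOCK_HEADERS_ENCRYPTED
    patched[FILE_FLAGS_OFFSET] &= 0xFF ^ PASSWORD_ENCRYPTED
    return bytes(patched)
```

If the layout check fails (for example, the archive has a comment block or really encrypted headers), fall back to locating the fields manually in 010 Editor.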
Having covered the compressed package formats most common in CTF, let's look at how to solve CTF challenges. Most archive challenges are about cracking passwords; the next most common involve incomplete or damaged files.

III. Practical exercise: solving a CTF problem containing a compressed package

3.1 Pseudo-encryption

This topic has already been covered in Section II, so it is not repeated here.

3.2 Brute force cracking

The ARCHPR.exe tool (Advanced Archive Password Recovery) is usually used to crack encrypted archives. It supports zip, rar, ace and arj files.
Usage scenario: encrypted compressed files under Windows.
Select "Brute-force" as the attack type. In the range options, choose the character sets the password may contain, and optionally set the "start at" and "end at" values; if no range is defined, the entire range is searched. As a first attempt, numeric passwords of 1 to 9 digits are recommended, and the system's built-in English word list makes a good password dictionary.
Make the remaining simple settings (not covered in detail here) and start the attack.
When the attack finishes, the result window shows the recovered password for the archive.

PS: Brute force only works on truly encrypted archives, and the time it takes grows with the complexity of the password. A pseudo-encrypted archive has no real password, so this tool cannot crack it; fix the flag bits as described in Section II instead.
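To make the idea behind the brute-force attack concrete, here is a minimal pure-Python sketch. It is not how ARCHPR works internally and is far slower; it assumes traditional ZipCrypto encryption, since Python's zipfile cannot decrypt AES entries. The helper names and the "challenge.zip" path in the usage note are hypothetical.

```python
import itertools
import string
import zipfile
import zlib


def candidates(charset=string.digits, min_len=1, max_len=4):
    """Yield every string over charset, shortest first."""
    for length in range(min_len, max_len + 1):
        for combo in itertools.product(charset, repeat=length):
            yield "".join(combo)


def brute_force(check, passwords):
    """Return the first password for which check(password) is True, else None."""
    for pwd in passwords:
        if check(pwd):
            return pwd
    return None


def zip_password_checker(path):
    """Build a check() for a ZipCrypto-encrypted zip archive."""
    zf = zipfile.ZipFile(path)
    encrypted = [info for info in zf.infolist() if info.flag_bits & 0x1]
    if not encrypted:
        raise ValueError("no encrypted member (pseudo-encryption? check the flags)")
    # test against the smallest encrypted member to keep each attempt cheap
    target = min(encrypted, key=lambda info: info.file_size)

    def check(pwd):
        try:
            # reading the whole member also verifies its CRC32,
            # which rejects wrong passwords that pass the 1-byte header check
            zf.read(target, pwd=pwd.encode())
            return True
        except (RuntimeError, zipfile.BadZipFile, zlib.error, EOFError):
            return False

    return check
```

Usage: `brute_force(zip_password_checker("challenge.zip"), candidates(max_len=6))`. Each extra digit multiplies the work by ten, so the 1 to 9 digit range suggested above is better left to a dedicated tool like ARCHPR.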

3.3 Plain text attack

  • A plaintext (known-plaintext) attack applies when you already have one or more of the files inside the encrypted zip in unencrypted form: the attack uses them to recover the encryption keys and decrypt the archive. For archives with complex passwords it is far more efficient than brute force. It works against the traditional ZipCrypto encryption, not AES.
  • For zip files, the conditions for a plaintext attack are: ① you have a separate known file whose CRC32 equals the CRC32 of a file in the encrypted archive (the CRC is computed over the uncompressed data, so it can be read from the archive listing without the password); ② the known file must be compressed with the same algorithm as the encrypted archive.
    + In other words, the CRC values of the two files must be identical.
    Example challenge:
    ACTF Freshman Competition 2020, Plaintext Attack
    1. Try to separate the files with binwalk on Linux; this fails.
    Opening the image in 010 Editor shows a flag.txt at the end, but the file header is missing.
    2. Copy that data out and prepend 50 4B to make a new archive, 11.zip; opening it reveals flag.txt.

    Carry out the plaintext attack. Because the sizes of the compressed packages differ, WinRAR can be used to repair the archive automatically.
    (The CRC32 of flag.txt is the same in both archives. Extracting the challenge archive gives two files in the asd folder, good-merged.jpg and qwe.zip, and qwe.zip contains the password-protected flag.txt. To study good-merged.jpg, steghide can show hidden content: steghide info filename. The compressed sizes are different, and the res.zip package is relatively large after compression.)
    Finally, run ARCHPR's plaintext attack to get the flag.

Summary:
Step 1: Open ARCHPR, select "plaintext" as the attack type, and select "unencrypted.zip" as the plaintext file path, i.e. the known plaintext file compressed into a zip without encryption (using the same compression method);
Step 2: Select the encrypted zip archive that needs to be cracked;
Step 3: Select the file to be cracked and click Start. When the attack succeeds, you get the password.
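Before starting the attack, the two conditions listed above can be checked quickly. The sketch below only does that check with Python's zipfile (reading the central directory needs no password); it does not implement the key-recovery attack itself, which is left to ARCHPR. The function name is illustrative.

```python
import zipfile


def plaintext_candidates(encrypted_zip, plain_zip):
    """Pair encrypted members with plaintext members that share the same CRC32.

    Returns (encrypted_name, plain_name, same_compression_method) tuples.
    Both arguments may be paths or binary file objects.
    """
    with zipfile.ZipFile(plain_zip) as pz:
        # map CRC32 -> ZipInfo for the unencrypted archive built from the known file
        plain = {info.CRC: info for info in pz.infolist()}
    pairs = []
    with zipfile.ZipFile(encrypted_zip) as ez:
        for info in ez.infolist():
            known = plain.get(info.CRC)
            if info.flag_bits & 0x1 and known is not None:  # condition ①: CRC match
                pairs.append((info.filename, known.filename,
                              info.compress_type == known.compress_type))  # condition ②
    return pairs
```

A pair whose last field is False means the known file was compressed with a different method and should be re-zipped before it is used as the plaintext archive.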

3.4 CRC32 collision

CRC stands for Cyclic Redundancy Check, and CRC32 produces a 32-bit check value (8 hexadecimal digits). Every bit of the source data takes part in the calculation, so changing even a single bit of the data gives a different CRC32 value. In practice different files therefore almost always have different CRC32 values, although collisions do exist because there are only 2^32 possible values. If you know the length and the CRC32 value of a piece of data, you can enumerate every possible content of that length and compare each candidate's CRC32 with the known value, i.e. brute-force the content. This is only practical for very small files of a few bytes. The CRC32 stored in a zip file is computed over the original, unencrypted data, and it can be read even when the entry is encrypted. For example, for an encrypted archive you can see this information (size and CRC32) just by double-clicking it. If you also know that the content consists only of digits, a script can brute-force it. Note: requires a Linux environment.
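Here is a minimal sketch of that workflow using Python's standard zipfile and zlib modules. list_entries reads each member's name, uncompressed size, CRC32 and encryption bit from the central directory (no password needed); crack_small_entries then brute-forces the content of small members. The 4-byte size limit and the digits-only charset are assumptions matching the "all numbers" case above, and the function names are illustrative.

```python
import itertools
import string
import zipfile
import zlib


def list_entries(path):
    """(name, uncompressed size, CRC32, encrypted?) for each member; no password needed."""
    with zipfile.ZipFile(path) as zf:
        return [(i.filename, i.file_size, i.CRC, bool(i.flag_bits & 0x1))
                for i in zf.infolist()]


def crc_bruteforce(target_crc, length, charset=string.digits):
    """Return every string of `length` chars from charset whose CRC32 equals target_crc."""
    hits = []
    for combo in itertools.product(charset, repeat=length):
        candidate = "".join(combo)
        if zlib.crc32(candidate.encode()) == target_crc:  # unsigned in Python 3
            hits.append(candidate)
    return hits


def crack_small_entries(path, max_size=4, charset=string.digits):
    """Brute-force the content of every member of at most max_size bytes."""
    results = {}
    for name, size, crc, _encrypted in list_entries(path):
        if 0 < size <= max_size:
            results[name] = crc_bruteforce(crc, size, charset)
    return results
```

For contents of up to 4 bytes, CRC32 is one-to-one, so each list has at most one hit; for longer contents several candidates can share a CRC32 and have to be judged by eye.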
Example challenge:
Challenge: MISC60
Description: The ultimate answer to the universe is 32
1. Analysis
Running the usual binwalk analysis on the image reveals a 32.txt file with the following content:

GUYDIQRQGMYDIMCBGAYDAOJQGAYDAMBQGQ7DIOJQIE7DON7FHBBEENK
GGQYTEMBQGAYDAMBQGYYDAMBQGAYDANJQGAYDAMBQGY6DERJXGQ6T
QNZUGE6TCNRVGE5EIQZXGJCTIOKEIZBDQRCCGMYTMMZRII7UIQKFIM7DE
RRZGA7TANCCGA6TAOBXIU5EEQRVIY7DCMRQGAYDAMBQGA6DAMBQGAYD
ANJQGRBDAMZQGQYECMBQGA5TAMBQGAYDANJTGQ5TAQJUG4BTQNZUIF
AUCNJRGIYDAMBQGAYDANRQGAYDAMBQGA7TAMBQGAYDANSDGJCTONB
XHA6TIMJXGE6DKMJYIRBTOMSFGQ5UIRSCHBCEEMRWGFBECQJSGRBTMQ
JWINBDGQRVGA7EEMBXGA5EGOBXGRAUCQJVGEZDAMBQGAYDAMBWGAY
DAMBQGA7TANCCGAZTANBQIEYDAMBZGAYDAMBQGAZTINRSGBATINZUGV
CDMMBQIMYTCMRQGAYDAMBQGA6DAMBQGAYDAMBVGAYDAMBQGA6DCM
SFG57DOOBXGQYTOMJWGUYTQRCDG5ZEKNBZIRDEEOCEII7DCNCBGAZTE
QJSGU6UERJSIY5TKMBUIIYDOMBYGQ7UINRQGBBTCMJSGAYDAMBQGAYD
MMBQGAYDAMBVGA7EEMBTGA7DAQJQGAYDSMBQGAYDAMBXIQ7DSMCBG
Q6TMNBSINBECRKFGEZDAMBQGAYDAMBWGAYDAMBQGAYDKMBQGAYDAM
BWG5ZEKNZUG55DONBRG5YTMNJRHBCEGNZSIU7DSRCGII5EIQRQHAYECM
BWGZDEMMBSIZDEKQZVGUYDIQRQG5YDQNRUGJBUEQKFIUYTEMBQGAYD
AMBQGYYDAMBQGAYDKMBUIIYDCMBSGE7DAMBQIEYDAMBZGAYDAMBQGA
7DINBZGBATINZXIU5EEQRVIY7DCMRQGAYDAMBQGA6DAMBQGAYDAMBVGA
YDAMBQGAYDAMBQGAYDAMBQGEYDAMRQGAYDAMBQGAYDAMBQGAYDA
MBWGYZEKNZUG55DONBVGA7EEMBRGAZDCNBQGAYECMBQGA5TAMBQGA
YDANJTGQ5TAQJUG4BTQNZUIFAUCNJRGIYDAMBQGAYDANRQGAYDAMBQG
A7TAMBQGAYDAMBQGAYDAMBQGAYDCMBQGIYDAMBQGAYDANBVGAYDAM
BQGA6EGMSFG57DOOBXGQ7TANCCGAYTAMRRGQYDAMCBGAYDAOJQGAY
DAMBQGM7DMMRQIE7DONBVIQ6DAMCDGEYTEMBQGAYDAMBQGYYDAMBQ
GAYDANJQGAYDAMBQGAYDAMBQGAYDAMBRGAYDEMBQGAYDAMBQHBATA
MBQGAYDANRRGJCTONBXHA6TINJQGRBDAMJQGIYTIMBQGBATAMBQHEYD
AMBQGAYDORBUHEYECNBXGY7DEQ7CIFCUKMJSGAYDAMBQGAYDMMBQG
AYDAMBQGUYDAMBQGAYDAMBQGAYDAMBQGAYTAMBSGAYDAMBQGAYEG
RRQGAYDAMBQGY6TERJXGQ6TQNZUGUYDIQRQGUYDMMBQGAYDAMBQGA
YDIMBQGA7DAMCDIMYDAMBQGAYDCNBQGEYDAMBQGAYDAMA=

The text contains only uppercase letters and the digits 4-7, all within the Base32 alphabet (A-Z and 2-7), so it is Base32 encoded. The decoded result is as follows:

# The first group of file header + file data + data description (69 bytes in total)
504B0304 0A00 0900 0000 4>49 0A>7 7\xe58BB5F4 12000000 06000000 0500 0000 6<2E74=874
1=1651:DC72E49DFB8DB31631B?DAEC>2F90
?04B0=08 7E:BB5F> 12000000 0<000000
# The second group of file header + file data + data description (69 bytes in total)
504B0304 0A00 0;00 0000 534;0A47 \x03874AAA5 12000000 06000000 0?000000 6C2E7478=4
171<518DC72E4;DFB8DB261BAA24C6A6CB3B
50>B070: C874AAA5 12000000 06000000
# The third group of file header + file data + data description (69 bytes in total)
?04B0304 0A00 0900 0000 3462 0A47 45D600C1 12000000 0<000000 0500 0000 <12E7~7874
1716518DC7rE49DFB8DB>14A032A25=BE2F;
504B0708 4?D600C1 12000000 06000000
# The fourth group of file header + file data + data description (69 bytes in total)
50>B030> 0A00 0900 0000 7D>9 0A4= 642CBAEE 12000000 06000000 05000000 67rE747z74
17q6518DC72E>9DFB:DB080A066FF02FFEC5
504B07p8 642CBAEE 12000000 06000000
# The first set of central directory headers (51 bytes in total)
504B0102 1>00 0A00 0900 0000 >449 0A47 7E:BB5F> 12000000 0<000000 0500 0000 0000 0000 0100 20000000 00000000 662E747z74
# The second set of central directory headers (51 bytes in total)
50>B0102 1400 0A00 0;00 0000 534; 0A47 \x03874AAA5 12000000 06000000 0?00 0000 0000 0000 0100 20000000 45000000 <C2E7~7874
# The third group of central directory headers (51 bytes in total)
?04B0102 1400 0A00 0900 0000 3>62 0A>7 45D<00C1 12000000 06000000 0500 0000 0000 0000 0100 20000000 8A000000 612E7478=4
# The fourth group of central directory headers (51 bytes in total)
504B0102 1400 0A00 0900 0000 7D49 0A47 6>2C\xe2AEE 12000000 06000000 0500 0000 0000 0000 0100 20000000 CF000000 6=2E74=874
#End of central directory record
504B0506 0000 0000 0400 0>00 CC000000 14010000 0000
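The Base32 decoding step can be reproduced with Python's standard base64 module. This sketch joins the wrapped lines, adds missing "=" padding (an assumption, in case the copied text lost some), and decodes the bytes as latin-1 so every byte maps to exactly one character; non-ASCII bytes such as \xe5 then appear as the corresponding latin-1 characters.

```python
import base64


def decode_base32_lines(text: str) -> str:
    """Join wrapped Base32 lines and decode them into a one-char-per-byte string."""
    joined = "".join(text.split())       # drop newlines and spaces
    joined += "=" * (-len(joined) % 8)   # pad to a multiple of 8 if padding was lost
    return base64.b32decode(joined).decode("latin-1")
```

If the padding in the original text is malformed, b32decode raises binascii.Error, which is a sign that the text was copied incorrectly.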

Comparing fields that must hold equal values (for example, each CRC32 appears in the local file header, the data descriptor and the central directory header), we get the substitutions ?:5, =:7, >:4, ::8, <:6, ;:9, \xe5:E, \x03:C, \xe2:B, ~:4, r:2, z:8, p:0, q:1. After replacement, the hex dump of the zip file is as follows:

# The first group of file header + file data + data description (69 bytes in total)
504B0304 0A00 0900 0000 4449 0A47 7E8BB5F4 12000000 06000000 0500 0000 662E747874
1716518DC72E49DFB8DB31631B5DAEC42F90
504B0708 7E8BB5F4 12000000 06000000
# The second group of file header + file data + data description (69 bytes in total)
504B0304 0A00 0900 0000 53490A47 C874AAA5 12000000 06000000 05000000 6C2E747874
1716518DC72E49DFB8DB261BAA24C6A6CB3B
504B0708 C874AAA5 12000000 06000000
# The third group of file header + file data + data description (69 bytes in total)
504B0304 0A00 0900 0000 3462 0A47 45D600C1 12000000 06000000 0500 0000 612E747874
1716518DC72E49DFB8DB414A032A257BE2F9
504B0708 45D600C1 12000000 06000000
# The fourth group of file header + file data + data description (69 bytes in total)
504B0304 0A00 0900 0000 7D49 0A47 642CBAEE 12000000 06000000 05000000 672E747874
1716518DC72E49DFB8DB080A066FF02FFEC5
504B0708 642CBAEE 12000000 06000000
# The first set of central directory headers (51 bytes in total)
504B0102 1400 0A00 0900 0000 4449 0A47 7E8BB5F4 12000000 06000000 0500 0000 0000 0000 0100 20000000 00000000 662E747874
# The second set of central directory headers (51 bytes in total)
504B0102 1400 0A00 0900 0000 5349 0A47 C874AAA5 12000000 06000000 0500 0000 0000 0000 0100 20000000 45000000 6C2E747874
# The third group of central directory headers (51 bytes in total)
504B0102 1400 0A00 0900 0000 3462 0A47 45D600C1 12000000 06000000 0500 0000 0000 0000 0100 20000000 8A000000 612E747874
# The fourth group of central directory headers (51 bytes in total)
504B0102 1400 0A00 0900 0000 7D49 0A47 642CBAEE 12000000 06000000 0500 0000 0000 0000 0100 20000000 CF000000 672E747874
#End of central directory record
504B0506 0000 0000 0400 0400 CC000000 14010000 0000
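The substitution and hex-to-binary step can be sketched as follows. The table is exactly the substitution list given above; it assumes the decoded text was produced with latin-1 (so byte 0xE5 is the character "\xe5") and that lines starting with "#" are comments to be skipped. The function name is illustrative.

```python
# Substitutions worked out above; assumed complete for this challenge.
SUBSTITUTIONS = {
    "?": "5", "=": "7", ">": "4", ":": "8", "<": "6", ";": "9",
    "\xe5": "E", "\x03": "C", "\xe2": "B",
    "~": "4", "r": "2", "z": "8", "p": "0", "q": "1",
}
TABLE = str.maketrans(SUBSTITUTIONS)


def repair_dump(text: str) -> bytes:
    """Drop '#' comment lines, apply the substitutions and convert the hex to bytes."""
    lines = [line for line in text.splitlines() if not line.lstrip().startswith("#")]
    cleaned = "".join(lines).translate(TABLE)
    return bytes.fromhex("".join(cleaned.split()))  # remove all whitespace first
```

Writing the returned bytes to a file with a .zip extension gives the encrypted archive analysed next.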

The general purpose bit flag 09 00 shows that the zip file is encrypted, so it cannot be extracted directly. The four file names are f.txt, l.txt, a.txt and g.txt, and the uncompressed size 06000000 shows that each file holds 6 characters. The CRC32 values are stored as 7E8BB5F4, C874AAA5, 45D600C1 and 642CBAEE (little-endian), i.e. 0xf4b58b7e, 0xa5aa74c8, 0xc100d645 and 0xeeba2c64 as integers. A CRC32 collision (brute-force) approach can recover the 4 original texts; the source code is as follows:

import binascii
import itertools
import string

def crack_crc(crc_list, length, chars=string.printable):
    print('-------------Start Crack CRC-------------')
    results = {}
    # Brute-force several CRC values in one run
    for crc_value in crc_list:
        results[crc_value] = []
        # Try every string of `length` characters: len(chars) ** length candidates
        for combo in itertools.product(chars, repeat=length):
            candidate = ''.join(combo)
            # Mask to an unsigned 32-bit value (already unsigned in Python 3)
            calc_crc = binascii.crc32(candidate.encode()) & 0xffffffff
            if calc_crc == crc_value:  # candidate has the same CRC32 as the file
                print('[ + ] {}: {}'.format(hex(crc_value), candidate))
                # Keep every hit: for 6 bytes, several strings can share one CRC32
                results[crc_value].append(candidate)
    print('-----------CRC Crack Completed-----------')
    print('Result: {}'.format(results))
    return results

if __name__ == '__main__':
    # CRC32 values of f.txt, l.txt, a.txt and g.txt; keep the file order
    crc_list = [0xf4b58b7e, 0xa5aa74c8, 0xc100d645, 0xeeba2c64]
    # Each file holds 6 bytes. A full search over string.printable (100 ** 6
    # candidates) is very slow in pure Python, so narrow `chars` when the
    # challenge hints at the alphabet.
    crack_crc(crc_list, 6)
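In the script above, the CRC values are copied from the hex dump by hand and byte-reversed. That step can be automated by unpacking the fixed 46-byte part of each central directory header with Python's struct module, which also handles the little-endian byte order. This is a minimal sketch; the function names are illustrative, and file names are decoded as cp437, the zip default when the UTF-8 flag is not set.

```python
import struct

# Fixed 46-byte part of a central directory header (all fields little-endian)
CENTRAL_HEADER = struct.Struct("<4s6H3L5H2L")


def parse_central_header(record: bytes) -> dict:
    """Decode one central directory header that starts at record[0]."""
    (sig, _made_by, _needed, flags, method, _time, _date, crc, csize, usize,
     name_len, _extra_len, _comment_len, _disk, _int_attr, _ext_attr,
     offset) = CENTRAL_HEADER.unpack_from(record)
    if sig != b"PK\x01\x02":
        raise ValueError("not a central directory header")
    start = CENTRAL_HEADER.size
    name = record[start:start + name_len].decode("cp437")
    return {"name": name, "flags": flags, "encrypted": bool(flags & 0x1),
            "method": method, "crc": crc, "compressed_size": csize,
            "uncompressed_size": usize, "local_header_offset": offset}


def central_headers(data: bytes):
    """Yield every central directory header found in a zip byte string."""
    pos = data.find(b"PK\x01\x02")
    while pos != -1:
        yield parse_central_header(data[pos:])
        pos = data.find(b"PK\x01\x02", pos + 4)
```

Feeding the repaired archive bytes to central_headers gives the names, sizes and CRC values that the collision script needs.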

3.5 Compressed package steganography

For example, given a string of characters or digits, check carefully whether it uses some base encoding. Decode it to hexadecimal, check whether the file header belongs to a compressed package or another format, then change the file extension accordingly and extract the file to get the flag.
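The "decode, then look at the file header" routine can be sketched in Python as follows. It tries hex, Base64 and Base32 on the given text and matches the result against a few magic numbers, including the zip and rar signatures from Section II. The signature table is a short illustrative subset (tar, whose "ustar" marker sits at offset 257, is not covered), and the function names are hypothetical.

```python
import base64

# A few well-known file signatures (magic numbers) checked at offset 0
MAGIC = [
    (b"PK\x03\x04", "zip"),
    (b"Rar!\x1a\x07\x01\x00", "rar5"),
    (b"Rar!\x1a\x07\x00", "rar4"),
    (b"7z\xbc\xaf\x27\x1c", "7z"),
    (b"\x1f\x8b", "gzip"),
    (b"\xff\xd8\xff", "jpg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
]


def identify(data: bytes) -> str:
    """Return the file type whose signature starts the data, or 'unknown'."""
    for magic, kind in MAGIC:
        if data.startswith(magic):
            return kind
    return "unknown"


def try_decodings(text: str) -> dict:
    """Try several encodings; return {encoding: detected file type} for those that decode."""
    compact = "".join(text.split())
    decoders = {
        "hex": bytes.fromhex,
        "base64": lambda s: base64.b64decode(s, validate=True),
        "base32": base64.b32decode,
    }
    found = {}
    for name, decode in decoders.items():
        try:
            found[name] = identify(decode(compact))
        except ValueError:  # binascii.Error is a subclass of ValueError
            continue
    return found
```

A result such as {"hex": "zip"} suggests saving the decoded bytes with a .zip extension and extracting them.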

3.6 Hide compressed packages in files

This is the most common form of compressed package steganography in CTF: a compressed package is hidden inside another file.
Principle: taking the JPG format as an example, a complete JPG starts with FF D8 and ends with FF D9. Image viewers ignore anything after FF D9, so other files can be appended to a JPG without affecting how it displays.
Tools: foremost, dd and similar tools can separate them.
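Separating an appended zip can also be done in a few lines of Python, searching for signatures the way carving tools do. This sketch assumes a single zip appended to the image: it cuts from the first local file header (50 4B 03 04) to the end of the last end-of-central-directory record (50 4B 05 06), including its comment. Searching for the zip signature instead of the first FF D9 avoids being misled by FF D9 bytes inside embedded thumbnails. The function name is illustrative.

```python
ZIP_LOCAL = b"PK\x03\x04"  # local file header signature
ZIP_EOCD = b"PK\x05\x06"   # end of central directory signature


def carve_zip(data: bytes) -> bytes:
    """Return the zip archive embedded in data, or b'' if none is found."""
    start = data.find(ZIP_LOCAL)
    eocd = data.rfind(ZIP_EOCD)
    if start == -1 or eocd == -1 or eocd < start:
        return b""
    # the EOCD record is 22 bytes plus a comment whose length is stored at offset 20
    comment_len = int.from_bytes(data[eocd + 20:eocd + 22], "little")
    return data[start:eocd + 22 + comment_len]
```

Python's zipfile also tolerates data prepended to an archive, so zipfile.ZipFile can often open the combined image directly; carving is still useful when the archive needs to be repaired or handed to another tool.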

IV. Conclusion

That concludes our look at compressed package techniques. Archive challenges come in many varieties; the key points are understanding archive headers, compression algorithms and CRC values, and using cracking tools such as ARCHPR (many other tools are available too). Archives are also frequently combined with other skills in a single challenge. How efficiently contestants handle archives often comes down to how well they gather information, and I think the most important skill in archive challenges is finding the key information. Many CTF challenges hide key information in archives, and mastering archive handling helps contestants locate and extract it quickly, so they solve problems faster. Moreover, challenges usually point players toward an archive's key information in some way, directly or indirectly. Players should pay attention to these hints, which are often the key to the solution.