note-Hundreds of Linux Greenhouse Commands 2 Files and Disks

1. file file information

The file command identifies the type and encoding of a file.

# Syntax
file filename
# Output
filename: file type and encoding

-b Do not print the file name in the output
-i Print the MIME type of the file
MIME stands for Multipurpose Internet Mail Extensions.
-F Change the separator between the file name and the file information
This avoids collisions with the colon character when the output is parsed as text.
-L Show information about the target of a soft link
By default, file describes the soft link itself;
with the -L option, it describes the file that the soft link points to.
-f Read file names from a text file
To inspect a large number of files, save their names in a text file, one per line, and pass that file with file -f; file then reports on each listed file in turn.
-z Look inside compressed files, for example to report the type of the single file contained in a .gz file
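The options above can be tried on throwaway files. This is a minimal sketch; the names notes.txt, notes.link and list.txt are made up for the illustration, and the exact descriptions printed depend on the installed version of file.

```shell
# Work in a scratch directory so nothing real is touched.
work=$(mktemp -d)
cd "$work" || exit 1
printf 'hello world\n' > notes.txt
ln -s notes.txt notes.link
printf 'notes.txt\nnotes.link\n' > list.txt

file notes.txt                        # file name, separator, description
brief=$(file -b notes.txt)            # description only, no file name
mime=$(file -b -i notes.txt)          # MIME type and charset
file -F ' =>' notes.txt               # custom separator instead of ":"
link_info=$(file -b notes.link)       # describes the soft link itself
target_info=$(file -bL notes.link)    # follows the link to notes.txt
file -f list.txt                      # one result per name listed in list.txt
```

Combining -b with -i or -L, as done here, keeps the output easy to capture in a variable.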

2. ln create a link

Soft link: A soft link is a file whose content is the path of another file. When it is opened, the system looks up and opens the linked file.
Hard link: A hard link is an additional name for an existing file. It does not get an inode of its own; it points to the same inode as the original name, and that inode points to the data blocks that hold the file content. Creating a hard link increments the link count of the inode. A modification made through one name is visible through every other hard link. Deleting one of the names does not affect the others; it only decrements the link count by 1. The system frees the data blocks only when the link count reaches 0.

Creating links

# Create a hard link
ln source_file hard_link_name
# View the inode number of a file
ls -i filename

# Create a soft link
ln -s source_file soft_link_name

In ls -l output, the first character of a soft link's permission string is l.
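The difference between the two link types shows up when the original name is deleted. The sketch below assumes GNU stat (the -c format option) and uses made-up file names.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
printf 'first\n' > original.txt
ln original.txt hard.txt        # hard link: a second name for the same inode
ln -s original.txt soft.txt     # soft link: stores the path "original.txt"

inode_orig=$(stat -c %i original.txt)
inode_hard=$(stat -c %i hard.txt)
links=$(stat -c %h original.txt)    # link count, the second column of ls -l
ls -li                              # the mode string of soft.txt starts with "l"

printf 'second\n' >> hard.txt       # a write through one name is visible through the other
rm original.txt                     # removes one name; the data stays reachable
content=$(cat hard.txt)
links_after=$(stat -c %h hard.txt)
# The soft link still exists but its target path is gone.
if [ -e soft.txt ]; then soft_state=ok; else soft_state=dangling; fi
```

After the rm, hard.txt still holds both lines, while soft.txt points at a name that no longer exists.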

Two limitations of hard links

Hard links cannot span file systems.
Ordinary users are not allowed to create hard links to directories (Linux refuses this for root as well).

Hard links to directories could introduce cycles in the directory tree, causing an infinite loop when the tree is traversed. The operating system can recognize soft links during traversal and stop following them. A hard link cannot be told apart from the original name, so a loop built from hard links cannot be prevented in the same way.

# View file information, the second column is the number of hard links of the file
ls -l filename

-n Treat a soft link to a folder as if it were a soft link to a file
When a new soft link is created under a name that already exists as a soft link, the result depends on what the existing link points to. If it points to a file, ln fails because the name already exists. If it points to a folder, ln follows the link and creates the new link inside that folder, which can leave a link that points to itself.
With the -n option, ln reports the same "already exists" error when the existing soft link points to a folder.

When this error is reported, add the -f option to force the soft link to point to the new target.
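The behaviour of -n and -f is easiest to see when repointing a soft link from one folder to another. A small sketch, with dir1, dir2 and current as illustrative names:

```shell
work=$(mktemp -d)
cd "$work" || exit 1
mkdir dir1 dir2
ln -s dir1 current

# Without -n, ln follows "current" into dir1 and creates the new link there.
ln -s dir2 current
stray=$(ls dir1)            # the unintended link that ended up inside dir1
rm dir1/dir2

# With -n, "current" is treated as a plain file, so ln refuses to overwrite it.
refused=no
ln -sn dir2 current 2>/dev/null || refused=yes

# Adding -f replaces the existing link.
ln -sfn dir2 current
target=$(readlink current)
```

ln -sfn is the usual combination for switching a "current" style link to a new folder.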

3. find find files

Directory search and file location are provided by the findutils package, which contains four useful commands: find, xargs, locate (quickly looks up file names in a database), and updatedb (updates that file name database).

find syntax: find [path...] -name [pattern]

-type Specify the type of object to search for
d folder; f ordinary file; l symbolic link; b block device;
c character device; p named pipe; s socket.
-regex Match the whole path against a regular expression
-user Search by owner; -group search by group
-perm Search by permission
-exec Run a command on each search result
-exec runs the given shell command once for each object that find matches.
Use {} in the command as a placeholder for the path of the current match.
End the command with \; and keep the backslash (or quote the semicolon) so that the shell does not interpret it.
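A short sketch of -type, -name and -exec working together, run against a scratch directory with made-up file names:

```shell
work=$(mktemp -d)
mkdir -p "$work/logs/old"
printf 'a\n' > "$work/logs/app.log"
printf 'b\nc\n' > "$work/logs/old/app.log"
printf 'x\n' > "$work/logs/readme.txt"

# Regular files only, matched by name; wc -l runs once per file found.
find "$work" -type f -name '*.log' -exec wc -l {} \;

dirs=$(find "$work" -type d | wc -l)                    # the start folder counts too
logs=$(find "$work" -type f -name '*.log' | wc -l)
lines=$(find "$work" -type f -name '*.log' -exec cat {} \; | wc -l)
```

Quoting the pattern ('*.log') keeps the shell from expanding it before find sees it.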
Search files by time
There are three kinds of timestamps: access time (a), modification time (m), and status change time (c).

Less than n ago versus more than n ago

# Files modified less than n minutes ago
find -mmin -n
# Files modified more than n minutes ago
find -mmin +n

# Files whose status changed less than / more than n minutes ago
find -cmin ±n

# Files accessed less than / more than n minutes ago
find -amin ±n

# The same tests with n in days
find -mtime ±n
find -ctime ±n
find -atime ±n

Time relative to a file:

# Find files whose modification/access/status-change time is more recent than the modification time of the given file
find -newer file
find -anewer file
find -cnewer file

# Find files whose X time is more recent than the Y time of the given file
find -newerXY file

# -newerXY examples
# Find files whose access time is more recent than the modification time of file
find -neweram file
# With Y set to t, the argument is a time instead of a file: find files accessed after '2022-12-01 10:00:00'
find -newerat '2022-12-01 10:00:00'
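The time tests can be checked against files with known timestamps. This sketch assumes GNU touch (the -d option) and uses modification time throughout, since access time can be affected by mount options; the file names are made up.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
touch -d '2022-11-30 12:00:00' old.txt
touch -d '2022-12-01 10:00:00' ref.txt
touch -d '2022-12-02 12:00:00' new.txt
touch now.txt                                   # modified just now

recent=$(find . -type f -mmin -60)              # modified less than 60 minutes ago
aged=$(find . -type f -mtime +1 | wc -l)        # modified at least 2 full days ago
newer=$(find . -type f -newer ref.txt | wc -l)  # modified after ref.txt was
since=$(find . -type f -newermt '2022-12-01 10:00:00' | wc -l)
```

-newer and -newermt are strict comparisons, so ref.txt itself is not counted by either of them.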

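Searching for large files

# Find files larger than 40M in the directory and its subdirectories
find -size +40M
# Find files smaller than 40M
find -size -40M
# Find files of exactly 40M
find -size 40M

Sizes are rounded up to whole units before the comparison, so -size 40M matches any file larger than 39M and no larger than 40M, and -size -40M matches files of 39M or less.

Units supported by -size:
b: 512-byte blocks (the default); c: bytes; w: two-byte words;
k: KB (1024 bytes); M: MB; G: GB.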
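Because find rounds sizes up to whole units, the M and c units can give different answers for the same files. A sketch using GNU truncate to create files of exact sizes (the names are illustrative):

```shell
work=$(mktemp -d)
cd "$work" || exit 1
truncate -s 0 empty.bin
truncate -s 100 tiny.bin       # 100 bytes, counts as 1 unit of 1M
truncate -s 1M one.bin         # exactly 1 unit of 1M
truncate -s 3M three.bin

over=$(find . -type f -size +1M)                  # more than 1 unit
exact=$(find . -type f -size 1M | wc -l)          # rounds up to exactly 1 unit
under=$(find . -type f -size -1M)                 # fewer than 1 unit, so only empty files
small=$(find . -type f -size -1000c | wc -l)      # byte units avoid the rounding
```

When the intent is "smaller than 1 MB", a byte-based test such as -size -1048576c is the safer form.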

-maxdepth Limit how many folder levels below the starting point are searched
Logical operations on find expressions
Each condition that follows find is an expression, and expressions can be combined with logical operators.
\( expr \) Parentheses raise the precedence of the enclosed expression; they must be escaped so that the shell does not interpret them;
! expr Negation: matches files that do not satisfy expr;
expr1 expr2 The default: expr1 and expr2;
expr1 -a expr2 expr1 and expr2;
expr1 -o expr2 expr1 or expr2;
expr1 , expr2 Both expressions are evaluated, and the result is always that of expr2.
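The operators above can be combined as in the following sketch, which runs on a scratch directory with made-up file names:

```shell
work=$(mktemp -d)
cd "$work" || exit 1
mkdir sub
touch a.txt b.log c.tmp sub/d.txt sub/e.log

# Grouping with escaped parentheses: files that are .txt OR .log
txt_or_log=$(find . -type f \( -name '*.txt' -o -name '*.log' \) | wc -l)
# Negation: files that are NOT .txt
not_txt=$(find . -type f ! -name '*.txt' | wc -l)
# Explicit -a together with -maxdepth: .txt files in the top folder only
top_txt=$(find . -maxdepth 1 -type f -a -name '*.txt')
```

Without the parentheses, -type f would bind only to the first -name test, because the implicit "and" has higher precedence than -o.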

4. Regular expressions in find

Types of regular expressions
Regular expression syntax is not uniform across tools. The find command takes a -regextype option that selects the syntax to use.

Possible values include emacs, posix-awk, posix-basic, posix-egrep, and posix-extended.
find uses the emacs type by default.

In emacs syntax

., *, +, ?, [0-9], [a-z], ^, and $ all work as usual;
Alternation ("or") must be written with an escape, as \|;
Grouping must be written with escaped parentheses, \( and \); the backslash is what marks them as grouping operators;
Interval matching is not supported.
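The same search written in the default emacs syntax and in posix-extended syntax looks like this. A sketch on made-up file names; note that -regex is matched against the whole path, hence the leading .*/ in both patterns.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
touch report1.txt report2.log notes.md

# Default emacs syntax: grouping and alternation need backslashes.
emacs_hits=$(find . -regex '.*/report[0-9]\.\(txt\|log\)' | wc -l)
# posix-extended syntax: plain ( ) and |. -regextype goes before -regex.
ere_hits=$(find . -regextype posix-extended -regex '.*/report[0-9]\.(txt|log)' | wc -l)
```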

5. du disk usage

du reports the disk usage of files and folders, while df reports disk usage at the file system level.

Common options for du
-h, --human-readable Show sizes in human-readable units;
-s, --summarize Show only one total for each given argument;
-c, --total Add a grand total of the displayed results;
-d, --max-depth Limit how many levels of nested folders are shown; with a value of 0, the result is the same as with -s;
-a, --all Show the disk usage of files as well as folders;
--exclude=pattern Exclude files that match the shell pattern (a glob, not a regular expression);
Sorting with sort

du -sh *| sort -hr

sort's -h option compares human-readable sizes and takes the unit into account. The -n option compares only the leading number and ignores the unit.
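The difference between the two sort modes can be seen on a fixed list of sizes, without depending on any real disk usage. A minimal sketch assuming GNU sort; the sizes are made-up sample values of the kind du -h prints.

```shell
sizes='10K
2M
1G'

# -h understands the unit suffix, so 1G is the largest entry.
largest_h=$(printf '%s\n' "$sizes" | sort -hr | head -n 1)
# -n looks only at the leading number, so 10K wrongly comes first.
largest_n=$(printf '%s\n' "$sizes" | sort -nr | head -n 1)
echo "sort -hr puts $largest_h first, sort -nr puts $largest_n first"
```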

Unit of du
The default unit of du is KB, that is, 1024 bytes. The following settings change it:

If a block size is set with the --block-size option, du uses that size as its unit;
If the environment variable DU_BLOCK_SIZE, BLOCK_SIZE, or BLOCKSIZE is set, du uses that size as its unit;
If the environment variable POSIXLY_CORRECT is set, the unit of du is 512 bytes.
The order of precedence is --block-size > DU_BLOCK_SIZE > BLOCK_SIZE > BLOCKSIZE > POSIXLY_CORRECT > default KB.

Differences between du and ls output
du shows disk usage, and ls shows file size.

The disk is divided into fixed-size data blocks, usually 4 KB each.
In most file systems a data block holds content from at most one file; when the block is not full, the remaining space cannot be used by other files. Large files are stored in multiple data blocks.
As a result, the file size shown by ls is usually smaller than the usage shown by du. The exception is a sparse file, which contains holes that take up no disk space.
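Both cases can be reproduced with two small files. This sketch assumes GNU truncate and stat and a file system that supports sparse files; the exact du figures depend on the file system, so only the sparse case is compared numerically.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
printf 'x' > one_byte.txt          # 1 byte of content
truncate -s 10M sparse.img         # a 10 MB hole; no data is written

ls -l one_byte.txt sparse.img      # file sizes in bytes
du -k one_byte.txt sparse.img      # allocated space in KB

apparent=$(stat -c %s sparse.img)             # the size ls reports
on_disk_kb=$(du -k sparse.img | cut -f1)      # the space du reports
```

For one_byte.txt, du typically reports a whole block even though ls reports 1 byte; for sparse.img the relation is reversed.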

6. gzip compression

Compression can be used to improve data transmission efficiency, reduce transmission bandwidth, and manage backup data.

Compressing files with gzip

# Compress files (each file is replaced by a file with a .gz suffix)
gzip source_file1 source_file2
# Decompress files
gzip -d compressed_file1.gz compressed_file2.gz
# Keep the source file when compressing
gzip -c source_file > compressed_file.gz
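A round trip on scratch files shows which commands replace their input and which keep it. The file names are made up for the sketch.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
seq 1 1000 > data.txt
cp data.txt keep.txt

gzip data.txt                        # data.txt is replaced by data.txt.gz
if [ -e data.txt ]; then after_gzip=kept; else after_gzip=replaced; fi

gzip -c keep.txt > keep.txt.gz       # output goes to stdout, keep.txt stays
gzip -d data.txt.gz                  # restores data.txt and removes data.txt.gz
```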

Packaging with tar
tar packages multiple files and folders into a single archive.
gzip compresses and decompresses only ordinary files; it does not handle folders or symbolic links, and it cannot pack multiple files together. In practice it is therefore usually combined with the tar command.

# Pack
tar -czvf archive.tar.gz file1 file2
# Unpack
tar -xzvf archive.tar.gz

# Extract only the specified file
tar -xzvf archive.tar.gz specified_file

Meaning of the common tar options:
-x unpack, -c pack;
-z compress or decompress with gzip;
-j compress or decompress with bzip2;
-v list the files as they are unpacked or packed;
-f names the archive file to read or write; it must be the last option in the group because the archive name follows it directly;

Other options:
-t list the contents of the archive without extracting it;

When reading an archive, GNU tar detects the compression format automatically, so the compression option can be omitted when extracting or listing.
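Packing, listing, and extracting a single member can be sketched as follows. The folder and file names are made up, and -C (change to a folder before extracting) is an additional tar option not covered in the list above.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p project/docs out
printf 'readme\n' > project/README
printf 'guide\n' > project/docs/guide.txt

tar -czf project.tar.gz project             # pack and compress with gzip
listing=$(tar -tf project.tar.gz)           # list contents; no -z needed when reading
# Extract just one member into the folder "out".
tar -xf project.tar.gz -C out project/docs/guide.txt
```

The member name given for extraction has to match the path shown by tar -t.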

Compression level and speed
gzip provides 9 compression levels, from 1 to 9. Higher levels compress more slowly but produce smaller output. The default level is 6.

gzip -1 source_file
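The effect of the level can be observed by compressing the same input twice. A rough sketch on generated sample data; how much the sizes differ depends on the input.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
seq 1 200000 > data.txt              # repetitive sample data

gzip -1 -c data.txt > fast.gz        # fastest, least compression
gzip -9 -c data.txt > best.gz        # slowest, most compression
fast_size=$(wc -c < fast.gz)
best_size=$(wc -c < best.gz)
echo "level 1: $fast_size bytes, level 9: $best_size bytes"
```

Both files decompress to the same content; only the time spent and the archive size differ.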

7. bzip2 compression

bzip2 is used in the same way as gzip. It usually compresses more tightly than gzip, at the cost of speed.

# Compress
bzip2 source_file1 source_file2
# Decompress
bzip2 -d compressed_file1.bz2 compressed_file2.bz2
bunzip2 compressed_file1.bz2 compressed_file2.bz2

bzip2 and bunzip2 are the same program. It compresses by default; if the name it is invoked under contains unzip or UNZIP, it decompresses instead.

8. zip compression

zip preserves the source files when compressing.

# Compress files and folders
zip -r compressed_file.zip source_file_or_folder
# Unzip to the specified folder
unzip compressed_file.zip -d target_folder

Common options
unzip -v List the files in the archive without extracting them.
unzip -t Verify the integrity of the archive.
zip -d Delete a file from the archive.

# Delete a file from the archive
zip archive.zip -d file

9. dd makes a black hole

dd reads the content of a device or file and copies it unchanged to the specified location.
Reading from /dev/null with dd produces an empty file, because /dev/null returns end-of-file immediately.

# Syntax
dd if=input_file_or_device of=output_file_or_device

# Back up disk vda
dd if=/dev/vda of=/app/vda.img
# Restore the backup to vdb
dd if=/app/vda.img of=/dev/vdb

Compressing while backing up

# Back up
dd if=/dev/vda | gzip > /app/vda.img.gz
dd if=/dev/vda | bzip2 > /app/vda.img.bz2
# Restore
gzip -dc /app/vda.img.gz | dd of=/dev/sdb
bzip2 -dc /app/vda.img.bz2 | dd of=/dev/sdb

When if= is omitted, dd reads from standard input; when of= is omitted, it writes to standard output.
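The compressed backup and restore pipeline can be rehearsed safely on an ordinary file. In this sketch disk.img is a stand-in for a real device such as /dev/vda, and all names are made up.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
# A 64 KB file of random bytes plays the role of the disk.
dd if=/dev/urandom of=disk.img bs=1024 count=64 2>/dev/null

dd if=disk.img 2>/dev/null | gzip > disk.img.gz           # no of=: dd writes to stdout
gzip -dc disk.img.gz | dd of=restored.img 2>/dev/null     # no if=: dd reads from stdin

# Reading from /dev/null hits end-of-file at once and leaves an empty file.
dd if=/dev/null of=empty.img 2>/dev/null
```

Comparing disk.img and restored.img with cmp confirms that the round trip is byte-for-byte identical.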

Backing up a partition, memory, or a floppy disk
The commands are the same as for backing up a whole disk.

# Partition, memory
dd if=/dev/vda2 of=/app/vda2.img
dd if=/dev/mem of=/app/mem.img
# Floppy disk, CD
dd if=/dev/fd0 of=/app/fd0.img
dd if=/dev/cdrom of=/app/cd.img

Other options
bs=N: Set the block size, in bytes, for each read and each write. ibs= and obs= set the input and output block sizes separately.
count=N: Copy only N input blocks.
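The amount of data copied is bs multiplied by count, which can be checked on scratch files. A small sketch with made-up file names:

```shell
work=$(mktemp -d)
cd "$work" || exit 1

dd if=/dev/zero of=zeros.bin bs=512 count=4 2>/dev/null     # 4 blocks of 512 bytes
dd if=/dev/zero of=big.bin bs=1024 count=1000 2>/dev/null   # 1000 blocks of 1024 bytes
# Copy only the first 512-byte block of a larger file.
dd if=big.bin of=head.bin bs=512 count=1 2>/dev/null

zeros_size=$(wc -c < zeros.bin)
big_size=$(wc -c < big.bin)
head_size=$(wc -c < head.bin)
```

The last command uses the same bs=512 count=1 pattern that copies the first sector of a disk.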

# Back up the MBR of the disk
dd if=/dev/vda of=/app/vda_mbr.img count=1 bs=512
# Write the MBR back
dd if=/app/vda_mbr.img of=/dev/vda

The Master Boot Record (MBR) is the boot record of a hard disk. It occupies the 512 bytes of the disk's first sector and contains the partition table, so if the MBR is damaged, the partition table is lost as well and the system cannot boot normally.

Testing a disk with /dev/null and /dev/zero
/dev/null: The null device; any data written to it is discarded.
/dev/zero: Produces an endless stream of zero bytes. It is used to write zeros to a device or file, typically to initialize it.

# Test write performance
dd if=/dev/zero bs=1024 count=1000000 of=/app/1GB.file
# Test read performance
dd if=/app/1GB.file of=/dev/null bs=64k

Wiping a disk with /dev/urandom
/dev/urandom produces random data. Writing it to a disk overwrites the original data completely.

dd if=/dev/urandom of=/dev/sda