Linux file system fsck disk repair

Linux File System

Everything is a file

The Linux file system is built-in system software that manages the files stored on storage devices. Linux supports several types of file systems for storing applications, data files, configuration files, and so on on the hard drive.
File system types: Linux supports a variety of file systems, such as ext4, XFS, and Btrfs. Each file system implements a virtual directory structure on the storage device, with its own features and performance characteristics.
The Linux file system allocates two data structures for each file:

  • Index nodes and directory entries mainly record a file's meta-information and its place in the directory hierarchy. The index node, also known as the inode, records a file's meta-information, such as the inode number, file size, access permissions, creation time, modification time, and the location of the data on disk. The inode is the unique identifier of a file; inodes and files correspond one to one, and inodes are themselves stored on the disk, so they also take up disk space.
  • Directory entries, also known as dentries, record the file name, a pointer to the inode, and the hierarchical relationships with other directory entries. Multiple associated directory entries form a directory structure. Unlike the inode, however, the directory entry is a data structure maintained by the kernel: the directory itself is a file that is stored persistently on disk, but the dentry is not stored on disk; it is cached in memory.

If directories had to be read from the disk every time they were looked up, lookups would be very slow, so the kernel caches directories it has already read in memory using the directory entry data structure. The next time the same directory is looked up, it can be served from memory, which greatly improves file system efficiency.
The file system combines multiple sectors into a logical block, and the smallest unit of each read and write is one logical block (data block). The logical block size in Linux is 4 KB, meaning 8 sectors (512 B each) are read or written at a time, which greatly improves the efficiency of disk reads and writes.
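To make the inode metadata above concrete, here is a minimal sketch that prints a file's inode number, size, permissions, and modification time with stat(2); the path /etc/hostname is only an example, any existing file works:

    #include <stdio.h>
    #include <sys/stat.h>
    #include <time.h>

    int main(void) {
        struct stat st;
        /* /etc/hostname is an example path; any existing file works */
        if (stat("/etc/hostname", &st) != 0) {
            perror("stat");
            return 1;
        }
        printf("inode number : %lu\n", (unsigned long)st.st_ino);
        printf("size (bytes) : %lld\n", (long long)st.st_size);
        printf("permissions  : %o\n", st.st_mode & 0777);
        printf("last modified: %s", ctime(&st.st_mtime));
        return 0;
    }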

Virtual File System (VFS)

There are many types of file systems, and the operating system wants to present a unified interface to users, so an intermediate layer is introduced between the user layer and the file system layer. This intermediate layer is called the Virtual File System (VFS).
VFS defines a set of data structures and standard interfaces that all file systems support, so programmers do not need to understand how each file system works; they only need to understand the unified interface provided by VFS.
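The sketch below is a purely illustrative user-space analogy of VFS dispatch (it is not the kernel's real struct file_operations): each concrete file system fills in the same table of function pointers, so callers see one uniform interface.

    #include <stdio.h>

    /* Illustrative only: a simplified analogy of VFS dispatch, not the real
     * kernel structures. Each file system supplies its own implementation
     * behind a common set of function pointers. */
    struct fs_ops {
        const char *name;
        int (*open)(const char *path);
        int (*read)(int fd, void *buf, int len);
    };

    static int ext4_open(const char *path) { printf("ext4: open %s\n", path); return 3; }
    static int ext4_read(int fd, void *buf, int len) { (void)buf; printf("ext4: read fd=%d len=%d\n", fd, len); return len; }

    int main(void) {
        struct fs_ops ext4 = { "ext4", ext4_open, ext4_read };
        char buf[16];
        int fd = ext4.open("/etc/hostname");   /* the caller only sees the uniform interface */
        ext4.read(fd, buf, (int)sizeof buf);
        return 0;
    }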

The file systems supported by Linux are divided into three categories according to different storage locations:

  • Disk file systems, which store data directly on the disk, such as ext2/3/4 and XFS.
  • Memory file systems, whose data is not stored on the hard disk but occupies memory; the /proc and /sys file systems we often use fall into this category, and reading and writing such files actually reads and writes the related data inside the kernel.
  • Network file systems, used to access data on other hosts, such as NFS and SMB.

A file system must first be mounted on a directory before it can be used. For example, when a Linux system starts, the root file system is mounted on the root directory.
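As a hedged illustration of mounting, the sketch below attaches a small tmpfs (memory file system) to a directory with the mount(2) system call and detaches it again; the mount point /mnt/demo is an assumed example and the program must be run as root:

    #include <stdio.h>
    #include <sys/mount.h>

    int main(void) {
        /* /mnt/demo is an example mount point and must already exist; run as root */
        if (mount("none", "/mnt/demo", "tmpfs", 0, "size=16m") != 0) {
            perror("mount");
            return 1;
        }
        printf("tmpfs mounted at /mnt/demo\n");

        if (umount2("/mnt/demo", 0) != 0) {   /* detach the file system again */
            perror("umount2");
            return 1;
        }
        return 0;
    }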

File usage

fd = open(name, flag);   // open the file by its path name; returns a file descriptor
...
write(fd, ...);          // operate on the file through the file descriptor
...
close(fd);               // close the file to avoid leaking the descriptor

Process:
After a file is opened, the operating system tracks all files opened by the process. "Tracking" means that the operating system maintains an open file table for each process; each entry in that table represents a "file descriptor", so the file descriptor is the identifier of an open file.

The operating system maintains the status and information of open files in the open file table of each process:

  • File pointer: the system tracks the last read/write position as the current file position pointer; this pointer is unique to each process that has the file open.
  • File open counter: when a file is closed, the operating system must reuse its open file table entry, otherwise the table runs out of space. Because multiple processes may open the same file, the system must wait until the last process closes the file before deleting the entry; this counter tracks the number of opens and closes, and when it reaches 0 the system closes the file and deletes the entry.
  • File disk location: most file operations require the system to modify file data, and this location information is kept in memory to avoid reading it from disk for every operation.
  • Access permissions: each process opens a file with an access mode (create, read-only, read-write, append, etc.). This information is saved in the process's open file table so that the operating system can allow or deny subsequent I/O requests.

The process of reading and writing files:

  • When a user process reads 1 byte from a file, the file system must fetch the data block containing that byte and then return the part of the block the process asked for.
  • When a user process writes 1 byte to a file, the file system finds the data block that needs to be modified, changes the corresponding part of the block, and finally writes the block back to disk.
    The basic unit of operation of a file system is the data block; a minimal end-to-end example is sketched below.
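The following minimal sketch runs the open, write, read, close flow described above end to end; the scratch path /tmp/fs_demo.txt is an assumed example:

    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void) {
        const char *path = "/tmp/fs_demo.txt";   /* example scratch file */
        const char *msg  = "hello, file system\n";

        int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0644); /* returns a file descriptor */
        if (fd < 0) { perror("open"); return 1; }

        if (write(fd, msg, strlen(msg)) < 0) { perror("write"); return 1; }

        char buf[64] = {0};
        lseek(fd, 0, SEEK_SET);                  /* move the per-process file position pointer back */
        if (read(fd, buf, sizeof buf - 1) < 0) { perror("read"); return 1; }
        printf("read back: %s", buf);

        close(fd);                               /* release the open-file-table entry */
        return 0;
    }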

File Storage

Contiguous storage

With contiguous storage, the file header needs to record the “starting block position” and the “length”; these two pieces of information are enough to describe a file stored in a contiguous stretch of disk space. The file header mentioned here is similar to the Linux inode.

Although contiguous storage gives high read and write efficiency, it suffers from “disk space fragmentation” and “difficulty in extending the file length”.


Non-contiguous storage

Implicit linked list: with this scheme, the file header contains the locations of the “first block” and the “last block”, and a pointer field is reserved in each data block to store the location of the next data block.
Disadvantages:
1. Data blocks cannot be accessed directly; the file can only be accessed sequentially by following the pointers, and the block pointers consume some storage space.
2. The stability of implicit linking is poor: if a pointer in the chain is lost or damaged by a software or hardware error while the system is running, file data may be lost.
Explicit linked list: the pointers that link a file's data blocks are stored explicitly in a single table kept in memory. There is only one such table for the whole disk, and each table entry stores a link pointer to the number of the next data block (this is the idea behind the FAT, the File Allocation Table). Keeping the table in memory significantly improves retrieval speed and greatly reduces the number of disk accesses; a small sketch follows below.
Disadvantage:
1. The entire table must be kept in memory, so this approach is not suitable for large disks.
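A tiny sketch of the explicit linked-list (FAT-style) idea follows; it is illustrative only and does not reproduce the real FAT on-disk format. One in-memory table records, for every data block, the number of the next block of the same file:

    #include <stdio.h>

    #define NBLOCKS     16
    #define END_OF_FILE (-1)

    /* next_block[i] holds the number of the block that follows block i in its
     * file, or END_OF_FILE if block i is the last one. A real FAT also marks
     * free and bad blocks. */
    static int next_block[NBLOCKS];

    static void walk_file(int first_block) {
        for (int b = first_block; b != END_OF_FILE; b = next_block[b])
            printf("block %d -> ", b);
        printf("end\n");
    }

    int main(void) {
        /* a file stored in blocks 2 -> 7 -> 5 */
        next_block[2] = 7;
        next_block[7] = 5;
        next_block[5] = END_OF_FILE;
        walk_file(2);
        return 0;
    }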

File index

Three basic methods of file block allocation and indexing (the original figures are omitted):

  • Sequential (contiguous) allocation
  • Linked-list allocation
  • Index allocation

Comparing them: contiguous allocation reads quickly but fragments the disk and makes files hard to grow; linked allocation grows easily but supports only sequential access; index allocation supports direct access at the cost of one index block per file. The scheme used by Unix-style inodes combines the advantages of the three basic methods: a number of direct block pointers for small files, plus single-, double-, and triple-indirect index blocks for larger files, as sketched below.
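The combined scheme can be pictured with the simplified struct below. It is illustrative only and does not reproduce the exact ext2 on-disk layout, although ext2 does use 12 direct pointers plus single, double, and triple indirect blocks:

    #include <stdint.h>

    /* Simplified illustration of a Unix-style inode's block index. */
    struct demo_inode {
        uint32_t size;            /* file size in bytes */
        uint32_t direct[12];      /* small files: block numbers stored directly */
        uint32_t single_indirect; /* block holding further block numbers */
        uint32_t double_indirect; /* block of pointers to single-indirect blocks */
        uint32_t triple_indirect; /* block of pointers to double-indirect blocks */
    };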

Directory storage

The blocks of a directory file store, one after another, the information of the files in that directory.
In the blocks of a directory file, the simplest format is a list, which records the information of each file in the directory (such as the file name, the file's inode, and the file type). Each item in the list pairs a file name with the corresponding inode number, and through this inode the real file can be found.
To improve efficiency, the directory can instead be saved as a hash table: a hash is computed over the file name and the entry is stored according to that hash value. To look up a file name in a directory, we hash the name; if the hash matches, the file's information is in the corresponding block.
The ext file systems of Linux use a hash table to save directory contents. The advantage of this method is that lookups are very fast and insertion and deletion are relatively simple, but some measures are needed to avoid hash collisions.
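Directory entries can be listed from user space with opendir(3) and readdir(3); this minimal sketch prints each name together with its inode number for the example directory /etc:

    #include <stdio.h>
    #include <dirent.h>

    int main(void) {
        DIR *dir = opendir("/etc");          /* example directory */
        if (dir == NULL) { perror("opendir"); return 1; }

        struct dirent *ent;
        while ((ent = readdir(dir)) != NULL) {
            /* each directory entry pairs a file name with an inode number */
            printf("%10lu  %s\n", (unsigned long)ent->d_ino, ent->d_name);
        }
        closedir(dir);
        return 0;
    }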

Hard links and soft links

A hard link means that the “index node” pointers of multiple directory entries point to the same file, that is, to the same inode. Inodes cannot cross file systems, however: each file system has its own inode data structures and lists, so hard links cannot span file systems. Because multiple directory entries point to one inode, the system completely deletes the file only when the source file and all of its hard links have been deleted.
A soft link is equivalent to creating a new file. This file has its own independent inode, but its content is the path of another file, so accessing a soft link is actually equivalent to accessing that other file; soft links can therefore cross file systems. Even if the target file is deleted, the link file still exists, but the file it points to can no longer be found.
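A minimal sketch of creating both kinds of link with the link(2) and symlink(2) system calls; the path /tmp/fs_demo.txt is an assumed example and must already exist:

    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        /* hard link: a second directory entry pointing at the same inode */
        if (link("/tmp/fs_demo.txt", "/tmp/fs_demo_hard") != 0)
            perror("link");

        /* soft (symbolic) link: a new file whose content is the target's path */
        if (symlink("/tmp/fs_demo.txt", "/tmp/fs_demo_soft") != 0)
            perror("symlink");

        return 0;
    }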

File system structure

When a user creates a new file, the Linux kernel finds a free inode through the inode bitmap and allocates it; when data is to be stored, free blocks are found through the block bitmap and allocated. One 4 KB bitmap block can represent 4 * 1024 * 8 = 2^15 data blocks, and the total size of those blocks is 2^15 * 4 KB = 2^27 bytes = 128 MB, which is too small on its own, since many files are larger than this.
In the Linux file system, this structure is therefore repeated and called a block group: with N block groups, N times as much data can be represented, which allows large files to be stored.
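The 128 MB figure can be checked with a few lines of arithmetic, assuming the 4 KB block size used in the text:

    #include <stdio.h>

    int main(void) {
        const long block_size  = 4L * 1024;                 /* 4 KB per block */
        const long bitmap_bits = block_size * 8;            /* one 4 KB bitmap block = 32768 bits */
        const long group_bytes = bitmap_bits * block_size;  /* blocks per group * block size */
        printf("data blocks per group: %ld\n", bitmap_bits);                     /* 32768 = 2^15 */
        printf("data per block group : %ld MB\n", group_bytes / (1024 * 1024));  /* 128 */
        return 0;
    }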

The following figure shows the structure of the entire Linux Ext2 file system and the contents of the block groups. The file system is composed of a large number of block groups, which are arranged one after another on the hard disk:
[Figure: layout of the Ext2 file system and the contents of a block group]

  • The superblock contains important information about the file system, such as the total number of inodes, the total number of blocks, the number of inodes per block group, and the number of blocks per block group.
  • The block group descriptors contain the status of each block group in the file system, such as the number of free blocks and free inodes in the group. Each block group contains the group descriptor information for all block groups in the file system.
  • The data-block bitmap and the inode bitmap indicate whether the corresponding data block or inode is free or in use.
  • The inode list (inode table) contains all the inodes in the block group; an inode holds all the metadata of a file or directory in the file system.
  • The data blocks contain the file's actual data. A simplified summary of these pieces is sketched below.
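A simplified, purely illustrative struct summarizing what each block group keeps track of (these are not the real ext2 field names or sizes):

    #include <stdint.h>

    /* Illustrative summary of a block group's bookkeeping, not the real ext2 layout. */
    struct demo_group_desc {
        uint32_t block_bitmap_block;  /* where the data-block bitmap lives */
        uint32_t inode_bitmap_block;  /* where the inode bitmap lives */
        uint32_t inode_table_block;   /* first block of the inode table */
        uint16_t free_blocks_count;   /* free data blocks in this group */
        uint16_t free_inodes_count;   /* free inodes in this group */
    };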

File system introduction

ext file system

The earliest file system introduced in the Linux operating system is called the extended file system (ext). It provided Linux with a basic Unix-like file system: using virtual directories to operate on hardware devices and storing data in fixed-length blocks on physical devices.
The ext file system uses a system called inodes to store information about the files stored in the virtual directory. The inode system creates a separate table on each physical device, called the inode table, to store this information; every file stored in the virtual directory has an entry in the inode table. The "extended" part of the name comes from the extra data it tracks for each file, including:

  • file name
  • File size
  • The owner of the file
  • File’s group
  • File access permissions

Linux refers to each inode in the inode table by a unique value (called an inode number) that is assigned by the file system when the file is created. File systems identify files by their inode number rather than their full name and path. A common problem with ext file systems is that when files are written to a physical device, the blocks used to store data can easily become scattered throughout the device (called fragmentation). Fragmentation of data blocks reduces file system performance because it takes longer to find all the blocks for a specific file in the storage device.
Reference: https://brinnatt.com/primary/Chapter-4-linux-ext-filesystem/

ext2 file system

To overcome the 2 GB file size limit of the ext file system, it was upgraded to ext2. The ext2 file system extends the inode table format to hold more information about each file on the system: it adds the creation time, modification time, and last access time to help system administrators track file access, and raises the maximum allowed file size to 2 TB (increased to 32 TB in later versions) to accommodate the large files commonly found on database servers.
Compared to the ext file system, the ext2 file system reduces fragmentation by allocating disk blocks in groups when saving files. By grouping data blocks, the file system does not need to search the entire physical device for the data block when reading the file.
Every time the file system stores or updates a file, it updates the inode table with the new information. The problem is that this does not always happen in a single step, which can leave the system in a fatally inconsistent state.
File system corruption and crashes are easily caused by improper operations such as human error or an abnormal power outage and restart, and in severe cases hardware damage can follow. To avoid irreversible damage such as file system corruption, or even hardware damage, operate as carefully and as much by the book as possible.
Reference article: https://blog.csdn.net/sinat_37817094/article/details/125716792

ext3 file system (journaling file system)

The ext3 file system extends ext2 with a journal: changes are first recorded in the journal, so that after a crash the file system can be brought back to a consistent state quickly instead of requiring a full disk scan.

  1. File system design principle: the design of the Linux file system is based on consistency and reliability. A file system needs to guarantee the integrity, reliability, and availability of data so that it can be accessed and modified when needed.
  2. How the fsck command works: the fsck command checks and repairs a file system. It examines the file system's data structures, inodes, directory structure, and so on to ensure that the file system is consistent and healthy.

ext4 file system (journaling file system)

The result of extending the capabilities of the ext3 file system is (you might have guessed it) the ext4 file system. The ext4 file system was officially supported by the Linux kernel in 2008 and is now the default file system used by most popular Linux distributions.
In addition to supporting data compression and encryption, the ext4 file system supports a feature called extents. An extent allocates a contiguous range of blocks on the storage device, and only the location of the starting block has to be saved in the inode; because the inode no longer has to list every data block used by the file, this saves space in the inode table.
ext4 also introduces block preallocation: if you want to reserve space on a storage device for a file that you know will grow, the ext4 file system can allocate all the blocks the file will need, not just those already in use. ext4 fills the reserved data blocks with zeros and will not allocate them to other files.
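A hedged sketch of reserving space up front with the portable posix_fallocate(3) call, which on file systems such as ext4 can be satisfied by the block preallocation described above; the path and the 1 MB size are example values:

    #define _POSIX_C_SOURCE 200809L
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("/tmp/prealloc_demo.bin", O_CREAT | O_WRONLY, 0644); /* example file */
        if (fd < 0) { perror("open"); return 1; }

        /* reserve 1 MB of disk space for future writes without writing any data */
        int err = posix_fallocate(fd, 0, 1024 * 1024);
        if (err != 0)
            fprintf(stderr, "posix_fallocate failed: %d\n", err);

        close(fd);
        return 0;
    }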

Reiser file system (journaling file system)

In 2001, Hans Reiser created the first journaling file system for Linux, called ReiserFS. The ReiserFS file system supports only write-back journaling mode, in which only the inode table data is written to the journal; partly because of this, ReiserFS became one of the fastest journaling file systems on Linux.
Two interesting features were introduced in ReiserFS: you can resize an existing file system online, and a technique called tail packing can pack one file's data into the empty space in another file's data blocks. The online resize feature is very useful when you must expand an existing file system to accommodate more data.

JFS file system (journaling file system)

One of the oldest journaling file systems possibly still in use, JFS (Journaled File System) was developed by IBM in 1990 for its Unix derivative AIX. It was not ported to the Linux environment until its second version.
IBM officially calls version 2 of the JFS file system JFS2, but most Linux systems refer to it simply as JFS.
The JFS file system uses ordered journaling: only the inode table data is saved in the journal, and it is not removed until the actual file data has been written to the storage device. This approach is a compromise between the speed of ReiserFS and the integrity of data-mode journaling.
The JFS file system uses extent-based file allocation, in which a set of blocks is allocated for each file written to the storage device, which reduces fragmentation on the storage device.
Outside of IBM's own Linux offerings, the JFS file system has not become popular, but you may still encounter it when working with Linux.

XFS file system (journaling file system)

The XFS journaling file system is another file system originally used in commercial Unix systems and now making its way into the Linux world. Silicon Graphics Corporation (SGI) originally developed XFS in 1994 for its commercial IRIX Unix system. In 2002, it was released for Linux environments.
The XFS file system uses write-back journaling, which provides high performance but introduces some risk because the actual data is not stored in the journal. XFS also allows online resizing of the file system, similar to ReiserFS, except that an XFS file system can only be grown, not shrunk.

ZFS file system (copy-on-write file system)

The COW file system ZFS was developed by Sun in 2005 for the OpenSolaris operating system. It began to be ported to Linux in 2008 and was finally put into use in Linux products in 2012.
ZFS is a stable file system on par with Reiser4, Btrfs, and ext4. Its biggest weakness is that it is not licensed under the GPL. The OpenZFS project, launched in 2013, has the potential to change that; however, until ZFS is licensed under the GPL, it is unlikely to become a default Linux file system.

Btrfs file system (copy-on-write file system)

The Btrfs file system, also known as the B-tree file system, is a newer copy-on-write entrant. It was developed by Oracle in 2007 and builds on many of Reiser4's features while improving reliability. Other developers eventually joined the development process, helping Btrfs quickly become one of the most popular file systems, thanks to its stability, ease of use, and the ability to dynamically resize a mounted file system. The openSUSE Linux distribution uses Btrfs as its default file system; it is also available in other Linux distributions (such as RHEL), and Fedora made Btrfs its default file system in 2020.

FSCK

Computers inevitably run into abnormal situations caused by system faults or human error (such as a sudden power outage). Such situations can easily corrupt the file system and, in severe cases, may even damage the hardware.
fsck is a tool for checking and repairing errors in file systems. It works by scanning the file system and finding bad data structures and files, then trying to recover from those errors.
fsck can only repair file system errors at the software level and cannot repair damage at the hardware level. If the file system has been damaged at the hardware level, you may need to use professional data recovery tools to recover the data.
The file system should be unmounted (not in use) when the fsck tool is run.

Principle of repair

  1. Check the superblock: fsck first reads the file system's superblock to learn the file system's size, status, and other information. If the superblock is corrupted, fsck tries to recover it, falling back to a backup superblock if necessary.
  2. Check the inode table: fsck checks the metadata (such as file size, access time, modification time, creation time, etc.) of each inode in the inode table and ensures that they match the actual contents of the file system.
  3. Check the block bitmap: fsck checks the block bitmap to ensure that each block is correctly marked as allocated or unallocated. If the block bitmap is wrong, fsck will try to fix it.
  4. Check the directory structure: fsck checks each directory to ensure that each directory entry points to the correct inode and that there are no duplicate or corrupt directory entries. If errors are found, fsck attempts to remove or repair them.
  5. Fix errors: If fsck detects errors, it attempts to fix them automatically. If it cannot be fixed automatically, it reports the errors to the user and asks the user if they want to fix them manually.
    Reference article: https://blog.csdn.net/younger_china/article/details/76348817

How to use

Syntax: fsck.ext4 [required parameters] [optional parameters] [device]
Parameters:

Parameter      Comment
-a             Non-interactive mode; repair automatically
-c             Check whether there are damaged (bad) blocks
-C <fd>        Report the progress of the check to the given file descriptor so that the process can be monitored
-d             Show the command's execution process in detail
-f             Force a check even if the file system appears clean
-F             Flush the device's buffers before checking the file system
-l <file>      Add the damaged blocks listed in the file to the bad-block list
-L <file>      Clear all existing damage marks and re-mark using the blocks listed in the file
-n             Non-interactive mode; open the file system to be checked read-only and answer "no" to all questions
-P             Set the inode size that the fsck.ext2 command can handle
-r             Interactive mode
-R             Skip directories
-s             Sequential check
-S             Similar in effect to the "-s" parameter
-t             Show timing information for the fsck.ext2 command
-v             Show detailed processing information
-y             Answer "yes" to all questions (non-interactive automatic repair)
-b <block>     Specify the location of the superblock (for example, a backup superblock)
-B <size>      Set the size of each block of the partition
-I <blocks>    Set the number of blocks in the inode buffer for the file system being checked
-V             Display version information

Usage examples

1. fsck -r /dev/sdb1         # repair the sdb1 partition interactively
2. fsck -t ext4 -v /dev/sda1 # repair an ext4 file system, showing detailed output
3. fsck -s /dev/sdb          # check the /dev/sdb file system (the -s option serializes the check)
4. fsck -y /dev/sdb          # answer "yes" to all questions so fsck automatically tries to fix any errors it finds
5. fsck -s /                 # check the root file system (serialized)
6. // Force fsck to run at boot: create a file named forcefsck in the root directory, then reboot; fsck will run automatically during the next boot
7. // Run fsck from rescue mode: reboot, hold down the Shift key to display the GRUB menu, select Advanced options, and then select the fsck entry
/*
    Reboot the system and boot from a Live CD.
    In the Live CD environment, open a file manager or a terminal.
    Use the lsblk command to inspect the disk partitions and identify the file system that needs repair.
    Enter one of the following commands in the terminal, choosing automatic or manual repair as needed:
    Automatic repair: fsck -y /dev/sdb  (when fsck detects errors, it automatically attempts to repair them)
    Check only:       fsck -n /dev/sdb  (checks the integrity of the file system and reports any errors without repairing them)
    If fsck detects errors it cannot fix automatically, it reports them to the user and asks whether to fix them manually.
*/
8. fsck -AR -y               # automatically check and repair all file systems listed in /etc/fstab (the -R option skips the root file system)

Reference articles:

  1. https://www.cnblogs.com/forvs/p/16998934.html
  2. https://segmentfault.com/a/1190000023615225#item-2-6 (much of this article is adapted from this source)
  3. https://blog.csdn.net/Beginner_G/article/details/117911252