[Linux in progress] Disk file structure

Disk

In the last article, we mentioned that files are stored on disks. In this article, let's learn about the structure of disks!

The concept of disk:

What is a disk?

A disk is a storage device that uses magnetic recording technology to store data.

The disk is a computer's main storage medium. It can hold a large amount of binary data and keeps that data even after power is lost. Early computers used floppy disks. The disk commonly used today is the hard disk ([Hard disk](https://baike.baidu.com/item/Hard disk/2806058?fromModule=lemma_inlink)).

Basic structure of disk:

The disk is one of the few mechanical devices in a computer. Today many laptops no longer use disks but solid-state drives (SSDs), which are faster. An SSD is a different storage technology from a magnetic disk, and its price per unit of capacity is higher. Roughly speaking, a 500 GB SSD costs two to three hundred yuan more than a disk of the same capacity.

Platter:
A magnetic disk usually consists of one or more platters, thin round sheets made of materials such as metal or glass. Each platter has two surfaces, front and back.
Magnetic head:
The magnetic head reads and writes data. Each platter surface has its own head, which floats just above the surface and reads and writes data on it through a small current.
Track (cylinder):
Each platter surface is divided into concentric circles, and each circle is called a track. The tracks at the same radius on all surfaces together form a cylinder. To read or write data, the heads move to the target track.
Sector:
Each track is further divided into sectors. The sector is the basic unit of storage on a disk and usually holds 512 bytes or 4 KB.

CHS addressing mode:

From the introduction above, we roughly know the structure of a disk. So how do we locate one particular sector on it? The answer is simple. First determine the platter surface, which is identified by its head. Then determine the track on that surface, that is, the cylinder. Finally find the sector on that track. This way of addressing a disk is called CHS.

So to locate a sector, we only need three values:

Cylinder (track): cylinder
Head: head
Sector: sector

This is the CHS addressing method, named after these three values.

Logical abstract structure of disk:

We now know the CHS addressing method, but it is a physical addressing scheme, while the operating system is a software layer, so CHS is not well suited to it. How does the operating system address and manage data on the disk? Let's take a look at the picture below first.

Think of a tape: wound up it is circular, but pulled out it is linear. We can treat disk platters the same way. From the perspective of the OS, the disk is a linear structure, an array of sectors. To access a sector, you only need its array index; knowing the index of a sector means having located it. Inside the operating system, this index is called the LBA (Logical Block Address). To write to the physical disk, the LBA must be converted into the corresponding CHS address, the disk's three-dimensional address. In summary, the OS uses LBA addresses, and the disk uses CHS addresses.
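The conversion between the two address forms can be sketched as follows. This is a minimal sketch that assumes the classic convention in which cylinders and heads are numbered from 0 and sectors from 1. The `geometry` and `chs` structs and both function names are hypothetical, introduced only for illustration; real drives report their own geometry.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical disk geometry used for the conversion. */
struct geometry {
    uint32_t heads_per_cylinder;
    uint32_t sectors_per_track;
};

/* A three-dimensional CHS address. */
struct chs {
    uint32_t cylinder;  /* 0-based */
    uint32_t head;      /* 0-based */
    uint32_t sector;    /* 1-based by convention */
};

/* Convert a linear sector index (LBA) into a CHS address. */
struct chs lba_to_chs(uint64_t lba, struct geometry g)
{
    assert(g.heads_per_cylinder > 0 && g.sectors_per_track > 0);
    uint64_t sectors_per_cylinder =
        (uint64_t)g.heads_per_cylinder * g.sectors_per_track;
    struct chs out;
    out.cylinder = (uint32_t)(lba / sectors_per_cylinder);
    out.head = (uint32_t)((lba / g.sectors_per_track) % g.heads_per_cylinder);
    out.sector = (uint32_t)(lba % g.sectors_per_track) + 1;
    return out;
}

/* Convert a CHS address back into a linear sector index (LBA). */
uint64_t chs_to_lba(struct chs a, struct geometry g)
{
    assert(a.sector >= 1 && a.sector <= g.sectors_per_track);
    return ((uint64_t)a.cylinder * g.heads_per_cylinder + a.head)
               * g.sectors_per_track
           + (a.sector - 1);
}
```

With an assumed geometry of 16 heads and 63 sectors per track, LBA 0 corresponds to cylinder 0, head 0, sector 1, and converting the result back yields the original LBA.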

Because the OS typically performs I/O in units of 4 KB, each data block the OS reads spans 8 sectors (assuming 512-byte sectors). From the perspective of the OS, individual sectors hardly matter.

The OS only needs the usual addressing scheme: starting address + offset. Given the address (index) of the first sector of a data block, adding 4 KB (the block size) covers the entire data block.
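The arithmetic behind "starting address + offset" can be sketched as follows. The 512-byte sector size and 4 KB block size are the values assumed in the text, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u   /* assumed sector size in bytes */
#define BLOCK_SIZE  4096u  /* assumed file system block size in bytes */
#define SECTORS_PER_BLOCK (BLOCK_SIZE / SECTOR_SIZE)

static_assert(BLOCK_SIZE % SECTOR_SIZE == 0,
              "a block must hold a whole number of sectors");

/* LBA of the first sector that belongs to the given block. */
uint64_t block_first_lba(uint64_t block_no)
{
    return block_no * SECTORS_PER_BLOCK;
}

/* LBA of the last sector that belongs to the given block. */
uint64_t block_last_lba(uint64_t block_no)
{
    return block_first_lba(block_no) + SECTORS_PER_BLOCK - 1;
}

/* Number of the block that contains the given byte offset. */
uint64_t block_of_byte(uint64_t byte_offset)
{
    return byte_offset / BLOCK_SIZE;
}
```

Under these assumptions, block 3 covers the sectors with LBA 24 through 31.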

File system:

After learning the above knowledge, we know that the OS abstracts the disk into a large array for management by describing it first and then organizing it.

The specific management method is what we will explain next.

Since the array abstracted from the disk is too large, the first step is to divide it into several areas (partitions). Each partition is managed in the same way, so knowing how to manage one partition is equivalent to knowing how to manage the entire disk. (Each partition can be described by a begin and an end index.)

Although the disk has been partitioned once, each partition is still very large, so we divide it again into groups.

File system group management structure:

With this, the task of managing each partition is reduced to managing each group. Once one group can be managed, the other groups are managed in the same way, and so the whole partition is managed. Managing every partition well is equivalent to managing the whole disk.

Each partition has a Boot Block, also known as the startup block. At boot, it is read to obtain the address of the OS image and so find the operating system. If this area is damaged, the startup of the operating system is directly affected.

Block Group: The ext2 file system is divided into several Block Groups according to the size of the partition, and every Block Group has the same structure. This is similar to the way a government manages its districts.

Super Block: Stores the structural information of the file system itself. The recorded information mainly includes: the total number of blocks and inodes, the number of unused blocks and inodes, the size of a block and of an inode, the time of the last mount, the time of the last write, the time of the last disk check, and other file system related information. If the Super Block is destroyed, the entire file system structure can be considered destroyed.

GDT, Group Descriptor Table: The block group descriptors, which describe the attributes of each block group. Interested readers can learn more about it on their own.

Block Bitmap: Records which data blocks in the Data Block area are occupied and which are free.

inode bitmap: Each bit indicates whether an inode is free and available.

i-node table: stores file attributes such as file size, owner, last modification time, etc.

Data area: stores file content

We often say that file = content + attributes. In Linux, content and attributes are stored separately.

Generally speaking, the set of all attributes of a file is its inode (typically 128 bytes), and each file corresponds to one inode.

A partition holds a large number of files and therefore a large number of inodes. The inodes in a group must be managed together, and that is the job of the inode Table.

Storage attributes:

Each inode has its own number, which serves as the ID attribute of the corresponding file. We can view a file's inode number with ls -i.

ls -i

In subsequent accesses, the OS also searches for files or reads content based on the inode number.
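The same inode number that ls -i prints can also be read from a program. The sketch below uses the POSIX `stat()` call and its `st_ino` field; the wrapper name `inode_of` is hypothetical, and returning 0 on failure is a convention chosen for this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Return the inode number of `path`, or 0 if stat() fails.
 * Using 0 as the failure value is an assumption of this sketch. */
ino_t inode_of(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0)
        return 0;
    return st.st_ino;  /* the number that `ls -i` displays */
}
```

Two paths that name the same file, such as "." and "./.", give the same inode number.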

Content storage:

After saving the attributes, the next question is how to store the file content. File content is kept in data blocks, so a file with content needs at least one data block.

Data blocks live in the Data Block area, so how do we find the data blocks that belong to a file?

In fact, the inode stores the indexes of the data blocks that belong to the file, and these indexes are used to locate the blocks in the Data Block area. It can be roughly understood like this:

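#define NUM 15  /* placeholder value; the original leaves NUM unspecified */

struct inode
{
    int number;           /* inode number */
    /* ... other file attributes ... */
    int datablocks[NUM];  /* numbers of the data blocks that hold the content */
};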

In-depth understanding of file operations:

How to understand inode:

**Linux itself recognizes only the inode number. The file name is not stored in the file's inode.** The file name exists for the user. How should we understand this relationship?

After creating a directory file, we can observe that the directory file also has its own inode number. So what data is stored in the directory?

In fact, the data blocks of a directory store the mapping between the names of the files in that directory and their inode numbers. Given a name, the matching inode number can be looked up, and the other way around.
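A name lookup in a directory can be sketched as a search over (inode number, name) pairs. The `dir_entry` layout here is a simplified, hypothetical one; the real ext2 on-disk entry also stores a record length and a name length. Treating inode 0 as "unused slot" or "not found" is likewise an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical directory entry. */
struct dir_entry {
    uint32_t inode;   /* 0 marks an unused slot in this sketch */
    char name[256];
};

/* Return the inode number stored for `name`, or 0 if it is absent. */
uint32_t dir_lookup(const struct dir_entry *entries, size_t count,
                    const char *name)
{
    assert(entries != NULL && name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (entries[i].inode != 0 && strcmp(entries[i].name, name) == 0)
            return entries[i].inode;
    }
    return 0;
}
```

For a directory holding the entry (263466, "abc"), looking up "abc" returns 263466, and looking up a name that is not present returns 0.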

Therefore, any file should be inside a directory.

At the same time, the inode number determines which group a file belongs to. An inode number is valid only within one partition and cannot cross partitions. (The location is the starting position of the group plus the position in the bitmap.)
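How an inode number maps to a group can be sketched with the arithmetic ext2 uses: inode numbers start at 1 and every group holds the same number of inodes. The struct and function names are hypothetical, and the inodes-per-group value would come from the Super Block of a real file system.

```c
#include <assert.h>
#include <stdint.h>

/* Where an inode lives: which group, and which slot inside that
 * group's inode bitmap and inode table. */
struct inode_location {
    uint32_t group;
    uint32_t index;
};

/* Split an inode number into (group, index within the group). */
struct inode_location locate_inode(uint32_t ino, uint32_t inodes_per_group)
{
    assert(ino >= 1 && inodes_per_group > 0);
    struct inode_location loc;
    loc.group = (ino - 1) / inodes_per_group;
    loc.index = (ino - 1) % inodes_per_group;
    return loc;
}
```

Assuming 8192 inodes per group, inode 8192 is the last slot of group 0 and inode 8193 is the first slot of group 1.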

Create a new file:

Store the attributes
The kernel first finds a free inode (here 263466) and records the file's information in it.
Store the data
The file needs three disk blocks, and the kernel finds three free blocks: 300, 500, and 800. It copies the first block of data from the kernel buffer to 300, the next block to 500, and so on.
Record the allocation
The file content is stored in the order 300, 500, 800. The kernel records this block list in the disk allocation area of the inode.
Add the file name to the directory
The new file is named abc. How does Linux record this file in the current directory? The kernel adds the entry (263466, abc) to the directory file. This correspondence between the file name and the inode connects the file name to the file's content and attributes.
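The four steps above can be sketched on a toy in-memory block group. Everything here is hypothetical and scaled down: the `toy_` structs, the sizes, and the first-free-bit allocation policy are illustration only, so the inode and block numbers it hands out differ from 263466 and 300, 500, 800 in the walkthrough. Copying the data itself is omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_INODES      64
#define NUM_BLOCKS      64
#define MAX_FILE_BLOCKS 4
#define MAX_ENTRIES     16

struct toy_inode { uint32_t nblocks; uint32_t blocks[MAX_FILE_BLOCKS]; };
struct toy_entry { int inode; char name[32]; };

/* A toy block group: two bitmaps, an inode table and one directory. */
struct toy_group {
    uint8_t inode_bitmap[NUM_INODES / 8];
    uint8_t block_bitmap[NUM_BLOCKS / 8];
    struct toy_inode inode_table[NUM_INODES];
    struct toy_entry dir[MAX_ENTRIES];
    int dir_count;
};

/* Find the first clear bit, set it, and return its index (-1 if full). */
int bitmap_alloc(uint8_t *bitmap, int nbits)
{
    for (int i = 0; i < nbits; i++) {
        if (!(bitmap[i / 8] & (1u << (i % 8)))) {
            bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
            return i;
        }
    }
    return -1;
}

/* Create a file of `nblocks` blocks; return its inode index or -1. */
int toy_create(struct toy_group *g, const char *name, uint32_t nblocks)
{
    assert(g != NULL && name != NULL && nblocks <= MAX_FILE_BLOCKS);
    if (g->dir_count >= MAX_ENTRIES)
        return -1;

    /* Step 1: find a free inode and record the attributes in it. */
    int ino = bitmap_alloc(g->inode_bitmap, NUM_INODES);
    if (ino < 0)
        return -1;
    struct toy_inode *node = &g->inode_table[ino];
    node->nblocks = nblocks;

    /* Steps 2 and 3: find free blocks and record the block list. */
    for (uint32_t i = 0; i < nblocks; i++) {
        int b = bitmap_alloc(g->block_bitmap, NUM_BLOCKS);
        if (b < 0)
            return -1;  /* a real file system would undo the earlier steps */
        node->blocks[i] = (uint32_t)b;
    }

    /* Step 4: add the (inode, name) entry to the directory. */
    struct toy_entry *e = &g->dir[g->dir_count++];
    e->inode = ino;
    strncpy(e->name, name, sizeof e->name - 1);
    e->name[sizeof e->name - 1] = '\0';
    return ino;
}
```

On a zero-initialized group, creating "abc" with three blocks takes inode 0 and blocks 0, 1 and 2, sets the matching bits in both bitmaps, and adds one directory entry.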

Delete a file:

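To delete a file, it is essentially enough to clear its bits in the two bitmaps (the inode bitmap and the block bitmap), which frees the space; the file's entry is also removed from its directory. The old data stays on disk until a later write overwrites it.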
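Freeing the space can be sketched as clearing bits, without touching the data blocks themselves. The function names and the byte-array bitmap layout are hypothetical, chosen only for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clear one bit in a bitmap stored as an array of bytes. */
void bitmap_clear(uint8_t *bitmap, uint32_t bit)
{
    bitmap[bit / 8] &= (uint8_t)~(1u << (bit % 8));
}

/* Return 1 if the bit is set (in use), 0 if it is clear (free). */
int bitmap_test(const uint8_t *bitmap, uint32_t bit)
{
    return (bitmap[bit / 8] >> (bit % 8)) & 1;
}

/* Mark a file's inode and data blocks as free.
 * The block contents are left untouched. */
void release_file(uint8_t *inode_bitmap, uint8_t *block_bitmap,
                  uint32_t inode_index,
                  const uint32_t *blocks, size_t nblocks)
{
    assert(inode_bitmap != NULL && block_bitmap != NULL);
    assert(blocks != NULL || nblocks == 0);
    bitmap_clear(inode_bitmap, inode_index);
    for (size_t i = 0; i < nblocks; i++)
        bitmap_clear(block_bitmap, blocks[i]);
}
```

Because only the bitmaps change, deleting a large file costs about the same as deleting a small one, and the old content remains readable on disk until its blocks are reused.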

File access:

When we access the file:

First, look up the inode number that corresponds to the given file name in the current directory.
A directory must belong to a partition. Using the inode number, find the corresponding group in that partition, and then find the file's inode in that group's inode table.
Through the inode, find the associated data blocks, and then carry out the operation that the command asks for.

How to store large files

If the array inside the inode only indexed data blocks directly, and the array holds NUM entries, would a file be limited to a maximum size of NUM * 4 KB?

The answer is no. An entry can point to a data block that holds not file data but the numbers of other data blocks, which expands the maximum size of a file.

This indexing method is called a secondary (indirect) index.
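The effect of one level of indirection can be sketched as follows. The assumptions are 4 KB blocks, 4-byte block numbers, and 12 direct entries; all names are hypothetical, and the indirect block is modeled as an in-memory array rather than a block read from disk. Block number 0 is used here to mean "no block".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE     4096u
#define NUM_DIRECT     12u   /* assumed number of direct entries */
#define PTRS_PER_BLOCK (BLOCK_SIZE / sizeof(uint32_t))  /* 1024 */

struct indexed_inode {
    uint32_t direct[NUM_DIRECT];  /* block numbers of file data */
    const uint32_t *indirect;     /* stands in for a data block that
                                     holds PTRS_PER_BLOCK block numbers */
};

/* Return the data block number that holds logical block `n` of the
 * file, or 0 if `n` is beyond what this inode can address. */
uint32_t block_for(const struct indexed_inode *ino, uint32_t n)
{
    assert(ino != NULL);
    if (n < NUM_DIRECT)
        return ino->direct[n];
    n -= NUM_DIRECT;
    if (n < PTRS_PER_BLOCK && ino->indirect != NULL)
        return ino->indirect[n];
    return 0;
}

/* Largest file size addressable with direct entries plus one
 * indirect block. */
uint64_t max_file_size(void)
{
    return (uint64_t)(NUM_DIRECT + PTRS_PER_BLOCK) * BLOCK_SIZE;
}
```

Under these assumptions, the direct entries alone cover 12 * 4 KB = 48 KB, while a single indirect block adds another 1024 blocks, for a total of 4243456 bytes. Adding further levels of indirection extends the limit in the same way.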