[Linux] inode hard and soft link

Article directory

  • Foreword
  • 1. File system
    • 1.1 The physical structure of the disk:
    • 1.2 CHS and LBA:
    • 1.3 The basic unit of IO:
    • 1.4 File system structure:
    • 1.5 Understanding inodes:
  • 2. Soft and hard links
    • 2.1 Number of hard links to directories and files:
    • 2.2 Soft link:
    • 2.3 Hard link:
    • 2.4 The difference between soft and hard links:

Foreword

In this chapter, we look at the physical structure of the disk, how a disk is divided into partitions and blocks, and how those blocks are managed. We then get to know inodes and soft and hard links. With the goal set, pull up a small stool and get ready for the lecture…

1. File system

1.1 The physical structure of the disk:

Everything we studied earlier concerned files that are open in memory. But not every file is open.
A large number of files sit quietly on the disk. They are numerous and disorganized.
So we now shift our perspective from memory to the disk.
The essential work of managing files on disk is much like the work of the manager of a parcel pickup station: every item must be stored so that it can be found again quickly.

The disk is the only mechanical device in our computer. On today's laptops the disk has often been replaced by an SSD (solid state drive). Disks are cheaper per unit of capacity, so many company servers still use disk-based storage, and SSD cells wear out after a limited number of writes.


Principle of storing data:

The principle of disk storage data is based on the physical process of the interaction between magnetic materials and magnetic fields. On magnetic media such as hard disks and tapes, data is stored and read in the form of magnetic fields.

Data is stored magnetically: the N/S polarity at a location represents a 0 or a 1.

When we flip the N/S polarity at a certain location on the disk, we change the 0/1 stored there.

  • Each platter surface has its own head.
  • The disk is both a mechanical device and a peripheral, so it is necessarily slow compared with the CPU and RAM.


The disk storage structure is composed of multiple components, each of which has different functions and roles. The following are the common components and their explanations in the disk storage structure:

  1. Platters: Platters are the main part of the disk storage structure and are usually made of metal or glass materials. They are stacked and rotated by a spindle. Data is stored on the surfaces of the platters, and both sides of each platter can be used for data storage.

  2. Heads: Heads are devices used to read and write data, usually consisting of electromagnetic components. The magnetic head is located above or below the surface of the platter, can float very close to the surface of the platter, and generates a magnetic field through a small electric current, which interacts with the magnetic field on the platter.

  3. Arm: The arm is the supporting structure of the magnetic head, also known as the seek arm. The arm is attached to the magnetic head at one end and to the drive’s position control system at the other end. The arm swings across the platter surface to position the head over the assigned track.

  4. Track: Each platter surface is divided into many concentric circles, and each circle is a track. A certain capacity of data can be stored on each track, and the arm positions the magnetic head over the designated track to perform read and write operations.

  5. Cylinder: A cylinder refers to the collection of tracks located at the same radial position on all platter surfaces. In other words, a cylinder consists of the identically numbered tracks on every surface, which are vertically aligned to form a columnar structure. A disk has as many cylinders as there are tracks on one surface, and each cylinder contains one track per surface. Because all heads move together, using cylinder numbers simplifies the seek operation of the magnetic head.

  6. Sector: A sector is the smallest unit on a magnetic track and is used to store data. Typically, each sector is 512 bytes or 4KB in size. Reading and writing of data is performed in units of sectors.

  7. Spindle: The spindle is the rotation center axis of the disc, which is driven by a motor to rotate the disc at high speed. The rotational speed of the spindle is usually measured in revolutions per minute (RPM), such as 7,200 RPM or 10,000 RPM.

  8. Drive Controller: A drive controller is a circuit board on a disk drive that controls the operation of the entire disk storage system. It is connected to the computer system and controls the movement of the magnetic head, data reading and writing, etc. according to the instructions of the computer.

These components work together to realize the storage and access process of data on disk. Through the movement of the magnetic head and the rotation of the platter, data can be read and written precisely, enabling efficient data storage and retrieval.

When reading and writing a disk, the head is looking for a certain sector of a certain track on a certain surface:

  • Disk surface (has two sides) has its own corresponding head
  • Track is determined by the radius from the center of the circle
  • Sector is determined by disk rotation

The job of the file system of the operating system is to associate files with their corresponding sectors. With the method mentioned above, you can find every sector!

  • As long as we can identify the disk surface, the cylinder (track), and the sector, we can find a storage unit!
  • In the same way, we can find every basic unit!

This way of locating data is called CHS addressing. CHS stands for Cylinder, Head, Sector.

1.2 CHS and LBA:

CHS addressing method:

  • How do we determine which location on the disk data is written to?
  • First determine the cylinder (track), and then determine which surface (head) it is on.
  • Then determine which sector it is in, and a specific storage unit has been identified.
  • In this way any specified sector can be found.

We have all seen cassette tapes: wound up they form a reel, and pulled straight they form one long strip. A disk can be viewed the same way: straighten every track and join them end to end into one long strip, which we can treat as a large array.

  • At this point, modifying a file on disk can be abstracted into adding, deleting, looking up, and modifying elements of an array in the kernel!
  • Manage each small piece well, and the whole is managed. The management of the disk can be imagined as the management of an array.

This abstraction of the disk is called LBA (Logical Block Address) addressing.

Through operations such as integer division + modulus, the LBA logical block address can be converted into a CHS address to determine the specific location.

Specific calculation process reference: learning portal

After the data in the array is modified, the LBA is translated into the corresponding CHS address (on modern drives this translation happens inside the drive itself), and the disk modifies the data in the specified sector. This is how data is saved to the disk.
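The integer division and modulus conversion mentioned above can be sketched as follows. This is only a minimal illustration: the function names and the geometry used in the demo (16 heads per cylinder, 63 sectors per track) are assumptions chosen for the example, not values from any particular disk.

```python
def lba_to_chs(lba, heads_per_cylinder, sectors_per_track):
    """Convert a logical block address into (cylinder, head, sector)."""
    cylinder = lba // (heads_per_cylinder * sectors_per_track)
    head = (lba // sectors_per_track) % heads_per_cylinder
    sector = lba % sectors_per_track + 1  # CHS sectors are numbered from 1
    return cylinder, head, sector


def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Convert (cylinder, head, sector) back into a logical block address."""
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


if __name__ == "__main__":
    # Hypothetical geometry, for illustration only.
    print(lba_to_chs(0, 16, 63))     # (0, 0, 1)
    print(lba_to_chs(1008, 16, 63))  # (1, 0, 1)
```

With this geometry one cylinder holds 16 × 63 = 1008 sectors, which is why LBA 1008 lands on the first sector of cylinder 1.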

1.3 The basic unit of IO:

For an operating system, the basic unit of an IO is usually a block. A sector is the smallest unit of disk storage, generally 512 bytes or 4KB. A block is usually made up of several consecutive sectors; a typical block is 4KB, that is, eight 512-byte sectors.

The sector is the disk's basic unit of access, but that does not mean the operating system must access the disk one sector at a time. It can access several sectors as one unit.

Why is it usually 4KB?

There are several reasons why the basic unit of IO generally chooses 4KB:

  1. Storage medium characteristics: The sector size of a traditional hard disk (disk) is usually 512 bytes, and a 4KB block contains exactly 8 sectors. Using 4KB as the basic unit of IO can better match the physical organization structure of the hard disk, reduce the seek overhead when reading and writing, and improve the read and write efficiency of storage devices.
  2. Caching effect: A larger block size helps to improve the caching effect of IO operations. When the system does an IO operation, it loads the entire block into memory cache. A larger block size can maximize the use of memory caching capabilities and reduce frequent disk access, thereby improving overall read and write performance.
  3. Block size of the file system: Many file systems use 4KB as the default block size. Choosing an IO unit that is consistent with the block size of the file system can better cooperate with the file system. In this way, additional conversion and management overhead can be avoided, and the efficiency of data reading and writing can be improved.

I personally think the most important point is:

  • Don’t let the software (OS) design be tightly bound to the hardware (disk). In other words, decoupling!
  • Memory is also typically managed in 4KB units (pages), so a 4KB block read from the disk fits exactly into one unit of memory.

It should be noted that the selection of IO basic units will also be affected by specific application scenarios, hardware limitations, and performance requirements. For certain applications, other block sizes may be used to meet specific needs. Therefore, selecting the optimal size of the IO basic unit requires comprehensive consideration of various factors and actual testing and evaluation.
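As a small worked example of the relationship between blocks and sectors, the sketch below assumes 512-byte sectors and 4KB blocks. The constant and function names are made up for this illustration.

```python
SECTOR_SIZE = 512                               # bytes, assumed
BLOCK_SIZE = 4096                               # bytes, assumed
SECTORS_PER_BLOCK = BLOCK_SIZE // SECTOR_SIZE   # 8


def block_to_sectors(block_number):
    """Return the range of sector numbers (LBAs) covered by one block."""
    first = block_number * SECTORS_PER_BLOCK
    return range(first, first + SECTORS_PER_BLOCK)


def byte_offset_to_block(offset):
    """Return (block number, offset inside that block) for a byte offset."""
    return offset // BLOCK_SIZE, offset % BLOCK_SIZE


if __name__ == "__main__":
    print(list(block_to_sectors(3)))   # sectors 24 through 31
    print(byte_offset_to_block(5000))  # (1, 904)
```

Even if only one byte at offset 5000 is needed, the whole of block 1 (eight sectors) is read in a single IO.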

1.4 File system structure:

As we mentioned above, the disk is regarded as a large array, and the large array is divided into blocks. As long as each block is managed well, the entire disk will be managed in place.

Suppose a partition is 100GB. We divide it into 20 parts of 5GB each, and then divide each part into five groups of 1GB. In the end, managing the disk well comes down to managing one block group well, that is, managing 1GB well, and then applying the same scheme to every other group.

Boot Block is related to booting:

When a computer starts, it first powers on and performs a self-test. Control then passes to the BIOS (Basic Input/Output System), which is firmware stored in a chip on the motherboard. To start up, the machine must find out where the operating system is, so it reads a small boot area of about 500 bytes from the disk and then the Boot Block of a partition. This area holds the boot information of the machine, including the partition table, and tells us where the operating system software is located.
So at the hardware level, when the system starts, the operating system can be found by reading this small piece of data, and then the operating system is loaded. This is commonly known as booting.

Linux stores file content and file attributes separately:

  • The attributes of a file have a stable, fixed size
  • The content of a file keeps growing and changing in size

1.5 Understanding inodes:

The attribute information of files in Linux (such as permissions, owners, sizes, etc.) is stored and managed through inodes:

  • Each file has a unique inode number that identifies the file.
  • The inode is a data structure in the file system, which contains the metadata information of the file.
  • Including file permissions, timestamp, size, etc.
  • Of course there are some files without inodes.

A file name is a recognizable and easy-to-remember name assigned by a user to a file, while an inode is a data structure used within the file system to uniquely identify and manage files.

  • Data blocks: store the file content, block by block!
  • inode table: stores inode attributes, one fixed-size entry of 128 bytes per inode!
  • How is the uniqueness of a file established?
    • The inode attributes include an inode number!
  • Within a partition, an inode number is unique.
    • In general: one file, one inode, one inode number!
  • Block Bitmap: records whether each data block is in use. The position of a bit identifies the data block, and the value of the bit indicates whether that block is occupied.
  • inode Bitmap: the position of a bit identifies the inode, and the value of the bit indicates whether that inode in the inode table is occupied.
  • GDT (Group Descriptor Table): describes one block group in the partition: how many inodes it has, the starting inode number, how many inodes are used, how many blocks are used, how many are left, the total size of the group…
  • SB (Super Block): the top-level data structure of our file system!
    • It describes the entire partition: how many blocks there are in total and how many block groups there are in total.
    • The inode usage and the Data Block usage of each block group.
    • How big the entire partition is, the number of the partition on the disk, and what type of file system the partition holds.
    • All of these attributes are written in the Super Block, which is why it is the top-level data structure.
  • How does an inode (the file's attributes) relate to the file's own content?
    • The inode stores the numbers of the data blocks (4KB each) that hold the content. A data block can also be used to store the numbers of other blocks!
    • So once the inode of a file is found, all of its blocks, and therefore all of the file's contents, can be found.
  • Does the file name count as an attribute of the file? Yes!
    • However, the file name is not saved in the inode!
    • Under Linux, the file system identifies a file by its inode number; at that level there is no concept of a file name.
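To make the Block Bitmap and inode Bitmap idea concrete, here is a minimal sketch of a bitmap in which the bit position identifies the entry and the bit value says whether it is occupied. The class and method names are invented for this illustration; a real file system keeps such bitmaps on disk inside each block group.

```python
class Bitmap:
    """Bit position = which entry, bit value = whether it is occupied."""

    def __init__(self, size):
        self.size = size
        self.bits = bytearray((size + 7) // 8)  # all entries start free (0)

    def is_used(self, index):
        return bool(self.bits[index // 8] & (1 << (index % 8)))

    def set_used(self, index):
        self.bits[index // 8] |= 1 << (index % 8)

    def set_free(self, index):
        self.bits[index // 8] &= ~(1 << (index % 8)) & 0xFF

    def allocate(self):
        """Find the first free entry, mark it used, and return its index."""
        for index in range(self.size):
            if not self.is_used(index):
                self.set_used(index)
                return index
        raise RuntimeError("no free entry left")


if __name__ == "__main__":
    inode_bitmap = Bitmap(16)
    first = inode_bitmap.allocate()
    second = inode_bitmap.allocate()
    inode_bitmap.set_free(first)            # "delete": only the bit changes
    print(first, second, inode_bitmap.allocate())
```

Freeing an entry touches only one bit, which is why deleting a file is so much cheaper than writing it.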

Everything is a file under Linux, so is a directory a file?

  • The answer is yes, directories are files too! !
  • file = content (blocks) + attributes (inode)
  • The mapping relationship between file name and inode is stored in the data block (block) of the directory.

Why do you need read permission on a directory to view the file names in it?

  • Reading a filename is reading the contents of a directory.
  • Read permission is required to read the contents of the directory.
  • This is no different from reading a normal file.

Why do you need write permission on a directory to create a file in it?

  • Creating a file ends with writing the mapping between the file name and the inode into the data block of the directory.
  • Without write permission, how could that mapping be written?

Can multiple files with the same name be created in the same directory in Linux? No!

  • The file name acts as a key!
  • Each file name in a directory maps to exactly one inode, so duplicate file names cannot exist in the same directory.
  • The directory stores the mapping relationship between file names and inodes.
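The "file name as key" rule can be observed from a program. The sketch below uses `os.open` with the `O_CREAT | O_EXCL` flags, which asks the system to create the name only if it does not already exist in the directory. The helper name `create_exclusive` is made up for this example.

```python
import os
import tempfile


def create_exclusive(path):
    """Create path only if that name is not already present in its directory."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False  # the name (key) is already taken
    os.close(fd)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "log.txt")
        print(create_exclusive(path))  # first creation succeeds
        print(create_exclusive(path))  # same name again is refused
```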

Supplement:

  • A directory is actually a special file that contains one or more data blocks for storing the mapping between file names and corresponding inode numbers.
  • A series of directory entries are stored in each directory data block, and each directory entry consists of a file name and a corresponding inode number. By traversing these directory entries, the system can find the corresponding inode according to the file name, and then operate the file.
  • When the file system needs to create, delete, rename, or move a file, it updates the directory entry in the directory data block to reflect the change between the file name and the inode.
  • Therefore, the data block of the directory carries the mapping relationship between the file name and the inode, and it acts as a bridge connecting the file name and the inode. By searching directory data blocks, the system can quickly locate and operate specific files.

Check the inode of the file by ls -l -i:
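The same name-to-inode mapping can be read from a program. The following is a rough sketch rather than a reimplementation of ls: the function name `name_to_inode` is an assumption for this example, and it simply asks the system for the inode number (`st_ino`) of every name in a directory.

```python
import os


def name_to_inode(directory="."):
    """Return a dict mapping each file name in a directory to its inode number."""
    return {
        name: os.lstat(os.path.join(directory, name)).st_ino
        for name in os.listdir(directory)
    }


if __name__ == "__main__":
    for name, inode_number in sorted(name_to_inode(".").items()):
        print(inode_number, name)
```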

When we create a file, what does the operating system do?

  • Find a free inode in the inode Bitmap of a block group, set its bit from 0 to 1, and fill in the file's attributes in the inode table.
  • As content is written, find free blocks in the Block Bitmap, set their bits from 0 to 1, and record the block numbers in the inode.
  • Finally, write the mapping between the file name and the inode number into the Data Block of the directory in which the file is created.

Delete a file, what does the OS do?

  • Find the directory the file is in, find the inode of the directory, and then find the Data Block of the directory.
  • The Data Block holds the mapping between file names and inodes. File names are unique within the directory, so the inode number is found by searching for the file name.
  • Find the block group according to the inode number, and set the bit for the file in the inode Bitmap from 1 to 0.
  • Set the bits in the Block Bitmap for the file's data blocks from 1 to 0.
  • Linux doesn’t really clear the data: deletion only flips the bitmap bits for the file's attributes and data blocks from 1 to 0.
  • Finally, the mapping between the file name and the inode is removed from the directory the file is in, and the file is deleted at this time.
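The create and delete steps can be modelled with a toy block group. This is a deliberately simplified sketch under several assumptions: everything lives in memory, each file occupies exactly one block, inode numbers start at 0, and the class name `ToyBlockGroup` is invented. It is not how any real file system is coded, but it shows that deletion only flips bitmap bits and removes the directory mapping.

```python
class ToyBlockGroup:
    """A toy block group: two bitmaps, an inode table, data blocks, one directory."""

    def __init__(self, inode_count=8, block_count=16):
        self.inode_bitmap = [0] * inode_count
        self.block_bitmap = [0] * block_count
        self.inode_table = [None] * inode_count
        self.data_blocks = [b""] * block_count
        self.directory = {}  # file name -> inode number

    def create(self, name, content):
        if name in self.directory:
            raise FileExistsError(name)
        inode_no = self.inode_bitmap.index(0)   # first free inode (ValueError if full)
        block_no = self.block_bitmap.index(0)   # first free block (ValueError if full)
        self.inode_bitmap[inode_no] = 1
        self.block_bitmap[block_no] = 1
        self.data_blocks[block_no] = content
        self.inode_table[inode_no] = {"size": len(content), "blocks": [block_no]}
        self.directory[name] = inode_no         # last step: name -> inode mapping
        return inode_no

    def delete(self, name):
        inode_no = self.directory.pop(name)     # look up by name, remove the mapping
        for block_no in self.inode_table[inode_no]["blocks"]:
            self.block_bitmap[block_no] = 0     # the data itself is left in place
        self.inode_bitmap[inode_no] = 0


if __name__ == "__main__":
    group = ToyBlockGroup()
    group.create("note.txt", b"hello")
    group.delete("note.txt")
    print(group.data_blocks[0])  # the old bytes are still there
```

Because the old bytes stay in the block until it is reused, deleted data can sometimes be recovered as long as nothing has overwritten it.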

The working process of ls:

  • ls finds the inode number corresponding to the directory, and finds the inode according to the inode number.
  • The inode holds the attributes, including the record of which data blocks belong to the directory.
  • It finds those data blocks and simply lists the file names stored in them.

ls -l working process:

  • ls -l finds the data blocks corresponding to the directory and reads their content.
  • The content holds the mapping between file names and inodes. For each file, it takes the inode number, finds that file's inode, and reads all the attributes.
  • Finally, the attributes are joined with the file name to form one line of ls -l output.
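The two-step process of ls -l (read the names from the directory, then read each file's attributes through its inode) can be imitated with a short sketch. The function name `long_listing` and the chosen columns are assumptions for illustration; the real ls prints more fields, such as owner and time.

```python
import os
import stat


def long_listing(directory="."):
    """Build ls -l style lines: inode, mode, link count, size, name."""
    lines = []
    for name in sorted(os.listdir(directory)):             # step 1: names from the directory
        info = os.lstat(os.path.join(directory, name))     # step 2: attributes from the inode
        mode = stat.filemode(info.st_mode)                  # e.g. a string like -rw-r--r--
        lines.append(f"{info.st_ino} {mode} {info.st_nlink} {info.st_size} {name}")
    return lines


if __name__ == "__main__":
    for line in long_listing("."):
        print(line)
```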

2. Soft and hard links

When we first started learning Linux, there was a number in the file listing that we did not talk about, the number of links:

Number of hard links to files:

The number of hard links of a file is the number of file names that refer to that file. In the Linux system, multiple file names can point to the same inode, and these file names are called hard links to the file. Hard links share the same inode number and therefore refer to the same file content.
Whenever a hard link is created, the hard link count of the file is incremented by 1. Conversely, when a hard link is deleted, the hard link count is decremented by 1. Only when the hard link count reaches 0 is the file actually deleted and the related storage space released.
Therefore, the number of hard links of a file indicates how many file names point to the same piece of data. When the hard link count is 1, the file has no other hard links, that is, it is the only name pointing to the data.
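The counting rule above can be checked with a short experiment. This sketch works inside a temporary directory; the file names and the function name `hard_link_counts` are made up for the example, and `st_nlink` is the hard link count reported by the system.

```python
import os
import tempfile


def hard_link_counts():
    """Return the link count after creating a file, adding a link, and removing it."""
    with tempfile.TemporaryDirectory() as directory:
        original = os.path.join(directory, "file.txt")
        alias = os.path.join(directory, "alias.txt")
        with open(original, "w") as f:
            f.write("data")
        counts = [os.stat(original).st_nlink]     # freshly created file
        os.link(original, alias)                  # same effect as: ln file.txt alias.txt
        counts.append(os.stat(original).st_nlink)
        os.unlink(alias)                          # remove one name
        counts.append(os.stat(original).st_nlink)
        return counts


if __name__ == "__main__":
    print(hard_link_counts())  # [1, 2, 1]
```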

2.1 Number of hard links to directories and files:

When we create a directory or file, what we see is: the default number of hard links for directories is 2, and the default number of hard links for files is 1.

  • What is the reason?
  • In the Linux system, a directory is a special type of file, which contains the mapping relationship between file names and the corresponding inodes.
  • The number of hard links of a directory refers to the number of hard links pointing to this directory, that is, how many directory entries point to this directory.
    • A new directory has two hard links: its name recorded in the parent directory, and its own “.” entry. The “..” entry inside each subdirectory is in turn a hard link to the parent, so every subdirectory raises the parent's count by 1.
    • The purpose of this design is to establish a hierarchical structure in the file system and ensure the integrity of the file system.
  • The number of hard links of a file refers to the number of hard links pointing to the file, that is, how many file names point to the file.
    • By default, a file has only one hard link when it is created, which is its original file name.
    • This is because no other file name has been pointed at the file yet.
    • If you need to create a hard link to a file, you can use a specific command to create it.

In short, the default number of hard links for a directory is 2 because of its name in the parent directory and its own “.” entry, which maintain the structure and integrity of the file system; and the default number of hard links for a file is 1, because a new file has only its original name.
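The directory count can be observed in the same way. On traditional Linux file systems such as ext4, a new directory typically reports 2, and the count grows by 1 for each subdirectory. Some file systems report directory link counts differently, so treat the numbers as file-system dependent. The function name `directory_link_counts` is an assumption for this sketch.

```python
import os
import tempfile


def directory_link_counts():
    """Return a new directory's link count before and after adding a subdirectory."""
    with tempfile.TemporaryDirectory() as base:
        parent = os.path.join(base, "parent")
        os.mkdir(parent)
        before = os.stat(parent).st_nlink        # name in base + its own "."
        os.mkdir(os.path.join(parent, "child"))  # child's ".." points back at parent
        after = os.stat(parent).st_nlink
        return before, after


if __name__ == "__main__":
    print(directory_link_counts())
```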

2.2 Soft link:

ln -s source target #create soft link

Soft links are equivalent to shortcuts under Linux (ln: abbreviation for link):


Usage of soft link (shortcut) in Linux:

For now we use soft links to point to executable programs. Later they can also point to header files and library files (dynamic and static libraries), so that we do not have to search for these libraries again and again.

2.3 Hard link:

ln source target #create hard link

Create a hard link:


Inode comparison:


We found that the inode of the soft link is different from the original inode, while the inode of the hard link is the same as the original.

Number of hard links: once the new mapping relationship is established, the count changes from 1 to 2.


  • The mapping relationship between the file name and the inode is re-established in the current directory
  • The two file names are mapped to the same inode through the directory.

Creating a soft link does not change the hard link count, because a soft link is an independent file with its own inode.

Since the link count does not change, a soft link is clearly not just another mapping between a file name and the file's inode. Otherwise it would be no different from a hard link.
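The inode comparison can be reproduced with a short sketch. It creates one file, one hard link, and one soft link in a temporary directory, and uses `os.lstat` so that the soft link itself is inspected rather than the file it points to. The file names and the function name `compare_links` are made up for the example.

```python
import os
import tempfile


def compare_links():
    """Return inode numbers of a file, its hard link, and its soft link."""
    with tempfile.TemporaryDirectory() as directory:
        original = os.path.join(directory, "file.txt")
        hard = os.path.join(directory, "hard.txt")
        soft = os.path.join(directory, "soft.txt")
        with open(original, "w") as f:
            f.write("data")
        os.link(original, hard)      # ln file.txt hard.txt
        os.symlink(original, soft)   # ln -s file.txt soft.txt
        return {
            "original": os.lstat(original).st_ino,
            "hard": os.lstat(hard).st_ino,
            "soft": os.lstat(soft).st_ino,
            "nlink": os.lstat(original).st_nlink,
        }


if __name__ == "__main__":
    result = compare_links()
    print(result["original"] == result["hard"])  # True: same inode
    print(result["original"] == result["soft"])  # False: independent inode
    print(result["nlink"])                       # 2: the soft link is not counted
```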

2.4 The difference between soft and hard links:

soft link:

  • A soft link is an independent file with its own independent inode and inode number.
  • Shortcuts under Linux!
  • Since it is an independent file with an independent inode, what is the content of the soft link file?
    • What is saved is the path of the file it points to!
    • The system follows this path automatically when the link is opened, so we normally do not see it. It can be displayed with the readlink command.
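That the content of a soft link is just a path can be seen with `os.readlink`. The sketch below also deletes the target to show that the link then dangles, since the stored path no longer leads anywhere. The names used are hypothetical.

```python
import os
import tempfile


def inspect_symlink():
    """Return (stored path equals target path, link dangles after target is removed)."""
    with tempfile.TemporaryDirectory() as directory:
        target = os.path.join(directory, "target.txt")
        link = os.path.join(directory, "shortcut")
        with open(target, "w") as f:
            f.write("hello")
        os.symlink(target, link)
        stored_path = os.readlink(link)   # the content of the soft link file
        os.unlink(target)                 # remove the file the link points to
        dangling = os.path.islink(link) and not os.path.exists(link)
        return stored_path == target, dangling


if __name__ == "__main__":
    print(inspect_symlink())  # (True, True)
```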

Hard link:

  • A hard link is not an independent file: it shares the same inode and data blocks as the original file!
    • The hard link and the original file point to the same data blocks in the file system; they are multiple file names of the same file.
    • When you create a hard link, you are actually associating a new file name with the inode of the original file.
    • Therefore, a hard link takes up no additional disk space beyond its directory entry, and it does not exist as a separate file entity.
    • For the file system, whether the file is reached through the original name or through a hard link, there is only one copy of the actual data.
  • Since hard links share the same inode as the original file, they have the same metadata information such as permissions, timestamps, etc.
    • If the content or attributes are modified through any one of the names, all other names reflect those changes.
  • Attention to detail:
    • Hard links can only be created within the same file system, and users cannot create hard links to directories.
    • Deleting a hard link only reduces the hard link count. When the hard link count drops to zero, the file is actually deleted and the corresponding disk space is released.
  • What is the number of hard links?
    • The inode number stored in a directory entry works much like a “pointer” to the file.
    • The hard link count is a counter in the file's inode attributes that records how many file names have established a mapping relationship with this inode.
    • In short, it is the number of file names pointing to the inode (the file itself).
  • The role of hard links:
    • Switching between paths: the “.” and “..” entries in every directory are hard links.
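One practical consequence of the rules above is that the data survives as long as at least one name remains. The sketch below removes the original name and reads the content through the remaining hard link. The file names and the function name are assumptions for this example.

```python
import os
import tempfile


def read_after_unlinking_original():
    """Delete the original name, then read the data through the hard link."""
    with tempfile.TemporaryDirectory() as directory:
        original = os.path.join(directory, "file.txt")
        alias = os.path.join(directory, "backup.txt")
        with open(original, "w") as f:
            f.write("payload")
        os.link(original, alias)   # second name for the same inode
        os.unlink(original)        # link count drops from 2 to 1, data stays
        with open(alias) as f:
            content = f.read()
        return content, os.stat(alias).st_nlink


if __name__ == "__main__":
    print(read_after_unlinking_original())  # ('payload', 1)
```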