[Linux] Understanding the file system, and how dynamic and static libraries are built and used (buffers, inodes, soft and hard links, dynamic and static libraries)

Hello, everyone, this is bang___bang_. Today I will talk about the file system, including buffers, inodes, soft and hard links, and dynamic and static libraries. This article is meant to share and record what I have learned; I hope it is helpful to anyone who needs it.

Directory

1 Buffer
  The meaning of the buffer
  Common buffer flush strategies
  Guessing where the buffer is
    Phenomenon
    Explanation of the phenomenon
  User-level buffer location
2 Understanding the file system
  Disk storage structure
    Physical structure of the disk
    Abstract structure of the disk
  File system
  inode vs file name
3 Soft and hard links
  Soft link
  Hard link
4 Dynamic and static libraries
  Static library
    Generate a static library
    Use a static library
  Dynamic library
    Generate a dynamic library
    Use a dynamic library
  Both a static library and a dynamic library exist: which one is used?
  Feature summary
    Static library features
    Dynamic library features

1 Buffer

Question: What is a buffer?

Answer: It is a region of memory.

The meaning of the buffer

We know that a buffer is just a region of memory, so why do we need one?

An example from everyday life:

You are in Xi'an and a good friend of yours is in Shanghai. His birthday is next month, and you want to give him a picture album you painted by hand. You can ride a bicycle to Shanghai and deliver it yourself, or you can go downstairs to SF Express, hand over the package, and go home.

There is no doubt that delivering the package yourself takes a lot of time, while handing it to SF Express is very quick for you. The package will not necessarily be shipped immediately, though: it may wait in the warehouse until a full batch of goods is ready and then be shipped together with them. (The album is the data, and SF Express is the buffer.)

Delivering it yourself by bicycle corresponds to write-through mode (WT).

Handing the package to SF Express and going straight home corresponds to write-back mode (WB).

Write-through mode: data is written directly to the external device.

Write-back mode: data is first written to the buffer; when the data in the buffer reaches a certain amount, it is written to the external device in one batch.

This example makes the value of the buffer obvious.

The significance of the buffer: it improves the efficiency of the whole machine, and above all it improves the response speed seen by the user.
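The difference between the two modes can be sketched with a small counting model. This is only an illustration: the function name device_writes and its parameters are made up for this sketch, and the "device" is nothing more than a counter of how often it would be accessed.

```c
#include <stddef.h>

/* Count how many times the (simulated) device is accessed when n_records
 * records of record_size bytes each are written.
 * buf_capacity == 0 models write-through: every record goes straight out.
 * buf_capacity  > 0 models write-back: records collect in a buffer that is
 * written out when the next record would not fit, plus once at the end. */
size_t device_writes(size_t n_records, size_t record_size, size_t buf_capacity)
{
    if (buf_capacity == 0 || record_size > buf_capacity)
        return n_records;                 /* write-through: one access per record */

    size_t used = 0;                      /* bytes waiting in the buffer */
    size_t writes = 0;                    /* accesses to the device      */
    for (size_t i = 0; i < n_records; i++) {
        if (used + record_size > buf_capacity) {
            writes++;                     /* buffer full: flush it in one batch */
            used = 0;
        }
        used += record_size;              /* cheap copy into memory */
    }
    if (used > 0)
        writes++;                         /* final flush of the leftover data */
    return writes;
}
```

With 1000 records of 16 bytes, the write-through case touches the device 1000 times, while a 4096-byte buffer reduces that to 4 batched writes.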

Common buffer flush strategies

Buffer strategy = general cases + special cases

General cases:

  • Flush immediately (no buffering)

  • Line flush (line buffering)

  • Flush when full (full buffering)

Special cases:

  • The user forces a flush (fflush)

  • The process exits

In general, the display is line buffered and disk files are fully buffered.

All devices tend toward full buffering:

The buffer is flushed only when it is full -> fewer I/O operations -> fewer accesses to the peripheral (higher efficiency).

When doing I/O with an external device, the amount of data is not the main cost; preparing the I/O with the peripheral is the most time-consuming part.
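The three general strategies can be observed with the standard setvbuf function. The sketch below is an assumption-laden demo, not part of the original article: bytes_before_flush and size_on_disk are hypothetical helper names, and the behavior described is what glibc on Linux does.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* Size of the file behind fp as the kernel currently sees it. */
static long size_on_disk(FILE *fp)
{
    struct stat st;
    if (fstat(fileno(fp), &st) != 0)
        return -1;
    return (long)st.st_size;
}

/* Write one short line to a temporary file using the given buffering mode
 * (_IONBF, _IOLBF or _IOFBF) and report how many bytes have reached the
 * file BEFORE any explicit fflush(). Returns -1 on error. */
long bytes_before_flush(int mode)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;
    if (setvbuf(fp, NULL, mode, BUFSIZ) != 0) {
        fclose(fp);
        return -1;
    }
    fputs("hello\n", fp);            /* 6 bytes, ends with a newline */
    long n = size_on_disk(fp);       /* what has left the user-level buffer? */
    fclose(fp);
    return n;
}
```

With no buffering or line buffering the 6 bytes reach the file right away; with full buffering they stay in the user-level buffer until a flush, so the file is still empty.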

Guessing where the buffer is

Phenomenon

Here is a piece of code. We run it twice: once printing to the display, and once with the output redirected to a file.

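#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main() {
    // provided by the C standard library
    printf("hello printf\n");
    fprintf(stdout, "hello fprintf\n");
    const char* s = "hello fputs\n";
    fputs(s, stdout);

    // provided by the OS (system call)
    const char* ss = "hello write\n";
    write(1, ss, strlen(ss));

    // create the child only after all the output calls above
    fork();

    return 0;
}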
[Figure: output on the display vs. output redirected to a file]

The code is the same, but the results differ. Printed to the display, every message appears once. Redirected to a file, the messages from the C I/O interfaces appear twice, while the message from the system call write appears only once, just as on the display. In other words, the child process also holds the data printed through the parent's C I/O interfaces, but not the data written through the system call. So if a buffer exists, it must be provided by the C standard library.

Explanation of the phenomenon

When printing to the display, the flush strategy is line flushing. Every message ends with \n, so by the time the process reaches fork(), all the data in the C standard library buffer has already been flushed. (The fork has no effect on the output!)

For the process, calling a C file interface such as fputs actually writes the data into the buffer of the C standard library; the library later calls the system interface write to write it into the target file in one go.

When we redirect the output, the data originally meant for stdout goes to a disk file, and the buffering mode changes from line flushing to full buffering. (\n no longer triggers a flush!) When the process reaches fork(), the data it wrote into the C standard library buffer has not been flushed yet, and fork() then creates a child process.

After the fork, the parent and the child both exit, and each flushes its buffer to the disk file. Flushing modifies the buffer, so it counts as a write; because processes are independent, copy-on-write takes place, each process flushes its own copy, and the data is printed twice.

The data in the buffer is part of the parent process's data. If we force a flush before the fork (for example with fflush(stdout)), the buffer is empty and the child has nothing to copy, so nothing is printed twice.
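The explanation above can be checked with a small experiment. This is a sketch under assumptions, not the article's original test: copies_after_fork is a hypothetical helper, a temporary file stands in for the redirected stdout, and the child calls fflush explicitly to do what exit() would do implicitly.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write one message through stdio into a fully buffered temporary file,
 * fork, and let parent and child each flush their copy of the buffer.
 * Returns how many copies of the message end up in the file, or -1 on
 * error. If flush_before_fork is nonzero, the buffer is emptied first. */
int copies_after_fork(int flush_before_fork)
{
    const char *msg = "hello fputs\n";
    FILE *fp = tmpfile();                 /* regular file: fully buffered */
    if (fp == NULL)
        return -1;

    fputs(msg, fp);                       /* data sits in the user-level buffer */
    if (flush_before_fork)
        fflush(fp);                       /* buffer is empty before the fork */

    pid_t pid = fork();                   /* child gets a copy of the buffer */
    if (pid < 0) {
        fclose(fp);
        return -1;
    }
    if (pid == 0) {
        fflush(fp);                       /* what exit() would do implicitly */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    fflush(fp);                           /* parent flushes its own copy */

    struct stat st;
    if (fstat(fileno(fp), &st) != 0) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return (int)(st.st_size / (off_t)strlen(msg));
}
```

Calling copies_after_fork(0) reproduces the doubled output, and copies_after_fork(1) shows that flushing before the fork leaves only one copy.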

User-level buffer location

Question: Why can fflush find the buffer when we only pass it stdout?

Answer: In C, a file is opened with FILE* fopen(). The struct FILE it returns encapsulates the fd and also contains the language-level buffer that belongs to that fd!
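To make the idea concrete, here is a toy stand-in for struct FILE. It is a simplified sketch only: MyFILE, my_fputs and my_fflush are hypothetical names, the real struct FILE in the C library has many more fields, and this version implements full buffering only.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MYBUF_SIZE 1024

/* The descriptor and its buffer travel together, so a flush function
 * only needs the handle to find both. */
typedef struct {
    int    fd;                  /* file descriptor used for write() */
    size_t len;                 /* bytes currently waiting in buf   */
    char   buf[MYBUF_SIZE];     /* user-level buffer                */
} MyFILE;

/* Push everything in the buffer to the kernel with write(). */
int my_fflush(MyFILE *f)
{
    size_t off = 0;
    while (off < f->len) {
        ssize_t n = write(f->fd, f->buf + off, f->len - off);
        if (n < 0)
            return -1;
        off += (size_t)n;
    }
    f->len = 0;
    return 0;
}

/* Copy a string into the buffer; flush only when the buffer fills up.
 * Returns 0 on success, -1 on a write error. */
int my_fputs(const char *s, MyFILE *f)
{
    size_t n = strlen(s);
    while (n > 0) {
        size_t room = MYBUF_SIZE - f->len;
        size_t chunk = n < room ? n : room;
        memcpy(f->buf + f->len, s, chunk);
        f->len += chunk;
        s += chunk;
        n -= chunk;
        if (f->len == MYBUF_SIZE && my_fflush(f) != 0)
            return -1;
    }
    return 0;
}
```

After my_fputs the data is only in f.buf; nothing reaches the descriptor until my_fflush is called or the buffer fills up.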

2 Understanding the file system

When we use the ls -l command to view file information, we are actually reading files on the disk.

Disk: a permanent storage medium (others include SSDs, USB drives, flash cards, CDs, and tapes).

The disk is a peripheral and the only mechanical device in our computer, which means it is very slow (relative to the CPU)!

Disk storage structure

Physical structure of the disk

Platters, heads, servo system, voice coil motor, and so on.

Writing to the disk essentially means changing the magnetic polarity of spots on the platter.

Each platter surface is divided into tracks, and each track is divided into sectors.

The sector is the basic unit in which the disk stores data (512 bytes).

How is data written to a specific sector? With CHS addressing, in the following steps:

1. Which surface (that is, which head)

2. Which track (cylinder)

3. Which sector

With CHS addressing we can locate any sector.

Abstract structure of the disk

When we were young, we all had cassette tapes. The tape is wound up in a reel, but it can also be pulled out into one long strip. In the same way, we can abstract the disk as a linear structure, as if it had been stretched out.

Structure: circular structure (CHS) -> linear structure (LBA)

LBA is a very simple addressing mode: sectors are numbered starting from 0, so the first sector is LBA=0, the second sector is LBA=1, and so on. To access a given sector, we only need to convert its LBA address into a physical CHS address.

In the end, managing the disk becomes managing small partitions.
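The conversion between LBA and CHS mentioned above is plain integer arithmetic. The sketch below uses the conventional formula LBA = (C * heads + H) * sectors_per_track + (S - 1); the type names Geometry and CHS and the geometry values in the usage note are assumptions made for illustration.

```c
/* Hypothetical disk geometry used only for this sketch. */
typedef struct {
    unsigned heads;              /* heads (surfaces) per cylinder */
    unsigned sectors_per_track;  /* sectors on one track          */
} Geometry;

typedef struct {
    unsigned cylinder;  /* 0-based */
    unsigned head;      /* 0-based */
    unsigned sector;    /* 1-based, by CHS convention */
} CHS;

/* CHS -> LBA: count all sectors that come before the given one. */
unsigned long chs_to_lba(Geometry g, CHS a)
{
    return ((unsigned long)a.cylinder * g.heads + a.head)
               * g.sectors_per_track + (a.sector - 1);
}

/* LBA -> CHS: peel off sector, head and cylinder by division. */
CHS lba_to_chs(Geometry g, unsigned long lba)
{
    CHS a;
    a.cylinder = (unsigned)(lba / ((unsigned long)g.heads * g.sectors_per_track));
    a.head     = (unsigned)((lba / g.sectors_per_track) % g.heads);
    a.sector   = (unsigned)(lba % g.sectors_per_track) + 1;
    return a;
}
```

For a geometry of 16 heads and 63 sectors per track, LBA 0 maps to cylinder 0, head 0, sector 1, and LBA 5000 maps to cylinder 4, head 15, sector 24.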

File system

Disk file system diagram

The picture above shows the file system layout on disk (the kernel's in-memory image is certainly different). Disks are typical block devices, and a disk partition is divided into individual blocks. The size of a block is determined at format time and cannot be changed afterwards.

Although the basic unit of the disk is the sector (512 bytes), the basic unit of I/O between the OS (file system) and the disk is 4 KB (8 * 512 bytes); 4 KB is the block size.

Super Block -> attribute information of the file system

Data blocks -> a collection of many 4 KB blocks, which hold the contents of the files

inode Table: an inode is a 128-byte space that stores the attributes of its file. The inode table is the set of inodes of all files in this block group. Each inode has to be uniquely identified, so every inode has an inode number!

Block Bitmap: suppose there are 10000+ blocks; then there are 10000+ bits, one per block. A bit set to 1 means the block is occupied; otherwise the block is available!

inode Bitmap: suppose there are 10000+ inodes; then there are 10000+ bits, one per inode. A bit set to 1 means the inode is occupied; otherwise the inode is available!

GDT: the block group descriptor table. It records how big this block group is, how many inodes there are, how many are in use and how many are left, how many blocks there are in total, how many are in use, and so on.

We divide a block group into the parts above and write the management data into it -> every block group is handled this way -> the whole partition is filled with file system information! (This is formatting.)
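The two bitmaps described above work the same way, so one set of helpers covers both. This is a minimal sketch with made-up function names (bitmap_test, bitmap_set, bitmap_clear, bitmap_alloc); a real file system uses more elaborate allocation policies.

```c
#include <limits.h>
#include <stddef.h>

/* Bit i of the map describes block (or inode) i: 1 = occupied, 0 = free. */
int bitmap_test(const unsigned char *map, size_t i)
{
    return (map[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1;
}

void bitmap_set(unsigned char *map, size_t i)
{
    map[i / CHAR_BIT] |= (unsigned char)(1u << (i % CHAR_BIT));
}

void bitmap_clear(unsigned char *map, size_t i)
{
    map[i / CHAR_BIT] &= (unsigned char)~(1u << (i % CHAR_BIT));
}

/* Find the first free entry among nbits, mark it occupied, and return
 * its index; return -1 if everything is occupied. */
long bitmap_alloc(unsigned char *map, size_t nbits)
{
    for (size_t i = 0; i < nbits; i++) {
        if (!bitmap_test(map, i)) {
            bitmap_set(map, i);
            return (long)i;
        }
    }
    return -1;
}
```

Freeing a block or an inode is just bitmap_clear on its index, which is why deleting a file can be cheap: the data itself does not have to be erased.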

inode vs file name

  • A directory can hold many files, but no two of them can have the same name.
  • A directory is itself a file; it has its own inode and data block.

File names are stored in the directory's data block, which holds the mapping between each file name and its inode number. Within a directory a file name is unique, so it serves as the key for finding the inode number.

Why does creating a file require w permission on the directory?

The directory has its own data block, and the name of the file we create is stored there. Creating a file means writing the file name and inode number into that data block, so w permission is required.

Why does listing a directory require r permission?

To display file names, we can only get the names (and from them the attributes) from the directory's content. We have to read the directory's data block, and that requires r permission.
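The name -> inode number mapping stored in a directory can be read with the POSIX opendir/readdir interface. The helper name lookup_inode is hypothetical, and returning 0 for "not found" is a convention chosen for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <string.h>
#include <sys/types.h>

/* Scan the entries of directory `dir` for `name` and return the inode
 * number stored next to it, or 0 if the name is absent or the directory
 * cannot be read (reading it requires r permission on the directory). */
ino_t lookup_inode(const char *dir, const char *name)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return 0;

    ino_t found = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, name) == 0) {   /* the file name is the key */
            found = e->d_ino;                 /* the inode number is the value */
            break;
        }
    }
    closedir(d);
    return found;
}
```

This is roughly the lookup that ls -i performs for each entry: the directory supplies the name and the inode number, and everything else comes from the inode.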

What does the system do when a file is created?

It finds an unused inode in a particular block group through the inode Bitmap, assigns the inode number, and fills in the attributes. If the file has content, it allocates data blocks for the content, sets the corresponding bits in the Block Bitmap, and records the mapping between the inode and its data blocks in the inode. Finally, it writes the mapping between the file name and the inode number into the data block of the directory.

What does the system do when a file is deleted?

A file is always deleted from some directory. The user supplies the file name; the system looks it up in the directory's data block to get the inode number, sets the inode's bit in the inode Bitmap from 1 to 0, sets the bits of its data blocks in the Block Bitmap from 1 to 0, and removes the file name -> inode number mapping from the directory's data block.

What does the system do when a file is viewed?

It finds the inode through the file name, then uses the inode to read the attributes or the content.
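The create and delete steps above can be modeled in memory. This is a deliberately tiny toy, not how a real file system is laid out: ToyFS, toy_create and toy_delete are hypothetical names, there are only 8 inodes and 8 blocks, and each file gets at most one data block.

```c
#include <string.h>

#define N_INODES 8
#define N_BLOCKS 8

typedef struct {
    int  used;        /* 1 if this directory slot holds an entry */
    int  ino;         /* inode number the name maps to           */
    char name[16];
} DirEntry;

typedef struct {
    unsigned char inode_bitmap;          /* bit i = 1: inode i is occupied      */
    unsigned char block_bitmap;          /* bit i = 1: data block i is occupied */
    int           inode_block[N_INODES]; /* inode -> data block, -1 if none     */
    DirEntry      dir[N_INODES];         /* stands in for the directory's data block */
} ToyFS;

static int alloc_bit(unsigned char *map, int nbits)
{
    for (int i = 0; i < nbits; i++) {
        if (((*map >> i) & 1) == 0) {
            *map |= (unsigned char)(1u << i);
            return i;
        }
    }
    return -1;
}

static void free_bit(unsigned char *map, int i)
{
    *map &= (unsigned char)~(1u << i);
}

static DirEntry *find_entry(ToyFS *fs, const char *name)
{
    for (int i = 0; i < N_INODES; i++)
        if (fs->dir[i].used && strcmp(fs->dir[i].name, name) == 0)
            return &fs->dir[i];
    return NULL;
}

/* Create: allocate an inode, optionally a data block, then record
 * name -> inode in the directory. Returns the inode number or -1. */
int toy_create(ToyFS *fs, const char *name, int has_content)
{
    if (strlen(name) >= sizeof fs->dir[0].name || find_entry(fs, name) != NULL)
        return -1;                       /* name too long or already present */
    int ino = alloc_bit(&fs->inode_bitmap, N_INODES);
    if (ino < 0)
        return -1;
    fs->inode_block[ino] = -1;
    if (has_content) {
        int blk = alloc_bit(&fs->block_bitmap, N_BLOCKS);
        if (blk < 0) {
            free_bit(&fs->inode_bitmap, ino);
            return -1;
        }
        fs->inode_block[ino] = blk;      /* the inode remembers its block */
    }
    for (int i = 0; i < N_INODES; i++) {
        if (!fs->dir[i].used) {          /* a free slot exists whenever an inode was free */
            fs->dir[i].used = 1;
            fs->dir[i].ino = ino;
            strcpy(fs->dir[i].name, name);
            break;
        }
    }
    return ino;
}

/* Delete: look the name up, clear both bitmap bits, drop the mapping.
 * The block content itself is not erased. Returns 0 or -1. */
int toy_delete(ToyFS *fs, const char *name)
{
    DirEntry *e = find_entry(fs, name);
    if (e == NULL)
        return -1;
    if (fs->inode_block[e->ino] >= 0)
        free_bit(&fs->block_bitmap, fs->inode_block[e->ino]);
    free_bit(&fs->inode_bitmap, e->ino);
    e->used = 0;
    return 0;
}
```

Start from a zeroed ToyFS (for example with memset). Note that deletion only flips bits and removes the directory entry, which matches the description above.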

3 Soft and hard links

The essential difference: whether there is an independent inode.

Soft link

A soft link has its own inode, so a soft link is an independent file.

Application: it is the equivalent of a shortcut on Windows.

Features: the content of a soft link file is the path of the file it points to!

ln -s <target file> <soft link name>

Create a soft link
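Both properties of a soft link (its own inode, and content equal to the target path) can be checked with the POSIX calls symlink, lstat and readlink. The function softlink_demo and the temporary file names are assumptions made for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* In a fresh temporary directory, create a file and a soft link to it.
 * Returns 1 if the link has its own inode AND its content is exactly the
 * target path, 0 if not, -1 on error. */
int softlink_demo(void)
{
    char dir[] = "/tmp/softlink_demo_XXXXXX";
    if (mkdtemp(dir) == NULL)
        return -1;

    char target[128], linkpath[128], content[128];
    snprintf(target, sizeof target, "%s/file.txt", dir);
    snprintf(linkpath, sizeof linkpath, "%s/soft.link", dir);

    int result = -1;
    FILE *fp = fopen(target, "w");
    if (fp != NULL) {
        fclose(fp);
        struct stat st_file, st_link;
        if (symlink(target, linkpath) == 0 &&       /* like: ln -s target linkpath */
            stat(target, &st_file) == 0 &&
            lstat(linkpath, &st_link) == 0) {       /* lstat does not follow the link */
            ssize_t n = readlink(linkpath, content, sizeof content - 1);
            if (n >= 0) {
                content[n] = '\0';
                result = st_link.st_ino != st_file.st_ino &&
                         S_ISLNK(st_link.st_mode) &&
                         strcmp(content, target) == 0;
            }
        }
        unlink(linkpath);                           /* clean up; errors ignored */
        unlink(target);
    }
    rmdir(dir);
    return result;
}
```

Using stat instead of lstat on the link would follow it and report the target's inode, which is why lstat is needed to see the link itself.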

Hard link

A hard link does not have its own inode, so a hard link is not an independent file (it uses the inode of the file it links to).

Creating a hard link establishes a mapping between a new file name and the specified inode in the specified directory.

ln <target file> <hard link name>

Create a hard link

The number of hard links (reference counting)

After the hard link is created, there is one more file name -> inode mapping, so the count becomes 2. This reveals an idea:

When we delete a file, the inode is not removed right away; instead, the reference count of the inode is decremented. Only when the reference count reaches 0 is the file really deleted! (This is the idea of reference counting.)

By default, a newly created file has a reference count of 1, and a newly created directory has a reference count of 2.

For a file, the inode and the file name form one mapping.

But why is it 2 for a directory?

Because every directory contains the hidden entry . which refers to the directory itself. The directory's inode therefore corresponds to 2 names (the directory's own name, and the . entry inside it), so the reference count is 2.
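The hard link count of a regular file can be watched through the st_nlink field of struct stat. The sketch below uses the POSIX calls link and unlink; hardlink_demo and the temporary file names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a file, add a hard link, then remove the original name.
 * counts[0..2] receive the link count after each step. Returns 1 if both
 * names share one inode, 0 if not, -1 on error. */
int hardlink_demo(long counts[3])
{
    char dir[] = "/tmp/hardlink_demo_XXXXXX";
    if (mkdtemp(dir) == NULL)
        return -1;

    char a[128], b[128];
    snprintf(a, sizeof a, "%s/file.txt", dir);
    snprintf(b, sizeof b, "%s/hard.link", dir);

    int same = -1;
    struct stat sa, sb;
    FILE *fp = fopen(a, "w");
    if (fp != NULL) {
        fclose(fp);
        if (stat(a, &sa) == 0) {
            counts[0] = (long)sa.st_nlink;          /* one name */
            if (link(a, b) == 0 &&                  /* like: ln a b */
                stat(a, &sa) == 0 && stat(b, &sb) == 0) {
                counts[1] = (long)sb.st_nlink;      /* two names, one inode */
                same = (sa.st_ino == sb.st_ino);
                if (unlink(a) == 0 && stat(b, &sb) == 0)
                    counts[2] = (long)sb.st_nlink;  /* count decremented, file survives */
            }
        }
        unlink(a);                                  /* clean up; errors ignored */
        unlink(b);
    }
    rmdir(dir);
    return same;
}
```

The file stays reachable through the second name after the first one is unlinked; it is only released when the count drops to 0.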

4 Dynamic and static libraries

Static library

  • Static library (.a): the code of the library is linked into the executable file when the program is compiled and linked. The static library is no longer needed when the program runs.

Generate a static library

Pack the .o files: ar -rc xxx.a xxx.o xxx.o
Explanation: ar is the GNU archive tool (ar = archive, r = replace, c = create). It packs .o files into a .a file (static library).

Pack myprint.o and mymath.o into libtest.a

myprint.h code:

#pragma once

#include <stdio.h>
#include <time.h>

extern void Print(const char* str);

myprint.c code:

#include "myprint.h"

// print the string followed by the current timestamp
void Print(const char* str)
{
    printf("%s[%d]\n", str, (int)time(NULL));
}

mymath.h code:

#pragma once

#include <stdio.h>

extern int addToTarget(int from, int to);

mymath.c code:

#include "mymath.h"

// return the sum of all integers in [from, to]
int addToTarget(int from, int to)
{
    int sum = 0;
    for (int i = from; i <= to; i++)
    {
        sum += i;
    }
    return sum;
}

Makefile:

libtest.a:mymath.o myprint.o
	ar -rc libtest.a mymath.o myprint.o
mymath.o:mymath.c
	gcc -c mymath.c -o mymath.o -std=c99
myprint.o:myprint.c
	gcc -c myprint.c -o myprint.o -std=c99

.PHONY:clean
clean:
	rm -rf *.o *.a

Static library generation graph

Use a static library

gcc main.c -I <header file search path> -L <library file search path> -l <library name>

The library name is the file name without the lib prefix and the .a suffix, so libtest.a is linked with -ltest.

Use the libtest.a static library generated above

Modify the Makefile so that the header files are placed in the include directory and the static library in the lib directory.

libtest.a:mymath.o myprint.o
	ar -rc libtest.a mymath.o myprint.o
mymath.o:mymath.c
	gcc -c mymath.c -o mymath.o -std=c99
myprint.o:myprint.c
	gcc -c myprint.c -o myprint.o -std=c99

.PHONY: output
output:
	mkdir -p lib
	mkdir -p include
	cp -rf *.h include
	cp -rf *.a lib

.PHONY:clean
clean:
	rm -rf *.o *.a lib include

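Build an executable with the static library, for example (assuming main.c sits next to the include and lib directories): gcc main.c -I ./include -L ./lib -ltest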

Dynamic library

  • Dynamic library (.so): the code of a dynamic library is linked only when the program runs, and multiple programs share the code of the library.

Generate a dynamic library

The object files of a dynamic library must be compiled with -fPIC, which generates position-independent code.

-shared tells gcc to generate a dynamic library.

gcc -fPIC -c xxxx.c -o xxxx.o      # object files for a dynamic library must be compiled with -fPIC
gcc -shared xxxx.o -o libxxxx.so   # -shared tells gcc to generate a dynamic library

Generate the libtest.so dynamic library

Write Makefile:

libtest.so:mymath_d.o myprint_d.o
	gcc -shared mymath_d.o myprint_d.o -o libtest.so
mymath_d.o:mymath.c
	gcc -fPIC -c mymath.c -o mymath_d.o -std=c99
myprint_d.o:myprint.c
	gcc -fPIC -c myprint.c -o myprint_d.o -std=c99

.PHONY: output
output:
	mkdir -p lib
	mkdir -p include
	cp -rf *.h include
	cp -rf *.so lib

.PHONY:clean
clean:
	rm -rf *.o *.so lib include

Generate dynamic library

Use a dynamic library

At compile time, a dynamic library is used in the same way as a static library.

gcc main.c -I <header file search path> -L <library file search path> -l <library name>

At run time the dynamic loader must also be able to find libtest.so, for example through the LD_LIBRARY_PATH environment variable or by installing the library in a system library directory.

main.c uses the dynamic library libtest.so

View the dynamic libraries a program is linked against: ldd

ldd <executable program>   # view the libraries the program is linked against

Both a static library and a dynamic library exist: which one is used?

Question: if both a static library and a dynamic library are available, which one does the program link against by default?

Modify the Makefile:

.PHONY:all
all:libtest.so libtest.a

libtest.so:mymath_d.o myprint_d.o
	gcc -shared mymath_d.o myprint_d.o -o libtest.so
mymath_d.o:mymath.c
	gcc -fPIC -c mymath.c -o mymath_d.o -std=c99
myprint_d.o:myprint.c
	gcc -fPIC -c myprint.c -o myprint_d.o -std=c99

libtest.a:mymath.o myprint.o
	ar -rc libtest.a mymath.o myprint.o
mymath.o:mymath.c
	gcc -c mymath.c -o mymath.o -std=c99
myprint.o:myprint.c
	gcc -c myprint.c -o myprint.o -std=c99

.PHONY:clean
clean:
	rm -rf *.o *.a *.so

Phenomenon:

The test shows that when both a static library and a dynamic library exist, the dynamic library is used by default.

So how do we use the static library in this case? Pass -static to request static linking.

The meaning of -static: give up the default rule of preferring dynamic libraries and link against static libraries directly.

Running ldd on the result reports that it is not a dynamic executable, which means it was linked statically!

Feature summary

Static library features

Advantages:

① The static library is packed into the application, so loading is fast
② The released program does not need to ship the static library, which makes it easy to port

Disadvantages:

① The same library code may be loaded into memory in several copies, which consumes system resources and wastes memory
② When the library is updated, the project has to be rebuilt to produce a new executable, which takes time

Dynamic library features

Advantages:

① Resources can be shared between different processes
② Upgrading a dynamic library is simple: only the library file needs to be replaced, and the application does not need to be recompiled
③ You can control when a dynamic library is loaded (for example with explicit loading through dlopen); if its functions are never called, the library need not be loaded

Disadvantages:

① Loading is slower than with a static library
② The released program has to ship the dynamic libraries it depends on

To wrap up: this article first explained what a buffer is and why it exists, and verified the flush strategy of the user-level buffer. It then covered the file system: the storage structure of the disk (physical and abstract), the relationship between inodes and file names, and the use of soft and hard links. Finally it covered dynamic and static libraries: how to build and use them, the fact that the dynamic library is used by default when both exist, how to choose the static library instead, and a summary of the features of each. This article is meant to share and record what I have learned; I hope it is helpful. Thanks for reading!
