File system and inode number

File descriptor fd

0 & 1 & 2

By default, a Linux process has three open file descriptors: standard input (0), standard output (1), and standard error (2). The physical devices behind 0, 1, and 2 are normally the keyboard, the monitor, and the monitor, so input and output can also be done as follows:

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>

int main()
{
    char buf[1024];
    /* Leave room for the terminating '\0'. */
    ssize_t s = read(0, buf, sizeof(buf) - 1);
    if (s > 0)
    {
        buf[s] = 0;
        write(1, buf, strlen(buf)); /* standard output */
        write(2, buf, strlen(buf)); /* standard error */
    }
    return 0;
}

File descriptors are small integers starting from 0. When we open a file, the operating system creates a data structure in memory to describe the target file: the file structure, which represents an open file object. Because it is the process that executes the open system call, the process and the file must be associated. Each process has a pointer *files that points to a table, files_struct. The most important part of this table is an array of pointers in which each element points to an open file. So, essentially, a file descriptor is an index into this array, and as long as you hold the file descriptor you can find the corresponding file.

File descriptor allocation rules

Let’s observe this directly through a piece of code.

Demo code:

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int main()
{
    int fd = open("myfile", O_RDONLY);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    printf("fd: %d\n", fd);
    close(fd);
    return 0;
}

Running result:

fd: 3

Because file descriptors 0, 1, and 2 are already occupied, it is easy to understand why allocation starts from 3. Our guess is that file descriptors are taken from the unoccupied numbers in ascending order. Is that right? Let’s verify it with the following code.

Demo code

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int main()
{
    /* O_CREAT requires a third argument: the permission bits of the new file. */
    int fd1 = open("log1.txt", O_WRONLY | O_CREAT, 0644);
    int fd2 = open("log2.txt", O_WRONLY | O_CREAT, 0644);
    int fd3 = open("log3.txt", O_WRONLY | O_CREAT, 0644);

    if (fd1 < 0 || fd2 < 0 || fd3 < 0)
    {
        perror("open");
        return 1;
    }

    printf("fd1: %d\n", fd1);
    printf("fd2: %d\n", fd2);
    printf("fd3: %d\n", fd3);
    close(fd1);
    close(fd2);
    close(fd3);
    return 0;
}

Running result:

fd1: 3
fd2: 4
fd3: 5

So what happens if we close 0 and 2 first? (1 is standard output; we leave it open so that the result can still be observed.)

Let’s take a look at the code

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int main()
{
    close(0);
    close(2);
    /* O_CREAT requires a third argument: the permission bits of the new file. */
    int fd1 = open("log1.txt", O_WRONLY | O_CREAT, 0644);
    int fd2 = open("log2.txt", O_WRONLY | O_CREAT, 0644);
    int fd3 = open("log3.txt", O_WRONLY | O_CREAT, 0644);

    if (fd1 < 0 || fd2 < 0 || fd3 < 0)
    {
        perror("open");
        return 1;
    }

    printf("fd1: %d\n", fd1);
    printf("fd2: %d\n", fd2);
    printf("fd3: %d\n", fd3);
    close(fd1);
    close(fd2);
    close(fd3);
    return 0;
}

Running result:

fd1: 0
fd2: 2
fd3: 3

Conclusion: a newly allocated file descriptor is always the smallest number that is not currently in use.

Redirect

What happens if we close 1?

Demo code:

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int main()
{
    close(1);
    int fd = open("myfile", O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    printf("fd: %d\n", fd);
    printf("hello dear programmer");
    fflush(stdout); /* flush the user-level buffer before fd is closed */

    close(fd);
    return 0;
}

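Running result:

Nothing is printed on the monitor, but when we open the file myfile, we find that everything that would have been written to the monitor has been written to the file instead. Because 1 was closed first, open returned 1, the smallest unused descriptor, and printf always writes to descriptor 1. This is called redirection.

The picture shows the essence of redirection: entry 1 of the file descriptor array no longer points to the monitor's file object but to the file object of myfile, so everything written to descriptor 1 ends up in the file.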

dup2 system call

man dup2

Usage example:

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
int main()
{
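    /* O_CREAT requires a third argument: the permission bits of the new file. */
    int fd = open("./log", O_CREAT | O_RDWR, 0644);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    /* dup2 closes descriptor 1 itself if it is open, so no separate close(1) is needed. */
    if (dup2(fd, 1) < 0)
    {
        perror("dup2");
        return 1;
    }
    for (;;)
    {
        char buf[1024] = {0};
        ssize_t read_size = read(0, buf, sizeof(buf) - 1);
        if (read_size < 0)
        {
            perror("read");
            break;
        }
        if (read_size == 0) /* end of input: stop instead of looping forever */
            break;
        printf("%s", buf);
        fflush(stdout);
    }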
    return 0;
}

Running result: everything typed on the keyboard is written to ./log instead of appearing on the monitor.

FILE

The I/O library functions are built on the system call interface: library functions encapsulate system calls, and in the end every file is accessed through an fd. Therefore, the FILE structure in the C library must encapsulate an fd.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
int main()
{
    const char *msg0 = "hello printf\n";
    const char *msg1 = "hello fwrite\n";
    const char *msg2 = "hello write\n";
    printf("%s", msg0);                    /* library function */
    fwrite(msg1, strlen(msg1), 1, stdout); /* library function */
    write(1, msg2, strlen(msg2));          /* system call */
    fork();
    return 0;
}

Running result:

hello printf
hello fwrite
hello write

But what if the output of the process is redirected? After executing ./test > file, the result changes.

We find that printf and fwrite (library functions) are each output 2 times, while write (a system call) is output only once. Why? We guess it is related to fork. With the fork call commented out, each line is output only once, which confirms the guess. The explanation is as follows:

(1) Generally, C library functions are fully buffered when writing to a file and line buffered when writing to the display.

(2) The printf and fwrite library functions have their own buffer (the progress bar example illustrates this). When output is redirected to an ordinary file, the buffering mode changes from line buffering to full buffering.

(3) So the data we put in the buffer is not flushed immediately, and it is still in the buffer when fork is called.

(4) When a process exits, its buffer is flushed and the data is written to the file.

(5) At fork, the data of the parent and the child is copied on write, so the child ends up with its own copy of the same buffered data. When the parent and the child each flush at exit, the same data is written twice.

(6) The output of write does not change, which shows that write has no such buffering.

To sum up: the printf and fwrite library functions have their own buffer, while the write system call does not. The buffers discussed here are all user-level buffers. To improve the performance of the whole machine, the OS also provides kernel-level buffers, but that is beyond the scope of this discussion.

Who provides the user-level buffer? printf and fwrite are library functions, and write is a system call. The library functions sit in the layer above the system call and encapsulate it. Since write has no buffer while printf and fwrite do, the buffer must have been added by that upper layer, and because this is C, it is provided by the C standard library.

//Buffer related
/* The following pointers correspond to the C++ streambuf protocol. */
/* Note: Tk uses the _IO_read_ptr and _IO_read_end fields directly. */
char* _IO_read_ptr; /* Current read pointer */
char* _IO_read_end; /* End of get area. */
char* _IO_read_base; /* Start of putback + get area. */
char* _IO_write_base; /* Start of put area. */
char* _IO_write_ptr; /* Current put pointer. */
char* _IO_write_end; /* End of put area. */
char* _IO_buf_base; /* Start of reserve area. */
char* _IO_buf_end; /* End of reserve area. */
/* The following fields are used to support backing up and undo. */
char *_IO_save_base; /* Pointer to start of non-current get area. */
char *_IO_backup_base; /* Pointer to first valid character of backup area */
char *_IO_save_end; /* Pointer to end of non-current get area. */

Understanding file systems

When we use ls -l, we see the file metadata in addition to the file name.

Each line contains 7 columns:

Mode, number of hard links, file owner, group, size, last modification time, file name

ls -l reads the file information stored on disk and displays it.

In addition, the stat command can also view file information.

File system

The picture above is a diagram of the on-disk layout of the Linux ext2 file system (the in-kernel memory image is certainly different). A disk is a typical block device, and a hard disk partition is divided into blocks. The size of a block is determined at format time and cannot be changed afterwards. For example, the -b option of mke2fs can set the block size to 1024, 2048 or 4096 bytes. The size of the Boot Block in the picture above is fixed.

Block Group: The ext2 file system is divided into several Block Groups according to the size of the partition, and each Block Group has the same structure. This is similar to the way a government administers a country district by district.

Super Block: Stores the structural information of the file system itself. The recorded information mainly includes: the total number of blocks and inodes, the number of unused blocks and inodes, the size of a block and of an inode, the most recent mount time, the most recent write time, the most recent disk check time, and other file system related information. If the information in the Super Block is destroyed, the entire file system structure can be considered destroyed.

GDT, Group Descriptor Table: block group descriptor, describing block group attribute information

Block Bitmap: records which data blocks in the data area are occupied and which are free.

inode Bitmap: Each bit indicates whether an inode is free and available.

inode table: stores file attributes such as file size, owner, and last modification time.

Data area: stores file content

The idea of keeping properties and data separate seems simple, but how does it actually work? Let’s see how this works by touching a new file.

[hty@iZ2vcboxg2e41nj4s5s6zrZ test]$ cd day3
[hty@iZ2vcboxg2e41nj4s5s6zrZ day3]$ touch file
[hty@iZ2vcboxg2e41nj4s5s6zrZ day3]$ ls -i file
1581596 file

Creating a new file mainly involves the following four operations:

1. Store attributes

The kernel first finds a free inode (here 1581596) and records the file information in it.

2. Store data

Suppose the file needs to be stored in three disk blocks, and the kernel finds three free blocks: 300, 500, and 800. It copies the first block of data in the kernel buffer to 300, the next block to 500, and so on.

3. Record allocation

The file content is stored in blocks 300, 500, and 800, in that order. The kernel records this block list in the disk allocation area of the inode.

4. Add the file name to the directory

The new file is named file. How does Linux record this file in the current directory? The kernel adds the entry (1581596, file) to the directory file. The correspondence between the file name and the inode connects the file name to the file’s content and attributes.

The following explains the three timestamps of a file:

Access: the last time the file was accessed

Modify: the last time the file content was modified

Change: the last time the file attributes were modified