(2023-2024-1) 20232830 “Analysis and Design of Linux Kernel Principles” Assignment for the Eighth Week

1. Background knowledge for the experiment

1.1 Compilation and linking process

Compilation and linking is the process of converting source code into an executable program. It usually includes the following steps:

Preprocessing:
Preprocessing is the first step in the compilation and linking process. At this stage, the preprocessor will process the source code, including expanding macro definitions, inserting header file content, etc. The preprocessor modifies the source code according to the preprocessing instructions starting with the character ‘#’ (such as #include, #define, etc.) and generates a preprocessed code file.
Compilation:
Compilation converts the preprocessed code into assembly code. The compiler performs lexical analysis, syntax analysis, and semantic analysis to check the code for correctness, then generates the corresponding assembly code file.
Assembly:
Assembly is the process of converting assembly code into machine code. The assembler reads the assembly code file, converts it into machine-executable instructions, and generates an object file. Each assembly instruction usually corresponds to a machine instruction, including opcodes, registers, memory addresses, etc.
Linking:
Linking is the process of combining multiple object files and library files into an executable program. The linker resolves the symbol references and definitions in each object file, processes the symbol tables, and connects the references between object files. It also processes library files and associates the required library functions with the program. The result is a complete executable program file.

Summary: The process of compilation and linking includes preprocessing, compilation, assembly and linking. Preprocessing performs macro expansion and header file insertion on the source code; compilation converts the source code into assembly code; assembly converts assembly code into machine code; linking combines object files and library files into an executable program. The process of compiling and linking converts source code into an executable program so that the program can be run on the computer.
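The four stages can be observed one at a time with gcc. The file below is a minimal sketch, not part of the experiment: the file names stages.c and main.o, and the functions square_plus_one, report, and self_check, are made up for illustration. The comment at the top lists the gcc option that stops after each stage.

```c
/*
 * stages.c (hypothetical file name): one small file to watch each stage.
 *
 *   gcc -E stages.c -o stages.i     preprocessing: SQUARE is expanded
 *   gcc -S stages.i -o stages.s     compilation: C becomes assembly
 *   gcc -c stages.s -o stages.o     assembly: produces a relocatable object file
 *   gcc stages.o main.o -o stages   linking: main.o (hypothetical) must define main
 */
#include <assert.h>
#include <stdio.h>

#define SQUARE(x) ((x) * (x))   /* the macro no longer exists after preprocessing */

/* After gcc -E the return statement reads: return ((n) * (n)) + 1; */
int square_plus_one(int n)
{
    return SQUARE(n) + 1;
}

/* printf is an undefined symbol in stages.o until the linker
 * resolves it against the C library. */
void report(int n)
{
    printf("%d -> %d\n", n, square_plus_one(n));
}

/* A tiny sanity check of the macro expansion. */
void self_check(void)
{
    assert(square_plus_one(3) == 10);
}
```

Comparing stages.i with stages.c shows the effect of the preprocessor, and running nm on stages.o should list printf as undefined, which is the reference the linker resolves in the last step.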

1.2 ELF executable file format

The ELF executable file format consists of the following main components:

ELF Header:
The ELF file header is located at the beginning of the file and contains basic information describing the entire file, such as file type, target architecture, entry point address, offset of the program header table and section header table, etc. The file header provides basic information for reading and parsing ELF files.
Program Header Table:
The program header table describes how the ELF file is laid out in memory. Each entry describes one segment (the unit the loader works with): its type, file offset, virtual address, size in the file and in memory, and access permissions. The operating system uses these entries to load and execute the file.
Section Header Table:
The section header table contains information about each section, such as the code section, data sections, and symbol tables. Each entry describes a section’s name, type, flags, address, file offset, size, and links to related sections. The section header table matters mainly for linking and debugging; loading a program relies on the program header table instead.
Sections:
A section is a logical organizational unit in an ELF file and holds code or data. Common sections include the code section (.text), the data section (.data), the read-only data section (.rodata), and the symbol table (.symtab). Each section has its own attributes, such as executable, writable, or allocated in memory.
Symbol Table:
The symbol table stores information about the symbols defined and referenced in the program, such as functions and global variables. Each entry contains the symbol’s name, value (usually an address), size, type, and binding, and is used for linking and debugging.

In addition to the above main components, ELF files also contain other auxiliary information, such as dynamic link information, relocation tables, debugging information, etc., to support functions such as dynamic linking, program debugging, and code relocation.

The ELF executable file format is designed to have good scalability and portability, and can adapt to different operating systems and target architectures. It is widely used in various programming environments and development tools, providing basic support for program development, compilation, linking and execution.
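To make the ELF header fields concrete, the following is a minimal sketch that reads the header of a file and prints the fields mentioned above. The function names read_elf_header and print_elf_header are made up for this example. It assumes a Linux system with glibc-style elf.h and link.h headers, and a file whose word size and byte order match the host.

```c
#include <assert.h>
#include <elf.h>
#include <link.h>    /* ElfW(): selects Elf32_* or Elf64_* for the host word size */
#include <stdio.h>
#include <string.h>

/* Reads the ELF header of `path` into *hdr.
 * Returns 0 on success, -1 if the file cannot be read or is not a
 * native-class ELF file. */
int read_elf_header(const char *path, ElfW(Ehdr) *hdr)
{
    assert(path != NULL && hdr != NULL);

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t got = fread(hdr, sizeof *hdr, 1, f);
    fclose(f);

    if (got != 1 || memcmp(hdr->e_ident, ELFMAG, SELFMAG) != 0)
        return -1;
    /* Reject files whose class (32/64 bit) differs from the host's. */
    if (hdr->e_ident[EI_CLASS] !=
        (sizeof(ElfW(Addr)) == 8 ? ELFCLASS64 : ELFCLASS32))
        return -1;
    return 0;
}

/* Prints the fields that the loader and linker rely on. */
void print_elf_header(const ElfW(Ehdr) *hdr)
{
    printf("type:                 %u\n", (unsigned)hdr->e_type);
    printf("machine:              %u\n", (unsigned)hdr->e_machine);
    printf("entry point:          0x%llx\n", (unsigned long long)hdr->e_entry);
    printf("program header table: offset %llu, %u entries\n",
           (unsigned long long)hdr->e_phoff, (unsigned)hdr->e_phnum);
    printf("section header table: offset %llu, %u entries\n",
           (unsigned long long)hdr->e_shoff, (unsigned)hdr->e_shnum);
}
```

Calling read_elf_header("/proc/self/exe", &hdr) followed by print_elf_header(&hdr) inspects the running program itself. For a position-independent executable the printed entry point is an offset from the load address rather than an absolute address.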

2. Experiment 7: Loading and starting executable programs

2.1 Use an exec library function to load an executable file

Open a terminal and run the following commands to replace test.c with the version that adds the exec command:

cd LinuxKernel
rm -rf menu
# If cloning is not possible, you can use the sidebar to upload a compressed file to replace the menu folder.
git clone https://github.com/mengning/menu.git
cd menu
mv test_exec.c test.c
vi test.c

Open the test.c file and you can see that the exec command has been added to the main function;
In the menu folder, run the build, then look at the implementation of the exec command:
make rootfs

The Exec function that implements the command is as follows:

int Exec(int argc, char *argv[])
{
    int pid;
    /* fork another process */
    pid = fork();
    if (pid < 0)
    {
        /* error occurred */
        fprintf(stderr,"Fork Failed!");
        exit(-1);
    }
    else if (pid == 0)
    {
        /* child process */
        printf("This is Child Process!\n");
        execlp("/hello","hello",NULL);
    }
    else
    {
        /* parent process */
        printf("This is Parent Process!\n");
        /* parent will wait for the child to complete*/
        wait(NULL);
        printf("Child Complete!\n");
    }
}

As you can see, Exec relies on fork(), wait(), and execlp(). execlp() is a C library function that in turn invokes the execve system call.

Exec takes two parameters: argc, the number of command line arguments, and argv, an array of pointers to the argument strings. Neither is used in the body.
fork() creates a new process by copying the current one. In the original (parent) process, fork() returns the process ID (pid) of the child; in the child process, it returns 0.
Checking the return value of fork() therefore tells the code whether it is running in the parent or in the child.
If the return value is less than 0, the child could not be created, and the program prints an error message and exits.
If the return value is 0, the code is running in the child. The child prints “This is Child Process!” and then calls execlp() to run the executable file “/hello”. If execlp() succeeds it never returns, because the child’s program image has been replaced.
If the return value is greater than 0, the code is running in the parent. The parent prints “This is Parent Process!” and then calls wait() to block until the child terminates.
Once the child has finished, the parent resumes and prints “Child Complete!”.

Overall, this function creates a child process and executes a program named “/hello” in the child process. The parent process waits for the child process to finish executing before continuing.
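The same fork, exec, wait pattern can be tried outside the menu program. The sketch below is a standalone variant, not the code from test.c: the function name run_in_child is made up, and it runs /bin/sh (assumed to exist) instead of /hello. It also adds two things the original leaves out: handling the case where exec fails, and reading the child’s exit status.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs `/bin/sh -c command` in a child process.
 * Returns the child's exit status, or -1 if it could not be run
 * or did not exit normally. */
int run_in_child(const char *command)
{
    assert(command != NULL);

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: replace this process image with the shell. */
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        /* Only reached if exec failed; the old image is still running. */
        perror("execl");
        _exit(127);
    }

    /* Parent: wait for this specific child and decode its status. */
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_in_child("exit 7") returns 7. The code after execl runs only on failure, which is why the child calls _exit there instead of falling through into the parent’s branch.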

2.2 Tracking analysis through gdb

Go back to the parent directory and start the kernel with the command below. The -S option freezes the CPU at startup and -s opens a gdb server on TCP port 1234, so the kernel waits for the debugger before running any code:
qemu -kernel linux-3.18.6/arch/x86/boot/bzImage -initrd rootfs.img -s -S

Open a new terminal window and use the following commands to start gdb debugging;

gdb
file linux-3.18.6/vmlinux
target remote:1234

Set a breakpoint at the entry of the sys_execve system call:
b sys_execve
Continue running, then type exec in the QEMU window; execution stops at the breakpoint set above.
Set the following breakpoints as well to trace the code that loads and starts the new program from end to end. Single-stepping through the code is also an option:

b load_elf_binary
b start_thread


2.3 Answers to questions related to the experiment

1. Where does the new executable program start executing?
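The new executable program starts executing at its entry point, the address stored in the e_entry field of the ELF header. For a statically linked program this is normally the _start routine, which sets up the runtime and then calls main. For a dynamically linked program, execution first starts at the entry point of the dynamic linker, which later transfers control to the program’s own entry point.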

2. Why can the new executable program be executed smoothly after the execve system call returns?
The execve system call replaces the memory image of the current process with the contents of the new executable file: the old code, data, and stack are discarded and new ones are set up. Before returning, the kernel (in start_thread, called from load_elf_binary) sets the saved user-mode instruction pointer to the new program’s entry point and the saved stack pointer to the new stack, so the return to user mode lands in the new program rather than after the original call. execve also places the command line arguments and environment on the new stack so that the new program can read them.

3. What is the difference when the execve system call returns for a statically linked executable program and a dynamically linked executable program?
For a statically linked executable, all required library code was linked into the file at build time, so nothing more needs to be resolved at run time. load_elf_binary sets the return address to the program’s own entry point (e_entry), and execve returns directly into the program.
For a dynamically linked executable, the program header table contains a PT_INTERP entry naming the dynamic linker (the ELF interpreter). load_elf_binary maps the dynamic linker into the process as well and sets the return address to the dynamic linker’s entry point, not the program’s. When execve returns, the dynamic linker runs first in user mode: it loads the shared libraries the program depends on, performs the relocations that bind the program to them, and only then jumps to the program’s entry point. Calls into shared libraries are then resolved at run time through dynamic linking. The kernel itself does not load the shared libraries; it only loads the program and the dynamic linker.
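One way to tell the two cases apart from user space is to look for a PT_INTERP entry in the program header table, which names the dynamic linker. The following is a minimal sketch under stated assumptions: the function name find_interp is made up, the headers elf.h and link.h are the glibc-style ones, and the file is assumed to match the host’s word size and byte order.

```c
#include <assert.h>
#include <elf.h>
#include <link.h>    /* ElfW(): selects Elf32_* or Elf64_* for the host word size */
#include <stdio.h>
#include <string.h>

/* Looks for a PT_INTERP program header in `path`.
 * Returns 1 and copies the interpreter path into `out` if one exists,
 * 0 if the file is a native ELF file without PT_INTERP,
 * -1 on any read or format error. */
int find_interp(const char *path, char *out, size_t outlen)
{
    assert(path != NULL && out != NULL && outlen > 0);

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    int result = -1;
    ElfW(Ehdr) eh;
    if (fread(&eh, sizeof eh, 1, f) == 1 &&
        memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
        eh.e_ident[EI_CLASS] ==
            (sizeof(ElfW(Addr)) == 8 ? ELFCLASS64 : ELFCLASS32)) {
        result = 0;   /* valid ELF; no interpreter found so far */
        for (unsigned i = 0; i < eh.e_phnum; i++) {
            ElfW(Phdr) ph;
            long pos = (long)(eh.e_phoff + (unsigned long)i * eh.e_phentsize);
            if (fseek(f, pos, SEEK_SET) != 0 ||
                fread(&ph, sizeof ph, 1, f) != 1) {
                result = -1;
                break;
            }
            if (ph.p_type != PT_INTERP)
                continue;
            /* The segment's file contents are the interpreter path string. */
            size_t len = ph.p_filesz < outlen - 1 ? (size_t)ph.p_filesz
                                                  : outlen - 1;
            if (fseek(f, (long)ph.p_offset, SEEK_SET) != 0 ||
                fread(out, 1, len, f) != len) {
                result = -1;
                break;
            }
            out[len] = '\0';
            result = 1;
            break;
        }
    }
    fclose(f);
    return result;
}
```

A return value of 1 corresponds to the dynamically linked case, where the kernel starts the named interpreter first; 0 corresponds to the case where execve returns straight to the program’s own entry point.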

3. Experiment summary

In Linux, the exec family of functions locates the executable file by the given name and replaces the contents of the calling process with it, so the program the process was running is completely replaced by the new one. Where execution resumes is determined by the EIP value saved on the kernel stack when the process entered the execve system call. Replacing the image alone does not run anything: the kernel overwrites that saved EIP with the entry address defined by the new program, usually 0x8048xx, so that when execve returns to user mode the CPU begins executing the new program at that address.

The executable file can be a binary or a script. For a script that begins with a #! line, the kernel runs the interpreter named on that line. If the file has execute permission but is in no format the kernel recognizes, execve fails with ENOEXEC, and the execlp and execvp library functions then retry by passing the file to the shell. After a process creates a child with fork, the child usually calls one of the exec functions to run another program. After the call, the program executed by the process is completely replaced, and the new program starts from its entry point, which eventually calls its main function. Because exec does not create a new process, the process ID does not change: exec simply replaces the code, data, and stack of the current process with those of a new program.
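The claim that the process ID survives exec can be checked with a short experiment. The sketch below is illustrative only: the function name pid_across_exec is made up, and it assumes /bin/sh exists. The child redirects its standard output into a pipe and then execs a shell that prints its own PID ($$); the parent compares that number with the value fork returned.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stores the child's pid as seen by fork() in *forked and the pid the
 * exec'ed shell reports for itself in *reported.
 * Returns 0 on success, -1 on failure. */
int pid_across_exec(long *forked, long *reported)
{
    assert(forked != NULL && reported != NULL);

    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: send stdout into the pipe, then become the shell. */
        close(fds[0]);
        if (dup2(fds[1], STDOUT_FILENO) < 0)
            _exit(127);
        close(fds[1]);
        execl("/bin/sh", "sh", "-c", "echo $$", (char *)NULL);
        _exit(127);   /* only reached if exec failed */
    }

    /* Parent: read what the shell printed, then reap the child. */
    close(fds[1]);
    char buf[32];
    ssize_t n = read(fds[0], buf, sizeof buf - 1);
    close(fds[0]);
    int status;
    waitpid(pid, &status, 0);
    if (n <= 0)
        return -1;

    buf[n] = '\0';
    *forked = (long)pid;
    *reported = strtol(buf, NULL, 10);
    return 0;
}
```

If exec created a new process, the two numbers would differ. They match because the shell is running inside the very process that fork created.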
