2023-2024-1 20232825 “Linux Kernel Principles and Analysis” Eighth Week Assignment

Table of Contents

1. Compilation and linking process

1. Preprocessing:

2. Compilation:

3. Assembly:

4. Linking:

2. ELF executable file format

3. Use the exec* library function to load an executable file

4. Use gdb to trace and analyze an execve system call kernel processing function sys_execve

Conclusion

Appendix: ChatGPT-assisted Q&A

1. The process of compilation and linking

On Linux, turning C source code into an executable generally goes through the following four steps:

1. Preprocessing:

The preprocessing phase is done by the C/C++ preprocessor (cpp), which is normally invoked through the compiler driver (gcc or g++). At this stage, the main task is to process the source files: expanding macro definitions, inserting the contents of included header files, deleting comments, etc. The preprocessed code is written to an intermediate file, usually with .i (C) or .ii (C++) as the extension.

For example, you can use the following command to perform preprocessing:

gcc -E source.c -o source.i

2. Compilation:

The compilation phase translates the preprocessed code into assembly language. The output of this stage is usually in the form of assembly code, and the extension is usually .s.

For example, compilation can be performed using the following command:

gcc -S source.i -o source.s

3. Assembly:

The assembly phase translates the assembly code into machine instructions and writes them to a relocatable object file, usually with the .o extension. References to symbols defined in other files are left unresolved at this stage.

For example, you can use the following command to perform assembly:

as source.s -o source.o

4. Linking:

The linking phase combines object files and library files (such as the standard library) into the final executable file. At this stage, symbol references are resolved and bound to their final addresses. The linker also resolves references into library files to ensure that every symbol has a definition.

For example, you can use the following command to perform a link operation:

gcc source.o -o executable

The result is the final executable file.

2. ELF executable file format

ELF (Executable and Linkable Format) is a file format used to store executable programs, shared libraries, object files, and core dump files. It is a standard binary file format widely used in Linux systems and many other Unix-like systems. The ELF file format has a modular structure, which can contain multiple different types of parts, including the following key parts:

Header: The ELF header contains basic information about the file, such as the file type (executable file, shared library, object file, etc.), target architecture, entry point address, and the offsets, entry sizes, and entry counts of the program header table and the section header table.
Section Table: The ELF file contains a section header table that describes each section in the file. Each section can contain a different type of data, such as code, data, symbol tables, relocation information, etc.
Section Header: Each entry in the section header table is a section header, which describes the attributes and location of one section, including its name, type, file offset, and size.
Program Header: Executable files and shared libraries contain program headers that describe how to load the file into memory, with information such as segment type, file offset, memory address, and segment size. Relocatable object files usually do not contain program headers because they are not loaded into memory directly.
Data Sections: ELF files contain multiple sections, each storing a specific type of data. For example, the .text section contains executable code, the .data section contains initialized data, the symbol table section contains symbol information, and the relocation sections contain relocation information.
Symbol Table: Object files, executable files, and shared libraries can contain a symbol table with information about the global and local symbols used in the program. This is very important for linking, dynamic linking, and debugging.
Relocation Table: The relocation table lists the locations that must be patched once symbol addresses are known, either at link time or at load time. This is very important for resolving symbol references.

As shown in the picture:

3. Use the exec* library function to load an executable file

The sample program is as follows:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    // Argument vector for the new program; it must end with a NULL pointer
    char *args[] = {"/bin/ls", "-l", NULL};

    // Use the execve function to execute a new executable file
    // The first parameter is the path to the executable file
    // The second parameter is the argument list, whose first element is by convention the name of the executable file
    // The third parameter is the environment. Linux accepts NULL here, but the new program then starts with an
    // empty environment; pass environ instead to inherit the environment variables of the current process.
    if (execve("/bin/ls", args, NULL) == -1) {
        perror("execve");
        exit(EXIT_FAILURE);
    }

    // The code here will not be executed because after execve succeeds, the image of the current process has been replaced
    printf("This won't be printed\n");

    return 0;
}

The execution results are as follows:

4. Use gdb to trace and analyze an execve system call kernel processing function sys_execve

1. Delete the original menu and clone the new menu.

2. Add a new exec system call in test.c

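3. Start the kernel in QEMU with the CPU paused (-S) and a gdb server listening on port 1234 (-s), then attach gdb from a second terminal and set breakpoints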

cd LinuxKernel
qemu -kernel linux-3.18.6/arch/x86/boot/bzImage -initrd rootfs.img -S -s

gdb
file linux-3.18.6/vmlinux
target remote :1234
b sys_execve
b load_elf_binary
c

4. View the elf file header

Conclusion:

The overall call chain when the system call sys_execve executes an executable file is:
sys_execve()->do_execve()->do_execve_common()->exec_binprm()->search_binary_handler()->load_elf_binary()->start_thread()

The execve() system call is handled in kernel mode by the sys_execve() function. The processing is as follows:

The user program invokes the execve() system call, passing the executable file path, argument list and environment variables.
The kernel dispatches the call through the system call table to sys_execve(), the entry point of the system call, which receives these parameters.
sys_execve() copies the file name from user space and calls do_execve(), which in turn calls do_execve_common().
do_execve_common() allocates a linux_binprm structure that describes the program to be loaded, opens the file, and checks that the caller has permission to execute it. If a check fails, the system call returns an error and the current program keeps running.
If the checks pass, do_execve_common() reads the beginning of the file into a buffer and copies the argument and environment strings to the initial stack of the new program.
do_execve_common() then calls exec_binprm(), which calls search_binary_handler() to try the registered binary format handlers until one recognizes the file. For an ELF file this is load_elf_binary().
load_elf_binary() checks the ELF header and reads the program headers. It then releases the resources of the old program image, such as the old address space, file descriptors marked close-on-exec, and installed signal handlers. No new process is created: the same process descriptor is reused.
load_elf_binary() maps the loadable segments (code and data) into the new address space, sets up the memory layout and the user stack, and, for a dynamically linked program, also maps the dynamic linker.
Finally, load_elf_binary() calls start_thread() to set the user-mode instruction pointer and stack pointer. The instruction pointer is set to the entry point from the ELF header for a statically linked program, or to the entry point of the dynamic linker for a dynamically linked one. When the system call returns to user mode, the same process continues in the new program; main() is reached later through the C runtime startup code.