2023-2024-1 20232831 “Linux Kernel Principles and Analysis” Eighth Week Assignment

Article directory

  • 1. How the Linux kernel loads and starts an executable program
  • 2. Experimental process
    • 1. Understand the compilation and linking process and the ELF executable file format
    • 2. Programming uses the exec* library function to load an executable file, and programming practices two ways of using dynamic link libraries.
    • 3. Use gdb to trace and analyze an execve system call kernel processing function sys_execve
    • 4. Detailed analysis (questions 4 and 5)
  • 3. ChatGPT help
  • Summary

1. How the Linux kernel loads and starts an executable program

Experiment content:

1. Understand the compilation and linking process and the ELF executable file format;

2. Use the exec* library functions to load an executable file. Dynamic linking comes in two forms: linking when the executable is loaded, and linking at runtime. Write programs that use a dynamic link library in both ways;

3. Use gdb to trace and analyze sys_execve, the kernel function that handles the execve system call, to verify your understanding of what the Linux system does to load an executable program. For details, refer to the third section of this week's material; it is recommended to do this in the Linux virtual machine environment of the laboratory building experiment.

4. Pay special attention to where the new executable program starts to execute. Why can the new program run smoothly after the execve system call returns? How does the return of execve differ between a statically linked and a dynamically linked executable?

5. Analyze the system call processing process corresponding to the exec* function.

2. Experimental process

1. Understand the compilation and linking process and the ELF executable file format

Compilation and linking process:

The compilation and linking process in Linux usually includes preprocessing, compilation, assembly, and linking.

Preprocessing: Runs before compilation. It processes #include, #define and other preprocessor directives and produces an expanded source file.

Compilation: The compiler translates the expanded source file into assembly language.

Assembly: The assembler translates the assembly file into a relocatable object file, which contains machine code but cannot yet be run on its own.

Linking: The linker combines object files and libraries into a single executable file. It resolves symbol references between the object files and relocates code and data to their final addresses.

On Linux, the executable file produced by the linking step is usually in ELF (Executable and Linkable Format). The details are as follows:

ELF (Executable and Linkable Format) is a binary file format used for executable files, object files and shared libraries. It consists of a file header, a program header table, a section header table, and the section contents. On Linux, the ELF format records the structure of an executable file so that the operating system knows how to load and execute it. The main parts are:
① File header (ELF Header): Describes the file as a whole, such as the file type, target architecture and byte order. It also records the program entry point, the offsets of the program header table and the section header table, and the number of entries in each table.
② Section Header Table: Contains metadata about each section in the file, such as its name, type, size and offset.
③ Section content: Contains the actual code and data. The sections include the code section (.text), the data section (.data), the .bss section, and others. The code section stores the instructions to execute, the data section stores initialized global and static variables, and the .bss section holds uninitialized global and static variables.
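To make the header fields above concrete, here is a minimal sketch in C that reads the ELF header of a file and prints the entry point and the section header table information. It assumes a 64-bit ELF file and the glibc header <elf.h>; the function names read_elf64_header and print_elf64_summary are made up for this illustration and are not part of any library.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Read the ELF header of the 64-bit ELF file at `path` into `out`.
 * Returns 0 on success, -1 if the file cannot be read or is not 64-bit ELF. */
int read_elf64_header(const char *path, Elf64_Ehdr *out)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    size_t n = fread(out, 1, sizeof(*out), fp);
    fclose(fp);
    if (n != sizeof(*out))
        return -1;

    /* Every ELF file starts with the magic bytes 0x7f 'E' 'L' 'F'. */
    if (memcmp(out->e_ident, ELFMAG, SELFMAG) != 0)
        return -1;
    /* This sketch only handles the 64-bit layout. */
    if (out->e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    return 0;
}

/* Print the header fields discussed above. */
void print_elf64_summary(const Elf64_Ehdr *h)
{
    printf("type: %u, machine: %u\n",
           (unsigned)h->e_type, (unsigned)h->e_machine);
    printf("entry point: 0x%llx\n", (unsigned long long)h->e_entry);
    printf("section header table offset: %llu, entries: %u\n",
           (unsigned long long)h->e_shoff, (unsigned)h->e_shnum);
}
```

Calling read_elf64_header on an executable and passing the result to print_elf64_summary should show the same entry point address that readelf -h reports for that file.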

The following command displays the ELF header of a file:

readelf -h <executable-file-name>

Here, I wrote a simple C language program test.c, the content is as follows:

#include <stdio.h>

int main(void) {
    printf("my StudentId is 20232831\n");
    return 0;
}

After compiling the program, running readelf -h on the resulting executable gives the following ELF header:

2. Programming uses the exec* library function to load an executable file, and programming practices two ways of using dynamic link libraries

① Programming uses the exec* library function to load an executable file
The exec* function family covers a series of functions, including: execl, execle, execlp, execv, execve, execvp, and execvpe.
These functions allow you to load a new program and execute it, passing argument lists and environment variables. Each function has a specific purpose and parameter list.
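The letters in the function names indicate how each variant takes its arguments. The sketch below shows four of them side by side. It is only an illustration: the helper name run_exec_variant is hypothetical, and it assumes that /bin/sh exists and that sh can be found through PATH. Each variant runs sh -c "exit 3" in a child process, so a successful exec gives an exit status of 3.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child, let it call the exec* variant selected
 * by `variant`, and return the child's exit status (or -1 on error). */
int run_exec_variant(char variant)
{
    char *argv[] = { "sh", "-c", "exit 3", NULL };
    char *envp[] = { "PATH=/bin:/usr/bin", NULL };

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        switch (variant) {
        case 'l': /* l: arguments passed as a list, ended by a NULL pointer */
            execl("/bin/sh", "sh", "-c", "exit 3", (char *)NULL);
            break;
        case 'p': /* p: the file name is searched for in PATH */
            execvp("sh", argv);
            break;
        case 'e': /* e: the caller supplies the environment explicitly */
            execve("/bin/sh", argv, envp);
            break;
        default:  /* v: arguments passed as a NULL-terminated vector */
            execv("/bin/sh", argv);
            break;
        }
        _exit(127); /* reached only if the exec call failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The fork is there only so that the calling program survives; the exec call itself replaces whichever process makes it.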

Two C programs are written here to exercise the exec* family (the execve function is used).
execve loads and executes a new program. It takes the path of the program, an argument list for it, and the environment variables it should see.
In the code below, myecho.c prints the list of command-line arguments passed to it. myexecve.c calls execve to load myecho: the path of myecho is given as the first argument of myexecve and becomes argv[0] of the new program, followed by the remaining strings in list1[].

// myecho.c

// Print the command-line arguments
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    for (int i = 0; i < argc; i++) {
        printf("argv[%d]:%s\n", i, argv[i]);
    }
    exit(EXIT_SUCCESS);
}

// myexecve.c

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    // The argument vector must end with NULL; otherwise execve may fail with a "Bad address" error
    char *list1[] = {NULL, "Hello", "Linux", "World", NULL};
    // Empty environment for the new program
    char *list2[] = {NULL};
    // Exactly one argument is expected: the path of the program to execute
    if (argc != 2) {
        fprintf(stderr, "Usage: %s <program-to-execute>\n", argv[0]);
        exit(EXIT_FAILURE);
    }
    // Use the path of the program (myecho) as argv[0] of the new program
    list1[0] = argv[1];
    // On success execve does not return
    execve(argv[1], list1, list2);
    // Reached only if execve failed
    perror("execve");
    exit(EXIT_FAILURE);
}

The running results are as follows:

② Dynamic linking comes in two forms: linking when the executable is loaded, and linking at runtime. The programs below use a dynamic link library in both ways.
First, create a shared library for both methods to call. Then write two more C programs, one for load-time dynamic linking and one for runtime dynamic linking.
shared_lib.c is the source of the shared library that the other two programs call.
main_exec_link_time.c uses dynamic linking when the executable is loaded.
main_run_time.c uses runtime dynamic linking.

The codes for the three files are as follows:

// shared_lib.c - shared library code

#include <stdio.h>

void my_function(void) {
    printf("Dynamic shared link library was successfully called!\n");
}

// main_exec_link_time.c - Use dynamic linking when the executable program is loaded

#include <stdio.h>

extern void my_function(void); // Declared here, defined in the shared library

int main(void) {
    printf("This method is - use dynamic linking when loading an executable program!\n");
    my_function();
    return 0;
}

// main_run_time.c - Use runtime dynamic linking

#include <stdio.h>
#include <dlfcn.h>

int main(void) {
    void *handle;
    void (*my_function)(void);

    // A name without a slash is searched along the library path (e.g. LD_LIBRARY_PATH)
    handle = dlopen("shared_lib.so", RTLD_LAZY);

    if (!handle) {
        fprintf(stderr, "%s\n", dlerror());
        return 1;
    }

    printf("This method is - use runtime dynamic linking!\n");

    // Get the function pointer from the shared library
    my_function = (void (*)(void))dlsym(handle, "my_function");
    if (!my_function) {
        fprintf(stderr, "%s\n", dlerror());
        dlclose(handle);
        return 1;
    }
    my_function(); // Execute the function in the shared library
    dlclose(handle); // Close the shared library

    return 0;
}

First, compile shared_lib.c into the shared library shared_lib.so with the following command:

gcc -shared -o shared_lib.so -fPIC shared_lib.c

Second, use the following commands to build and run the program that is dynamically linked at load time.
Note that export LD_LIBRARY_PATH=$PWD sets an environment variable. If the program compiles but fails at run time because the library cannot be found, this line is required, for the following reason:

export LD_LIBRARY_PATH=$PWD
This command sets the environment variable LD_LIBRARY_PATH to the current working directory. LD_LIBRARY_PATH tells the dynamic linker which extra directories to search for shared libraries (.so files) when a program is loaded, so setting it to $PWD makes the dynamic linker find shared_lib.so in the current directory.

Also note that -lshared_lib would make the linker look for a file named libshared_lib.so. Because the library here is named shared_lib.so, the exact file name is given with -l:shared_lib.so.

export LD_LIBRARY_PATH=$PWD
gcc main_exec_link_time.c -o exec_link_time -L. -l:shared_lib.so
./exec_link_time

Finally, use the following commands to build and run the runtime dynamic linking program (LD_LIBRARY_PATH must still be set so that dlopen can find shared_lib.so):

gcc main_run_time.c -o run_time -ldl
./run_time

Screenshots of the entire running process are as follows:

3. Use gdb to trace and analyze an execve system call kernel processing function sys_execve

①Basic construction part
As in last week's experiment, first run the following commands so that test_exec.c replaces test.c:

cd LinuxKernel
rm menu -rf
git clone https://github.com/mengning/menu.git
cd menu
mv test_exec.c test.c
make rootfs
MenuOS>>help
MenuOS>>exec


The following is the specific code of the exec function:

Start the kernel in a frozen state, then open another shell for gdb debugging. The procedure is the same as in the previous experiments.

cd LinuxKernel
qemu -kernel linux-3.18.6/arch/x86/boot/bzImage -initrd rootfs.img -s -S   # -S freezes the CPU at startup; -s opens a gdb server on TCP port 1234

Open a new empty shell, enter gdb debugging, and establish a connection:

cd LinuxKernel
gdb
(gdb) file linux-3.18.6/vmlinux
(gdb) target remote :1234


Once the frozen kernel is resumed (for example with the c command in gdb), the exec command can be used in MenuOS. This completes the basic setup.

②gdb debugging part
Set breakpoints on the three kernel functions sys_execve, load_elf_binary, and start_thread, then trace them with gdb. The analysis process is as follows:


gdb trace analysis ends.

4. Detailed analysis (questions 4 and 5)

1. Where does the new executable program start to execute? Why can the new program run smoothly after the execve system call returns? How does the return of execve differ between a statically linked and a dynamically linked executable?
① A new executable program starts executing from its entry point. The address of the entry point is stored in the ELF header of the executable file, in the e_entry field (readelf -h shows it as "Entry point address").

When execve loads a new executable program, the kernel maps the program into the virtual address space of the process and sets the saved user-mode instruction pointer to the entry point address (in the kernel, start_thread does this). When the system call returns to user mode, the CPU starts executing instructions from this address, which begins the execution of the new program.
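The entry address chosen by the kernel can also be observed from inside a running process. The following sketch assumes Linux with glibc 2.16 or later, where getauxval from <sys/auxv.h> is available; the function names are made up for this illustration. AT_ENTRY is the entry point of the program itself as recorded by the kernel during execve. For a position-independent executable this is the run-time address, so it can differ from the e_entry value in the file by the load offset. AT_BASE is the base address of the program interpreter (the dynamic linker), and is 0 when there is none.

```c
#include <stdio.h>
#include <sys/auxv.h>

/* Entry point of the main program, as recorded by the kernel in the
 * auxiliary vector when execve() loaded this process. */
unsigned long program_entry(void)
{
    return getauxval(AT_ENTRY);
}

/* Base address of the program interpreter (dynamic linker);
 * 0 when the program has no interpreter, e.g. when statically linked. */
unsigned long interpreter_base(void)
{
    return getauxval(AT_BASE);
}

/* Print both values for the current process. */
void print_exec_info(void)
{
    printf("AT_ENTRY = 0x%lx\n", program_entry());
    printf("AT_BASE  = 0x%lx\n", interpreter_base());
}
```

Building the same program once normally and once with gcc -static and comparing the AT_BASE lines is one way to see whether an interpreter was mapped.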

②Why can the new executable program be executed smoothly after the execve system call returns?
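The following answer is based on ChatGPT:
execve does not create a new process. It replaces the program image of the calling process: the old code, data and stack are discarded, and a fresh address space and user stack are built for the new program. Before returning, the kernel also rewrites the saved user-mode registers so that the instruction pointer is the new entry point and the stack pointer is the top of the new stack. A successful execve therefore never returns to the code that called it; when the kernel returns to user mode, execution continues in the new program, in a complete and consistent execution environment. This is why the new program runs smoothly after the execve system call returns.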

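③ What is the difference between a statically linked and a dynamically linked executable when the execve system call returns:
For a statically linked program, all library code is already part of the executable, so when execve returns, execution starts directly at the entry point of the program itself. A dynamically linked program names a program interpreter (the dynamic linker, for example ld-linux) in its ELF file. The kernel maps this interpreter as well, and when execve returns, execution starts at the entry point of the dynamic linker, which loads the required shared libraries and then jumps to the entry point of the program. As a result, statically linked executables are larger because they carry a copy of all the library code they use, while dynamic linking saves space and is more flexible but requires the shared libraries to be present on the system where the program is deployed.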
Here are their pros and cons:

2. Analyze the system call processing process corresponding to the exec* function.

The exec* functions replace the image of the current program by executing a new program file in the context of the calling process.
The exec* library functions all end up in the execve system call. Inside the kernel, the main steps are:
sys_execve: the entry point of the execve system call in the kernel.
do_execve function: the main execution path in the kernel; it processes the program path, arguments and environment variables passed from user space.
load_elf_binary function (for ELF format programs): loads the executable file and sets up the new address space and resources of the process.
start_thread: sets the user-mode instruction pointer and stack pointer so that the new program starts executing when the kernel returns to user space.
The whole process consists of loading the executable file, establishing the new address space and resources, and starting the new program. This ensures that the ELF file of the new program is loaded and executed correctly.

The general process of the execve system call used in this article is as follows:

1. User mode to kernel mode: First, the process switches from user mode to kernel mode, triggering a system call.
2. Parameter analysis: The kernel obtains the parameters passed in execve, including the path and parameter list of the new program.
3. Open the file: The kernel tries to open the executable file according to the given path.
4. Permission check: The kernel will perform a series of permission and security checks to ensure that the calling process has sufficient permissions to execute the file.
5. Replace the current process: If the permission check passes, the kernel replaces the image of the current process with the image of the new program. This means that the code, data, and stack of the current process will be replaced with the contents of the new program.
6. Load new program: The kernel loads the new executable file into memory and sets up a new user space.
7. Start execution: The kernel transfers control to the entry point of the new program and begins executing the new program.
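Seen from user space, the steps above have one visible consequence: a successful execve never returns to the caller, while a failure in an early step (for example, the file cannot be opened) returns -1 with errno set. The sketch below illustrates this. The helper name spawn_with_execve and the exit codes 126 and 127 are choices made for this example, not something defined by the kernel.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that calls execve() on `path` with `argv` and an empty
 * environment. Returns the exit status of the new program, 127 if execve()
 * failed because the file does not exist, 126 for any other execve()
 * failure, or -1 if fork/wait failed or the child was killed by a signal. */
int spawn_with_execve(const char *path, char *const argv[])
{
    char *const envp[] = { NULL };

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        execve(path, argv, envp);
        /* Only reached when execve() fails: on success the process image
         * has been replaced and this code no longer exists in the child. */
        _exit(errno == ENOENT ? 127 : 126);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, running /bin/sh with the arguments sh -c "exit 7" should return 7 (the status of the new program), while a path that does not exist should return 127 from the error branch.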

3. ChatGPT help




Summary

The topic of this experiment is "How does the Linux kernel load and start an executable program", which covers compilation and linking, dynamic linking, the exec* library functions, and the execve system call. Through practice, debugging and analysis, I gained a deeper understanding of how executable programs are loaded and how dynamic linking works at load time and at runtime. Tracing sys_execve with gdb also deepened my understanding of how the Linux kernel loads programs.