Linux kernel crash analysis

Reprinted from: http://linux.cn/article-3475-1.html

We often run into kernel crashes at work. This article walks through the information the kernel prints when it crashes. The kernel version used is Linux 2.6.32.

A process may live anywhere from a few milliseconds to several months, and during that time it usually interacts with the kernel. For example, when a user-space program makes a system call and enters kernel space, it stops using its user-space stack and switches to its kernel stack. For each process, the Linux kernel packs two data structures into a single memory area allocated for that process: one is the kernel-mode process stack, and the other is thread_info, a small structure closely tied to the process descriptor and called the thread descriptor. The kernel stack is generally 8 KB (8192 bytes), which is two 4 KB pages. The Linux 2.6.32 kernel defines the kernel stack size in thread_info.h:

#define THREAD_SIZE 8192

The Linux kernel uses the following union to represent the thread descriptor and kernel stack of a process. It is defined in include/linux/sched.h:

    union thread_union {
        struct thread_info thread_info;
        unsigned long stack[THREAD_SIZE/sizeof(long)];
    };

This structure is a union. C language books explain unions, and The C Programming Language describes them as follows:

1) A union is a structure;

2) The offsets of all its members relative to the base address are 0;

3) The structure must be large enough to hold the "widest" member;

4) Its alignment must be suitable for all members.

As the description above shows, the size of thread_union is 8192 bytes, which is the size of the stack array of type unsigned long. Because all members of a union occupy the same memory, a common rule of thumb when writing code is that only one member of a union instance can be used at a time, since writing one member overwrites the others. That rule holds strictly only when the members occupy the same number of bytes. When they differ in size, only the overlapping bytes are overwritten. With thread_union we can therefore use both members at the same time, as long as we obtain the addresses of the two members correctly.

When a process uses too much kernel stack, the stack overflows into the thread_info area, which causes serious problems such as a system restart. Typical causes are recursion that goes too deep or local data structures that are too large.

Figure: The relationship between thread_info, task_struct, and the kernel stack of a process

Let’s take a look at the structure of thread_info:

    struct thread_info {
        unsigned long flags; /* low-level flags */
        int preempt_count; /* 0 => preemptible, <0 => bug */
        mm_segment_t addr_limit; /* address limit of the process address space */
        struct task_struct *task; /* task_struct pointer of the current process */
        struct exec_domain *exec_domain; /* execution domain */
        __u32 cpu; /* current cpu */
        __u32 cpu_domain; /* cpu domain */
        struct cpu_context_save cpu_context; /* cpu context */
        __u32 syscall; /* syscall number */
        __u8 used_cp[16]; /* thread used copro */
        unsigned long tp_value;

        struct crunch_state crunchstate;

        union fp_state fpstate __attribute__((aligned(8)));
        union vfp_state vfpstate;
    #ifdef CONFIG_ARM_THUMBEE
        unsigned long thumbee_state; /* ThumbEE Handler Base register */
    #endif
        struct restart_block restart_block; /* used to restart system calls interrupted by signals */
    };

PS: (1) flags holds various process-specific flags. The two most important are TIF_SIGPENDING, which is set when the process has pending signals, and TIF_NEED_RESCHED, which indicates that the scheduler should pick another process to run in place of this one.

With that background, we can look at what the kernel prints when it dumps the stack. The output below comes from a crash encountered at work. The PC is in dev_get_by_flags, and the kernel virtual address that could not be accessed is 45685516. Valid kernel addresses here generally have the form 0xCXXXXXXX.

 Unable to handle kernel paging request at virtual address 45685516
    pgd = c65a4000
    [45685516] *pgd=00000000
    Internal error: Oops: 1 [#1]
    last sysfs file: /sys/devices/form/tpm/cfg_l3/l3_rule_add
    Modules linked in: splic mmp(P)
    CPU: 0 Tainted: P (2.6.32.11 #42)
    PC is at dev_get_by_flags + 0xfc/0x140
    LR is at dev_get_by_flags + 0xe8/0x140
    pc : [<c06bee24>] lr : [<c06bee10>] psr: 20000013
    sp: c07e9c28 ip: 00000000 fp: c07e9c64
    r10: c6bcc560 r9: c646a220 r8: c66a0000
    r7: c6a00000 r6: c0204e56 r5: 30687461 r4: 45685516
    r3 : 00000000 r2 : 00000010 r1 : c0204e56 r0 : ffffffff
    Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
    Control: 0005397f Table: 065a4000 DAC: 00000017
    Process swapper (pid: 0, stack limit = 0xc07e8270)
    Stack: (0xc07e9c28 to 0xc07ea000)
    9c20: c0204e56 c6a00000 45685516 c69ffff0 c69ffff0 c69ffff0
    9c40: c6a00000 30687461 c66a0000 c6a00000 00000007 c64b210c c07e9d24 c07e9c68
    9c60: c071f764 c06bed38 c66a0000 c66a0000 c6a00000 c6a00000 c66a0000 c6a00000
    9c80: c07e9cfc c07e9c90 c03350d4 c0334b2c 00000034 00000006 00000100 c64b2104
    9ca0: 0000c4fb c0243ece c66a0000 c0beed04 c033436c c646a220 c07e9cf4 00000000
    9cc0: c66a0000 00000003 c0bee8e8 c0beed04 c07e9d24 c07e9ce0 c06e4f5c 00004c68
    9ce0: 00000000 faa9fea9 faa9fea9 00000000 00000000 c6bcc560 c0335138 c646a220
    9d00: c66a0000 c64b2104 c085ffbc c66a0000 c0bee8e8 00000000 c07e9d54 c07e9d28
    9d20: c071f9a0 c071ebc0 00000000 c071ebb0 80000000 00000007 c67fb460 c646a220
    9d40: c0bee8c8 00000608 c07e9d94 c07e9d58 c002a100 c071f84c c0029bb8 80000000
    9d60: c07e9d84 c0beee0c c0335138 c66a0000 c646a220 00000000 c4959800 c4959800
    9d80: c67fb460 00000000 c07e9dc4 c07e9d98 c078f0f4 c0029bc8 00000000 c0029bb8
    9da0: 80000000 c07e9dbc c6b8d340 c66a0520 00000000 c646a220 c07e9dec c07e9dc8
    9dc0: c078f450 c078effc 00000000 c67fb460 c6b8d340 00000000 c67fb460 c64b20f2
    9de0: c07e9e24 c07e9df0 c078fb60 c078f130 00000000 c078f120 80000000 c0029a94
    9e00: 00000806 c6b8d340 c0bee818 00000001 00000000 c4959800 c07e9e64 c07e9e28
    9e20: c002a030 c078f804 c64b2070 00000000 c64b2078 ffc45000 c64b20c2 c085c2dc
    9e40: 00000000 c085c2c0 00000000 c0817398 00086c2e c085c2c4 c07e9e9c c07e9e68
    9e60: c06c2684 c0029bc8 00000001 00000040 00000000 c085c2dc c085c2c0 00000001
    9e80: 0000012c 00000040 c085c2d0 c0bee818 c07e9ed4 c07e9ea0 c00284e0 c06c2608
    9ea0: bf00da5c 00086c30 00000000 00000001 c097e7d4 c07e8000 00000100 c08162d8
    9ec0: 00000002 c097e7a0 c07e9f14 c07e9ed8 c00283d0 c0028478 56251311 00023c88
    9ee0: c07e9f0c 00000003 c08187ac 00000018 00000000 01000000 c07ebc70 00023cbc
    9f00: 56251311 00023c88 c07e9f24 c07e9f18 c03391e8 c0028348 c07e9f3c c07e9f28
    9f20: c0028070 c03391b0 ffffffff 0000001f c07e9f94 c07e9f40 c002d4d0 c0028010
    9f40: 00000000 00000001 c07e9f88 60000013 c07e8000 c07ebc78 c0868784 c07ebc70
    9f60: 00023cbc 56251311 00023c88 c07e9f94 c07e9f98 c07e9f88 c025c3e4 c025c3f4
    9f80: 60000013 ffffffff c07e9fb4 c07e9f98 c025c578 c025c3cc 00000000 c0981204
    9fa0: c0025ca0 c0d01140 c07e9fc4 c07e9fb8 c0032094 c025c528 c07e9ff4 c07e9fc8
    9fc0: c0008918 c0032048 c0008388 00000000 00000000 c0025ca0 00000000 00053975
    9fe0: c0868834 c00260a4 00000000 c07e9ff8 00008034 c0008708 00000000 00000000
    Backtrace:
    [<c06bed28>] (dev_get_by_flags + 0x0/0x140) from [<c071f764>] (arp_process + 0xbb4/0xc74)
    r7:c64b210c r6:00000007 r5:c6a00000 r4:c66a0000

(1) First, find where in the kernel this output is printed. The line "Unable to handle kernel paging request at virtual address 45685516" is printed by the __do_kernel_fault function in fault.c (arch/arm/mm/fault.c). The address is one that cannot be accessed in kernel space.

    static void __do_kernel_fault(struct mm_struct *mm, unsigned long addr, unsigned int fsr, struct pt_regs *regs)
    {
        /*
         * Are we prepared to handle this kernel fault?
         */
        if (fixup_exception(regs))
            return;
        /*
         * No handler, we'll have to terminate things with extreme prejudice.
         */
        bust_spinlocks(1);
        printk(KERN_ALERT
            "Unable to handle kernel %s at virtual address %08lx\n",
            (addr < PAGE_SIZE) ? "NULL pointer dereference" : "paging request", addr);
        show_pte(mm, addr);
        die("Oops", regs, fsr);
        bust_spinlocks(0);
        do_exit(SIGKILL);
    }

(2) The next two lines are printed by the show_pte function. They involve the page global directory and page tables, which are not analyzed here for now and will be covered later.

    pgd = c65a4000
    [45685516] *pgd=00000000

    void show_pte(struct mm_struct *mm, unsigned long addr)
    {
        pgd_t *pgd;

        if (!mm)
            mm = &init_mm;
        printk(KERN_ALERT "pgd = %p\n", mm->pgd);
        pgd = pgd_offset(mm, addr);
        printk(KERN_ALERT "[%08lx] *pgd=%08lx", addr, pgd_val(*pgd));
        ……………………
    }

(3) Next, the die function is called. Inside die, the address of the thread_info structure is obtained:

    struct thread_info *thread = current_thread_info();

    static inline struct thread_info *current_thread_info(void)
    {
        register unsigned long sp asm ("sp");
        return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
    }

With sp = 0xc07e9c28, current_thread_info gives the address of thread_info:

(0xc07e9c28 & 0xffffe000) = 0xc07e8000 (the address of thread_info, which is also the lowest address of the stack area)

(4) The following output is printed by the __die function:

 Internal error: Oops: 1 [#1]
    last sysfs file: /sys/devices/form/tpm/cfg_l2/l2_rule_add
    Modules linked in: splic mmp(P)
    CPU: 0 Tainted: P (2.6.32.11 #42)
    PC is at dev_get_by_flags + 0xfc/0x140
    LR is at dev_get_by_flags + 0xe8/0x140
    pc : [<c06bee24>] lr : [<c06bee10>] psr: 20000013
    sp: c07e9c28 ip: 00000000 fp: c07e9c64
    r10: c6bcc560 r9: c646a220 r8: c66a0000
    r7: c6a00000 r6: c0204e56 r5: 30687461 r4: 30687461
    r3 : 00000000 r2 : 00000010 r1 : c0204e56 r0 : ffffffff
    Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
    Control: 0005397f Table: 065a4000 DAC: 00000017
    Process swapper (pid: 0, stack limit = 0xc07e8270)
    Stack: (0xc07e9c28 to 0xc07ea000)

Function call chain: die("Oops", regs, fsr) -> __die(str, err, thread, regs)

The following is the definition of the __die function:

    static void __die(const char *str, int err, struct thread_info *thread, struct pt_regs *regs)
    {
        struct task_struct *tsk = thread->task;
        static int die_counter;

        /* Internal error: Oops: 1 [#1] */
        printk(KERN_EMERG "Internal error: %s: %x [#%d]" S_PREEMPT S_SMP "\n",
            str, err, ++die_counter);
        /* last sysfs file: /sys/devices/form/tpm/cfg_l2/l2_rule_add */
        sysfs_printk_last_file();
        /* Modules loaded in the kernel: "Modules linked in: splic mmp(P)" */
        print_modules();
        /* Print register information */
        __show_regs(regs);
        /*
         * Process swapper (pid: 0, stack limit = 0xc07e8270)
         * tsk->comm is the comm field of task_struct: the name of the
         * executable with the path removed. swapper here is the idle
         * process with PID 0, the process that creates the kernel init
         * process. stack limit = 0xc07e8270 is thread + 1, the end
         * address of thread_info.
         */
        printk(KERN_EMERG "Process %.*s (pid: %d, stack limit = 0x%p)\n",
            TASK_COMM_LEN, tsk->comm, task_pid_nr(tsk), thread + 1);
        /* dump_mem prints the stack contents from the current sp up to the top of the stack area */
        if (!user_mode(regs) || in_interrupt()) {
            dump_mem(KERN_EMERG, "Stack: ", regs->ARM_sp, THREAD_SIZE + (unsigned long)task_stack_page(tsk));
            dump_backtrace(regs, tsk);
            dump_instr(KERN_EMERG, regs);
        }
    }

The function above relies mainly on how thread_info, task_struct, and sp point to one another. The stack member of task_struct holds the lowest address of the stack area, which is also the address of the corresponding thread_info structure. Stack data is stored downward starting from that address + 8 KB, and sp points to the current top of the stack. The expression (unsigned long)task_stack_page(tsk) in the dump_mem call uses the first of the two macros below.

#define task_stack_page(task) ((task)->stack): this macro returns the lowest address of the stack area for a task_struct, which is the thread_info address.

#define task_thread_info(task) ((struct thread_info *)(task)->stack): this macro returns the thread_info pointer for a task_struct.

(5) dump_backtrace function

This function prints the chain of function calls. fp is the frame pointer, which is used to trace back through the calling functions. The function mainly checks whether fp can be used for a backtrace. If it can, it calls c_backtrace, which is written in assembly in the file arch/arm/lib/backtrace.S.

    static void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
    {
        unsigned int fp, mode;
        int ok = 1;

        printk("Backtrace: ");
        if (!tsk)
            tsk = current;
        if (regs) {
            fp = regs->ARM_fp;
            mode = processor_mode(regs);
        } else if (tsk != current) {
            fp = thread_saved_fp(tsk);
            mode = 0x10;
        } else {
            asm("mov %0, fp" : "=r" (fp) : : "cc");
            mode = 0x10;
        }
        if (!fp) {
            printk("no frame pointer");
            ok = 0;
        } else if (verify_stack(fp)) {
            printk("invalid frame pointer 0x%08x", fp);
            ok = 0;
        } else if (fp < (unsigned long)end_of_stack(tsk))
            printk("frame pointer underflow");
        printk("\n");
        if (ok)
            c_backtrace(fp, mode);
    }

(6) dump_instr

Based on the PC and the instruction mode, this function prints the instruction words currently being executed:

Code: 0a000008 e5944000 e2545000 0a000005 (e4153010)

Calling relationship of functions in the kernel
