Using kdump and crash to solve a spinlock deadlock problem

Table of Contents

1. Experiment purpose: use crash to solve the spinlock deadlock problem

2. Experimental steps

3. Brief description of soft lockup mechanism

4. Spinlock deadlock problem analysis

4.1. Find the deadlocked threads from the threads currently running on each CPU

4.2. Use the soft lockup mechanism to identify the deadlocked threads

4.3. Use ps | grep RU to find runnable threads, then use the bt command to view their stack backtraces to identify the deadlocked threads


Environment: arm64, Linux version 5.10.66

1. Experiment purpose: use crash to solve the spinlock deadlock problem

The experimental program is as follows. The program is compiled into a .ko module and loaded into the device with insmod; entering the echo kdump-4 > /proc/dbug/kdump command on the serial console then runs the test program. The test program is a typical AB-BA deadlock: both threads spin-wait forever, and after 20 seconds the soft lockup mechanism detects that no scheduling has occurred on those CPUs for 20 seconds (the symptom of a spinlock deadlock) and calls panic to trigger kdump. After kdump generates the vmcore file, we use the crash utility to analyze the spinlock deadlock.

2. Experimental steps

Prerequisite: The device Linux supports the kdump function. For support methods, please refer to this article.

a. Compile the test module: compile the source code above into the kdump.ko driver module, and load it into the device with the insmod command on the serial console. The /proc/dbug/kdump debug file is then created.

b. Trigger panic: execute the echo kdump-4 > /proc/dbug/kdump command to run the test program. The test program deadlocks, the soft lockup watchdog calls panic, which triggers the kdump mechanism, starts the capture kernel, and generates the /proc/vmcore file in the capture kernel.

c. Save the vmcore file: in the capture kernel, execute cd /proc; tar -czf /tmp/3588/vmcore.tar.gz ./vmcore to compress the vmcore file, then copy it to a USB disk or an NFS-mounted directory.

d. Use crash to analyze the vmcore file: Execute the crash vmlinux vmcore command to use crash to analyze the vmcore file.

e. Since the vmcore file only retains the code (text) section of kdump.ko, you need to use the mod command to load the debug and symbol information of the kdump.ko driver module into crash. After that, the crash> dis -l kdump_proc_write command will correctly display the source line numbers corresponding to the assembly code.

mod -s kdump /kdump/demo/stack/kdump.ko

3. Brief description of soft lockup mechanism

The soft lockup mechanism detects whether scheduling is occurring on each CPU. If no scheduling occurs on a CPU for 20 seconds, this is treated as an abnormality.
a. Core idea of soft lockup

A kernel thread (watchdog/N) is started for each CPU. It is a real-time thread with the highest priority. Whenever the thread is scheduled, it updates a per-CPU timestamp and arms a timer. When the timer expires, the timestamp is checked: if it has not been updated within the threshold (20 s by default), no scheduling can have occurred on that CPU during that period (because this thread has the highest priority), so a warning is printed or, depending on configuration, the kernel enters the panic path.

b. Possible reasons for triggering the soft lockup mechanism

(1) On a kernel without preemption (CONFIG_PREEMPT not set), a kernel thread enters an infinite loop.

(2) The thread waits for the spin lock and the waiting time exceeds 20 seconds.

(3) There are too many internal loops in the thread, and the program execution time exceeds 20 seconds.

c. proc interface of the soft lockup mechanism

/proc/sys/kernel/soft_watchdog                /* soft lockup detection on/off switch */
/proc/sys/kernel/softlockup_all_cpu_backtrace /* dump all CPUs' backtraces on lockup */
/proc/sys/kernel/softlockup_panic             /* panic when a soft lockup is detected */
/proc/sys/kernel/sysctl_writes_strict

4. Analysis of spinlock deadlock problem

To solve a spinlock deadlock, you only need to find the threads that hold one lock while waiting for another. Because of how spinlocks work, a thread waiting for a lock spins in an infinite loop until the lock is released, so a thread that holds one lock and waits for another must be runnable and occupying a CPU. Note that the runnable state (RU, i.e. TASK_RUNNING) does not mean the thread is necessarily executing; it may also just be sitting in a run queue. Below are three ways to find the deadlocked threads.

4.1. Find the deadlocked threads from the threads currently running on each CPU

A contended spinlock is acquired by spinning in an infinite loop, so a thread waiting for the lock occupies its CPU the whole time. We can therefore use the percpu variable runqueues to find out which thread each CPU is running.

a. Definition of percpu variable runqueues

(1) struct rq runqueues definition

X:\work\linux\linux-6.0\kernel\sched\core.c
DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);

struct rq {
    /* Three different runqueues for use by three different schedulers */
    struct cfs_rq cfs;
    struct rt_rq rt;
    struct dl_rq dl;
    /* The thread currently being executed by this CPU */
    struct task_struct __rcu *curr;
}
(2) How the struct task_struct __rcu *curr member is used
The following helpers check a task against rq->curr (or its on_cpu flag):

static inline int task_current(struct rq *rq, struct task_struct *p)
{
    return rq->curr == p;
}

static inline int task_running(struct rq *rq, struct task_struct *p)
{
#ifdef CONFIG_SMP
    return p->on_cpu;
#else
    return task_current(rq, p);
#endif
}

(3) A brief note on the curr member of struct cfs_rq (the cfs member of struct rq)

struct cfs_rq {
    /* The ready queue of the cfs scheduling algorithm, threads that meet the execution conditions will be linked to this red-black tree */
    struct rb_root_cached tasks_timeline;
    /* Scheduling entity of the currently running process struct task_struct *curtask->se */
    struct sched_entity *curr;
    
};

/* Insert the scheduling entity into the red-black tree. The root of the red-black tree is cfs_rq->tasks_timeline */
static void __enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
    rb_add_cached(&se->run_node, &cfs_rq->tasks_timeline, __entity_less);
}

b. Use the global percpu variable runqueues to parse out the threads that each CPU is running

According to the definition of the global percpu runqueues variable, the curr member holds the address of the thread (struct task_struct) that the CPU is currently running. Use the p command to view it for each CPU: the number after the colon is the CPU number, and the last curr value printed is the struct task_struct address of the thread running on that CPU.

crash> p runqueues:0 |grep curr
  curr = 0x0,
  curr = 100,
  curr = 0,
  curr = 0xffffffc011da24c0,
crash>
crash> p runqueues:1 |grep curr
  curr = 0x0,
  curr = 100,
  curr = 0,
  curr = 0xffffff8100856c00,
crash>
crash> p runqueues:2 |grep curr
  curr = 0xffffff81011792c0,
  curr = 100,
  curr = 0,
  curr = 0xffffff8101179200,
crash> p runqueues:3 |grep curr
  curr = 0xffffff810117a4c0,
  curr = 100,
  curr = 0,
  curr = 0xffffff810117a400,
crash>
crash> p runqueues:4 |grep curr
  curr = 0x0,
  curr = 100,
  curr = 0,
  curr = 0xffffff810089a400,
crash> p runqueues:5 |grep curr
  curr = 0x0,
  curr = 100,
  curr = 0,
  curr = 0xffffff810089b600,
crash> p runqueues:6 |grep curr
  curr = 0x0,
  curr = 100,
  curr = 0,
  curr = 0xffffff810089c800,
crash> p runqueues:7 |grep curr
  curr = 0x0,
  curr = 100,
  curr = 0,
  curr = 0xffffff810089da00,

Use the struct command to print the name, pid and other information of the thread running on each CPU.

crash> struct task_struct 0xffffffc011da24c0 |grep -E " comm|pid ="
  pid = 0,
  thread_pid = 0xffffffc011db0f10,
  comm = "swapper/0\000\000\000\000\000\000",
crash>
crash> struct task_struct 0xffffff8100856c00 |grep -E " comm|pid ="
  pid = 0,
  thread_pid = 0xffffffc011db0f10,
  comm = "swapper/1\000\000\000\000\000\000",
crash>
crash> struct task_struct 0xffffff8101179200 |grep -E " comm|pid ="
  pid = 1004,
  thread_pid = 0xffffff8105260b40,
  comm = "spin thread1\000\000\000",
crash>
crash> struct task_struct 0xffffff810117a400 |grep -E " comm|pid ="
  pid = 1005,
  thread_pid = 0xffffff8105260c00,
  comm = "spin thread2\000\000\000",
crash>
crash> struct task_struct 0xffffff810089a400 |grep -E " comm|pid ="
  pid = 0,
  thread_pid = 0xffffffc011db0f10,
  comm = "swapper/4\000\000\000\000\000\000",
crash>
crash> struct task_struct 0xffffff810089b600 |grep -E " comm|pid ="
  pid = 0,
  thread_pid = 0xffffffc011db0f10,
  comm = "swapper/5\000\000\000\000\000\000",
crash>
crash> struct task_struct 0xffffff810089c800 |grep -E " comm|pid ="
  pid = 0,
  thread_pid = 0xffffffc011db0f10,
  comm = "swapper/6\000\000\000\000\000\000",
crash>
crash> struct task_struct 0xffffff810089da00 |grep -E " comm|pid ="
  pid = 0,
  thread_pid = 0xffffffc011db0f10,
  comm = "swapper/7\000\000\000\000\000\000",
crash>

c. Use the bt command to print the stack backtrace of the thread running on the CPU and confirm the deadlocked thread

/* The frames below confirm that the spinlock_thread1 thread is occupying the CPU because it is spin-waiting for the lock */

crash> bt 1004
PID: 1004 TASK: ffffff8101179200 CPU: 2 COMMAND: "spin thread1"
#0 [ffffffc01217bb50] __crash_kexec at ffffffc0100d9244
#1 [ffffffc01217bce0] panic at ffffffc010e847f4
#2 [ffffffc01217bdc0] watchdog_timer_fn at ffffffc0100efe24
#3 [ffffffc01217be10] __hrtimer_run_queues at ffffffc0100b97ac
#4 [ffffffc01217be90] hrtimer_run_queues at ffffffc0100ba110
#5 [ffffffc01217bec0] run_local_timers at ffffffc0100b8b14
#6 [ffffffc01217bee0] update_process_times at ffffffc0100b8bb8
#7 [ffffffc01217bf10] tick_nohz_handler at ffffffc0100c9954
#8 [ffffffc01217bf40] arch_timer_handler_phys at ffffffc010af2a50
#9 [ffffffc01217bf50] handle_percpu_devid_irq at ffffffc01009a9c0
#10 [ffffffc01217bf90] generic_handle_irq_desc at ffffffc010093fe4
#11 [ffffffc01217bfa0] __handle_domain_irq at ffffffc010094a28
#12 [ffffffc01217bfd0] gic_handle_irq at ffffffc010000994

#13 [ffffffc013c0bd90] el1_irq at ffffffc010002548
#14 [ffffffc013c0bdb0] __cmpwait_case_32 at ffffffc01008809c
#15 [ffffffc013c0be00] do_raw_spin_lock at ffffffc010089a04
#16 [ffffffc013c0be30] _raw_spin_lock at ffffffc010eb34e0
#17 [ffffffc013c0be40] spinlock_thread1 at ffffffc008f0b100 [kdump]
#18 [ffffffc013c0be60] kthread at ffffffc010058270
crash>
/* Normal idle thread scheduling state; nothing to do with the spin lock */
crash> bt 0
PID: 0 TASK: ffffffc011da24c0 CPU: 0 COMMAND: "swapper/0"
bt: WARNING: cannot determine starting stack frame for task ffffffc011da24c0

PID: 0 TASK: ffffff8100856c00 CPU: 1 COMMAND: "swapper/1"
bt: WARNING: cannot determine starting stack frame for task ffffff8100856c00

PID: 0 TASK: ffffff8100898000 CPU: 2 COMMAND: "swapper/2"
#0 [ffffffc01236be80] __switch_to at ffffffc010007e88
#1 [ffffffc01236beb0] __schedule at ffffffc010eafa54
#2 [ffffffc01236bf30] schedule_idle at ffffffc010eaff5c
#3 [ffffffc01236bf50] do_idle at ffffffc01006bea4
#4 [ffffffc01236bfb0] cpu_startup_entry at ffffffc01006c210
#5 [ffffffc01236bfd0] secondary_start_kernel at ffffffc010015934

4.2. Use the soft lockup mechanism to identify the deadlocked threads

From the kernel log below, the following conclusions can be drawn:

The spin thread1 thread running on CPU#3 has occupied the CPU for 22 seconds.

The spin thread2 thread running on CPU#5 has also occupied the CPU for 22 seconds, so these two threads are likely deadlocked with each other.

/mnt/mtd # echo kdump-4 > /proc/dbug/kdump
/mnt/mtd # 23] open embedsky board device!
[728.437843] printk kbuf kdump-4
[728.437843]
[728.437926] spinlock_t lock name spin thread1 time = 1
[728.437980] spinlock_t lock name spin thread2 time = 1
[  754.280911] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [spin thread1:6737]
[  754.280919] Modules linked in: kdump(O) r8168(O) r8125(O) i2c_common(O) avs(O) 8811au(O) 8812au(O) netctrl(O) GobiNet(O) driverbox(O) dgs(O) license(O) osa(O) fwcrypto(PO) crypto(O)
[  754.280940] CPU: 3 PID: 6737 Comm: spin thread1 Kdump: loaded Tainted: P O 5.10.66 #2
[  754.280942] Hardware name: Rockchip RK3588 NVR DEMO LP4 V10 Board (DT)
[  754.280946] pstate: 00c00009 (nzcv daif +PAN +UAO -TCO BTYPE=--)
[  754.280953] pc : __cmpwait_case_32+0x1c/0x24
[  754.281016] Call trace:
[  754.281020]  __cmpwait_case_32+0x1c/0x24
[  754.281022]  do_raw_spin_lock+0xdc/0xe0
[  754.281027]  _raw_spin_lock+0x1c/0x28
[  754.281032]  spinlock_thread1+0x60/0x88 [kdump]
[  754.281036]  kthread+0xf8/0x108
[  754.281040]  ret_from_fork+0x10/0x18

[  754.284281] Kernel panic - not syncing: softlockup: hung tasks
[  754.300914] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [spin thread2:6738]
[  757.696838] CPU: 3 PID: 6737 Comm: spin thread1 Kdump: loaded Tainted: P O L 5.10.66 #2
[  757.850248] CPU: 5 PID: 6738 Comm: spin thread2 Kdump: loaded Tainted: P O L 5.10.66 #2
[  757.859369] Hardware name: Rockchip RK3588 NVR DEMO LP4 V10 Board (DT)
[  757.865891] pstate: 00c00009 (nzcv daif +PAN +UAO -TCO BTYPE=--)
[  757.964261] Call trace:
[  757.966711]  __cmpwait_case_32+0x1c/0x24
[  757.970635]  do_raw_spin_lock+0xdc/0xe0
[  757.974468]  _raw_spin_lock+0x1c/0x28
[  757.978128]  spinlock_thread2+0x5c/0x84 [kdump]
[  757.982657]  kthread+0xf8/0x108
[  757.985793]  ret_from_fork+0x10/0x18

4.3. Use ps | grep RU to find runnable threads, then use the bt command to view their stack backtraces to confirm the deadlocked threads

This method is less efficient than the previous two, but it is described here in more detail.

crash> ps |grep RU
> 0 0 0 ffffffc011da24c0 RU 0.0 0 0 [swapper/0]
> 0 0 1 ffffff8100856c00 RU 0.0 0 0 [swapper/1]
0 0 2 ffffff8100898000 RU 0.0 0 0 [swapper/2]
0 0 3 ffffff8100899200 RU 0.0 0 0 [swapper/3]
> 0 0 4 ffffff810089a400 RU 0.0 0 0 [swapper/4]
> 0 0 5 ffffff810089b600 RU 0.0 0 0 [swapper/5]
> 0 0 6 ffffff810089c800 RU 0.0 0 0 [swapper/6]
> 0 0 7 ffffff810089da00 RU 0.0 0 0 [swapper/7]
21 2 2 ffffff81008aec00 RU 0.0 0 0 [migration/2]
22 2 2 ffffff81008f0000 RU 0.0 0 0 [ksoftirqd/2]
26 2 3 ffffff81008f4800 RU 0.0 0 0 [migration/3]
54 2 3 ffffff81009b4800 RU 0.0 0 0 [kworker/3:1]
225 2 2 ffffff810326da00 RU 0.0 0 0 [kworker/2:3]
> 1004 2 2 ffffff8101179200 RU 0.0 0 0 [spin thread1]
> 1005 2 3 ffffff810117a400 RU 0.0 0 0 [spin thread2]
/* Spin lock wait */
crash> bt 1004
PID: 1004 TASK: ffffff8101179200 CPU: 2 COMMAND: "spin thread1"
#0 [ffffffc01217bb50] __crash_kexec at ffffffc0100d9244
#1 [ffffffc01217bce0] panic at ffffffc010e847f4
#2 [ffffffc01217bdc0] watchdog_timer_fn at ffffffc0100efe24
#3 [ffffffc01217be10] __hrtimer_run_queues at ffffffc0100b97ac
#4 [ffffffc01217be90] hrtimer_run_queues at ffffffc0100ba110
#5 [ffffffc01217bec0] run_local_timers at ffffffc0100b8b14
#6 [ffffffc01217bee0] update_process_times at ffffffc0100b8bb8
#7 [ffffffc01217bf10] tick_nohz_handler at ffffffc0100c9954
#8 [ffffffc01217bf40] arch_timer_handler_phys at ffffffc010af2a50
#9 [ffffffc01217bf50] handle_percpu_devid_irq at ffffffc01009a9c0
#10 [ffffffc01217bf90] generic_handle_irq_desc at ffffffc010093fe4
#11 [ffffffc01217bfa0] __handle_domain_irq at ffffffc010094a28
#12 [ffffffc01217bfd0] gic_handle_irq at ffffffc010000994

#13 [ffffffc013c0bd90] el1_irq at ffffffc010002548
#14 [ffffffc013c0bdb0] __cmpwait_case_32 at ffffffc01008809c
#15 [ffffffc013c0be00] do_raw_spin_lock at ffffffc010089a04
#16 [ffffffc013c0be30] _raw_spin_lock at ffffffc010eb34e0
#17 [ffffffc013c0be40] spinlock_thread1 at ffffffc008f0b100 [kdump]
#18 [ffffffc013c0be60] kthread at ffffffc010058270
/* Normal scheduling execution */
crash> bt 22
PID: 22 TASK: ffffff81008f0000 CPU: 2 COMMAND: "ksoftirqd/2"
#0 [ffffffc01243bd30] __switch_to at ffffffc010007e88
#1 [ffffffc01243bd60] __schedule at ffffffc010eafa54
#2 [ffffffc01243bde0] schedule at ffffffc010eafd20
#3 [ffffffc01243be00] smpboot_thread_fn at ffffffc01005e8f4
#4 [ffffffc01243be60] kthread at ffffffc010058270
crash>
