Linux driver development HTR3218 project BUG (2): Kernel crash caused by memcpy

Copyright statement: This article is an original article by the blogger and follows the CC 4.0 BY-SA copyright agreement. Please attach the original source link and this statement when reprinting.
Link to this article: https://blog.csdn.net/xi_xix_i/article/details/134030023

1. Kernel crash caused by memcpy

I wrote a driver for this project, and loading the .ko with insmod crashed the kernel (really annoying: the board could not even reboot on its own, so I had to cut the power and restart). The information reported after the crash was as follows:


According to the values of the pc and lr registers, execution had reached offset 0x50 of htr3218_write_regs.
From this we can already tell that the fault is inside the function htr3218_write_regs(). Here is the code of that function in the driver:

static s32 htr3218_write_regs(struct htr3218_i2c_device *dev, u8 reg,
                              u8* buf, int len)
{
    int ret;
    u8 b[256];
    struct i2c_msg msg;
    struct i2c_client *client = dev->client;

    printk("in_write_regs, \r\n");
    b[0] = reg;

    memcpy(&b[1], buf, len);

    msg.addr = client->addr; /* htr3218 address */
    msg.flags = 0; /* 0 means sending data */
    msg.buf = &b[0];
    msg.len = len + 1;
    // msg.len = 1;
    ret = i2c_transfer(client->adapter, &msg, 1);

    printk("transfer ret: %d\r\n", ret);
    printk("device addr: 0x%x, reg: 0x%x ,val: 0x%x\r\n", client->addr, reg, buf[0]);
    // ret = i2c_smbus_write_byte_data(client, reg, reg_data);
    // printk("transfer ret: %d\r\n", ret);
    return ret;
}

For a short function, you can add a print after each line to find the statement that fails, but that does not scale to longer functions. This function is short, but in the spirit of learning I used a different method to locate the faulty code. The driver targets a HiSilicon platform, so I disassembled the .o file with the HiSilicon toolchain, writing the disassembly to a txt file with the following command:
/opt/linux/x86-arm/aarch64-mix210-linux/bin/aarch64-mix210-linux-objdump -D htrtest.o > htrtest_dump.txt
This produces the assembly code; here is the listing for the function:

0000000000000028 <htr3218_write_regs.constprop.1>:
  28: a9aa7bfd stp x29, x30, [sp, #-352]!
  2c: 90000002 adrp x2, 0 <htr3218_open_ops>
  30: 91000042 add x2, x2, #0x0
  34: 910003fd mov x29, sp
  38: a9025bf5 stp x21, x22, [sp, #32]
  3c: 12001c16 and w22, w0, #0xff
  40: a90153f3 stp x19, x20, [sp, #16]
  44: 90000013 adrp x19, 0 <__stack_chk_guard>
  48: f9001bf7 str x23, [sp, #48]
  4c: 91000273 add x19, x19, #0x0
  50: aa0103f7 mov x23, x1
  54: f9400260 ldr x0, [x19]
  58: f900afa0 str x0, [x29, #344]
  5c: d2800000 mov x0, #0x0 // #0
  60: f840c054 ldur x20, [x2, #12]
  64: 90000000 adrp x0, 0 <htr3218_open_ops>
  68: 91000000 add x0, x0, #0x0
  6c: 94000000 bl 0 <printk>
  70: 390163b6 strb w22, [x29, #88]
  74: 52800022 mov w2, #0x1 // #1
  78: 394002e4 ldrb w4, [x23]
  7c: 910123a1 add x1, x29, #0x48
  80: 79400683 ldrh w3, [x20, #2]
  84: f9400e80 ldr x0, [x20, #24]
  88: 390167a4 strb w4, [x29, #89]
  8c: 910163a4 add x4, x29, #0x58
  90: 790093a3 strh w3, [x29, #72]
  94: 52a00043 mov w3, #0x20000 // #131072
  98: f9002ba4 str x4, [x29, #80]
  9c: b804a3a3 stur w3, [x29, #74]
  a0: 94000000 bl 0 <i2c_transfer>
  a4: 2a0003e1 mov w1, w0
  a8: 2a0003f5 mov w21, w0
  ac: 90000000 adrp x0, 0 <htr3218_open_ops>
  b0: 91000000 add x0, x0, #0x0
  b4: 94000000 bl 0 <printk>
  b8: 79400681 ldrh w1, [x20, #2]
  bc: 2a1603e2 mov w2, w22
  c0: 394002e3 ldrb w3, [x23]
  c4: 90000000 adrp x0, 0 <htr3218_open_ops>
  c8: 91000000 add x0, x0, #0x0
  cc: 94000000 bl 0 <printk>
  d0: f940afa2 ldr x2, [x29, #344]
  d4: f9400261 ldr x1, [x19]
  d8: ca010041 eor x1, x2, x1
  dc: b50000e1 cbnz x1, f8 <htr3218_write_regs.constprop.1 + 0xd0>
  e0: 2a1503e0 mov w0, w21
  e4: f9401bf7 ldr x23, [sp, #48]
  e8: a94153f3 ldp x19, x20, [sp, #16]
  ec: a9425bf5 ldp x21, x22, [sp, #32]
  f0: a8d67bfd ldp x29, x30, [sp], #352
  f4: d65f03c0 ret
  f8: 94000000 bl 0 <__stack_chk_fail>
  fc: d503201f nop

Using the information from the crash, find the instruction at htr3218_write_regs.constprop.1 + 0x50 (the function starts at 0x28, so the actual instruction address is 0x28 + 0x50 = 0x78). This is the instruction the CPU was executing when the fault occurred: 78: 394002e4 ldrb w4, [x23]. It reads one byte from the memory address held in x23 into register w4 and clears the upper 24 bits of w4. (The HiSilicon platform is 64-bit ARM, where w0-w30 are the 32-bit views of the 64-bit general-purpose registers x0-x30, and writing a w register also zeroes the upper 32 bits of the corresponding x register. I did not dig into the details; at first I was just curious why LDRB, operating on a 64-bit register, would only clear the upper 24 bits. Worth a closer look later.)
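The address arithmetic and the load behavior described above can be sketched in plain C. This is only an illustrative user-space model, not driver code; the function names oops_offset_to_address and model_ldrb are made up for this example.

```c
#include <stdint.h>

/*
 * The crash report gives "function + offset". The address to look up in
 * the objdump listing is the function's start address plus that offset.
 */
uint64_t oops_offset_to_address(uint64_t func_start, uint64_t offset)
{
    return func_start + offset;
}

/*
 * Software model of "ldrb wT, [xN]": load a single byte from the address
 * in xN and zero-extend it into the 32-bit register wT, so bits 8..31 of
 * the result are always zero. If addr were NULL, the real instruction
 * would fault, which is exactly the crash analyzed in this article.
 */
uint32_t model_ldrb(const uint8_t *addr)
{
    return (uint32_t)*addr;
}
```

With the values from this article, oops_offset_to_address(0x28, 0x50) gives 0x78, the address of the ldrb instruction in the listing.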

Next, look for the instructions in the function that operate on register x23 before the faulting one. Only the instructions at 0x48 and 0x50 do. The instruction at 0x48 merely saves the old value of x23 on the stack, so the one that matters is 0x50: mov x23, x1.

However, nothing in htr3218_write_regs writes to x1 before that point, and the first few registers starting from x0 are normally used to pass arguments in a function call, so x1 must hold one of the arguments passed in. (The .constprop.1 suffix marks a compiler-specialized clone of the function, which is why the call site only sets up w0 and x1 rather than all four parameters.) Then find where htr3218_write_regs() is called in the assembly (in the driver it is called from htr3218_module_init(), so I will not paste all of the C code). There I found the instruction 1a4: d2800001 mov x1, #0x0, so it is this argument that makes the instruction in htr3218_write_regs fault, and we know the value passed in is 0x0.

0000000000000100 <htr3218_module_init>:
 100: a9bb7bfd stp x29, x30, [sp, #-80]!
 104: 910003fd mov x29, sp
 108: a90153f3 stp x19, x20, [sp, #16]
 10c: 90000014 adrp x20, 0 <__stack_chk_guard>
...

 18c: 90000000 adrp x0, 0 <htr3218_open_ops>
 190: 91000000 add x0, x0, #0x0
 194: 52800038 mov w24, #0x1 // #1
 198: 910002d6 add x22, x22, #0x0
 19c: 910142d6 add x22, x22, #0x50
 1a0: 94000000 bl 0 <printk>
 1a4: d2800001 mov x1, #0x0 // #0
 1a8: 528009e0 mov w0, #0x4f // #79
 1ac: 97ffff9f bl 28 <htr3218_write_regs.constprop.1>
...
 2e0: 17ffffa9 b 184 <htr3218_module_init + 0x84>
 2e4: d503201f nop

Comparing this with the relevant part of the init function in the driver below confirms that 0x0 was indeed passed in the call to htr3218_write_regs().

int htr3218_module_init(void) /* Executed when modprobe .ko */
{
    int ret = 0;
    float a = 10*0.01;
    u8 buf[1];
    u8 buf_single = 0;
    printk("in module_init!\r\n");
....
    /* *********************************************i2c device related Initialization****************************************** */
    htr3218_data.device.max_channel = HTR3218_USED_CHANNEL_NUM;
    htr3218_data.bus_adapter = NULL;

    ret = htr3218_i2c_init(&htr3218_data); /* This function initializes bus_adpater and device.client */
    if (!ret)
    {
        printk("Init htr3218 i2c client fail %d\n", ret);
        goto i2c_error;
    }

    printk("device addr:0x%x \r\n", htr3218_data.device.client->addr);

    htr3218_write_regs(&htr3218_data.device, 0x4f, 0x0, 1);

....
}
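
It may help to see why the compiler accepted the call above without complaint. In C, an integer constant expression with the value 0, such as 0x0, is a null pointer constant, so it converts silently to a u8* argument. The small user-space sketch below demonstrates this; takes_buf and the two caller functions are hypothetical names, not part of the driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a function with a "u8 *buf" parameter.
 * Returns 1 if the caller handed us a NULL pointer, 0 otherwise. */
int takes_buf(const uint8_t *buf)
{
    return buf == NULL;
}

/* Mirrors the call in htr3218_module_init(): the literal 0x0 is a null
 * pointer constant, so this compiles cleanly and passes NULL. */
int call_with_literal_zero(void)
{
    return takes_buf(0x0);
}

/* What was intended: pass the address of a variable holding the value. */
int call_with_address(void)
{
    uint8_t value = 0x0;
    return takes_buf(&value);
}
```

Had the value been any nonzero literal, the compiler would at least have warned about making a pointer from an integer; with 0 there is nothing to warn about.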

According to the definition htr3218_write_regs(struct htr3218_i2c_device *dev, u8 reg, u8* buf, int len), the third parameter is u8* buf, and the code in the driver that uses buf is

memcpy(&b[1], buf, len);

So the problem should be here, and in fact it has now been found. htr3218_write_regs expects an address, and memcpy then copies the value out of that address. I got muddled and passed the value itself when calling htr3218_write_regs, so memcpy tried to read from address 0x0. That is definitely an illegal address, hence the kernel crash. In fact, the first line of the kernel crash message says as much:

Unable to handle kernel NULL pointer dereference at virtual address 000000000000000

Still, to be rigorous, let's add print statements to confirm that this line of code is the problem:

printk("**********************before_memcpy**********************\r\n");
memcpy(&b[1], buf, len);
printk("**********************after_memcpy**********************\r\n");

Then recompile and insmod the driver once more. before_memcpy is indeed printed, but after_memcpy is not (and the board has to be restarted again…).
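
One way to keep a mistake like this from taking down the kernel is to validate the arguments before copying. The following is a minimal user-space sketch of that idea, not the fix used in the original driver: the helper name htr3218_build_payload, the HTR3218_MAX_PAYLOAD limit (derived here from the 256-byte b[] buffer in the function above) and the choice of -EINVAL as the error value are all assumptions made for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed limit: the 256-byte buffer minus one byte for the register. */
#define HTR3218_MAX_PAYLOAD 255

/*
 * Hypothetical helper (not in the original driver): builds the I2C write
 * payload "register byte followed by data" in out[], checking the
 * arguments first instead of handing them straight to memcpy().
 * Returns the number of bytes to send, or a negative errno value.
 */
int htr3218_build_payload(uint8_t reg, const uint8_t *buf, int len,
                          uint8_t *out, size_t out_size)
{
    if (buf == NULL || out == NULL)
        return -EINVAL;  /* the case from this article: buf == NULL */
    if (len < 1 || len > HTR3218_MAX_PAYLOAD)
        return -EINVAL;  /* nothing to send, or more than b[] can hold */
    if ((size_t)len + 1 > out_size)
        return -EINVAL;  /* would overflow the output buffer */

    out[0] = reg;
    memcpy(&out[1], buf, (size_t)len);
    return len + 1;
}
```

In the driver, the corresponding change at the call site would be to pass an address rather than a value, for example htr3218_write_regs(&htr3218_data.device, 0x4f, &buf_single, 1); using the buf_single variable that htr3218_module_init() already declares.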