ARM Cortex-M stack structure and backtracking

1. Overview

Recently, I am studying the stack structure and stack backtracking of ARM Cortex-M series microcontrollers. What is the use of studying this? There are the following aspects:

An in-depth understanding of processor instructions, the principles of program operation, etc., is helpful to the improvement of programming skills.
When there is a problem with your program, you can find the problem point based on the stack data. Contribute to the improvement of the ability to find and locate problems.

My purpose is to figure out what is the order in which the program pushes the stack when calling functions in different states? How to open up local variables in the stack? In order to understand these two issues, I consulted a lot of information and used many tools.

2. ELF files and related tools

Before figuring out the issues we care about, we need to prepare some necessary knowledge and some necessary tools to help us explore the issues we care about, among which ELF files and tools related to operating ELF files are essential.

2.1 ELF files

First of all, we need to briefly understand the ELF file. We use keil to develop the ARM Cortex-M single-chip microcomputer program, and the suffix of the executable file output after compiling is axf. In fact, the axf file is a type of ELF file. In fact, the executable files output by most compilers are ELF files, including windows executable files and Linux executable files.

The ELF file stores the program’s binary instruction code, symbol table, debugging information, etc. in a certain format. We can get almost all the information of the program through the ELF file.

2.2 ELF file parsing tool

There are many ELF parsing tools on the Internet, but I prefer to use scripts to handle some things by myself. So I use python to marry the elftoos library (eliben/pyelftools: Parsing ELF and DWARF in Python (github.com)) to process and parse ELF files. readelf is also a good tool under Linux.

The script library has many useful examples, and common functions can be obtained from examples. We make the necessary changes on the basis of the example to achieve the function we want. For example, we need to find the function of the function name through the program address.

The library can be installed with the following command:

pip install pyelftools

3. Stack changes when calling a function

The ARM Cortex-M processor does not have a stack boundary register, and the number of registers pushed to the stack in order to save system resources also changes with the implementation of the function. This also further increases the difficulty of such processor stack backtracking. Even so, once we master the law of pushing the stack when the function is called, we can realize stack backtracking and analysis.

Disclaimer: Since the assembly instructions compiled by the same C language source code will be different under different compilers, the assembly instructions shown in this article all use the ARMCC compiler.
The analysis here focuses on bare-metal programs without an operating system. In the case of an operating system, there may be multiple stack spaces.

Let’s analyze the assembly code of several functions to summarize the rules of function stacking:

We mainly focus on the PUSH instruction and the operation on SP

uint32_t nrf_atomic_flag_set(nrf_atomic_flag_t * p_data)
nrf_atomic_flag_set
        0x000073fe: b510 .. PUSH {r4,lr}
        0x00007400: 4604 .F MOV r4,r0
        0x00007402: 2101 .!MOVS r1,#1
        0x00007404: 4620 F MOV r0,r4
        0x00007406: f000f822 ..". BL nrf_atomic_u32_or ; 0x744e
        0x0000740a: bd10 .. POP {r4,pc}

This is a function with parameters and a return value, and no local variables are used inside the function. This type of function pushes only two registers onto the stack.

uint32_t nrf_atomic_u32_add(nrf_atomic_u32_t * p_data, uint32_t value)
{<!-- -->
    uint32_t old_val;
    uint32_t new_val;

    NRF_ATOMIC_OP(add, old_val, new_val, p_data, value);
    UNUSED_PARAMETER(old_val);
    UNUSED_PARAMETER(new_val);
    return new_val;
}

nrf_atomic_u32_add
        0x0000740c: b5f8 .. PUSH {r3-r7,lr}
        0x0000740e: 4604 .F MOV r4,r0
        0x00007410: 460d .F MOV r5,r1
        0x00007412: 466a jF MOV r2,sp
        0x00007414: 4629 )F MOV r1,r5
        0x00007416: 4620 F MOV r0,r4
        0x00007418: f7f8ff31 ..1. BL __asm___12_nrf_atomic_c_85ca2469__nrf_atomic_internal_add ; 0x27e
        0x0000741c: 4606 .F MOV r6,r0
        0x0000741e: 9800 .. LDR r0,[sp,#0]
        0x00007420: bdf8 .. POP {r3-r7,pc}

This function has parameters and a return value, and has two local variables. We found that more registers were pushed onto the stack. Therefore, some registers are used as local variables.

nrf_cli_fprintf
        0x00007f6c: b40f .. PUSH {r0-r3}
        0x00007f6e: b57c |. PUSH {r2-r6,lr}
        0x00007f70: 4604 .F MOV r4,r0
        0x00007f72: 460d .F MOV r5,r1
        0x00007f74: 9808 .. LDR r0,[sp,#0x20]
        0x00007f76: b920 . CBNZ r0,0x7f82 ; nrf_cli_fprintf + 22
        0x00007f78: a120 . ADR r1,{pc} + 0x84 ; 0x7ffc
        0x00007f7a: f640305f @._0 MOV r0,#0xb5f

May push the stack multiple times.

nrf_cli_help_print
        0x00008024: e92d4ff0 -..O PUSH {r4-r11,lr}
        0x00008028: b089 .. SUB sp,sp,#0x24
        0x0000802a: 4606 .F MOV r6,r0
        0x0000802c: 460c .F MOV r4,r1
        0x0000802e: 4691 .F MOV r9,r2
        0x00008030: b926 & amp;. CBNZ r6, 0x803c ; nrf_cli_help_print + 24
        0x00008032: a1b6 .. ADR r1,{pc} + 0x2da ; 0x830c
        0x00008034: f44f603e O.>` MOV r0,#0xbe0
        0x00008038: f7fdf92e .... BL assert_nrf_callback ; 0x5298
        0x0000803c: 68b0 .h LDR r0,[r6,#8]
        0x0000803e: b118 .. CBZ r0,0x8048 ; nrf_cli_help_print + 36

When the space occupied by local variables is relatively large and the system registers cannot meet the requirements, the SUB instruction will be used to subtract the stack pointer. Its role is to open up stack space for local variables. This function opens up a local variable space of 0x24 bytes.

Now let’s summarize, generally after jumping to the function address, the PUSH instruction will be used to push the stack. Here, the value of the register is generally saved by pushing the stack, and the last value of the stack is generally the value of lr. The number of registers pushed into the stack is uncertain, which is determined by the size of the parameters, return values, and local variables of the function. For functions with a relatively large space occupied by local variables, the registers cannot meet the implementation requirements of the function, and the compiler will use the SUB instruction to perform subtraction operations on SP to open up stack space for use by local variables.

If we want to obtain the hierarchical relationship of function calls through stack data, we only need to pay attention to the value of lr in the stack.

4. Interrupt handler

This is a special case if the current CPU is executing an interrupt handler. Because the ARM Cortex-M processor will automatically push R0 <- R1 <- R2 <- R3 <- R12 <- LR <- PC <- xPSR onto the stack before entering the interrupt handler. Special handling is required in this case.

5. Implement stack backtracking through python script

Python has a library that can control jlink to debug the microcontroller, which can easily read the value of the system register and stack data. It can also control the microcontroller. Based on this, a python script for stack backtracking is implemented.

From the exploration in the previous chapters, we found that by analyzing the assembly source code of the firmware and combining the value of the current register, the stack backtracking of the ARM Cortex-M series MCU can be realized. The backtracking process is divided into the following steps:

1. Read the current value of sp, pc, lr, xPSR
2. Obtain the function name through the ELF tool through the value of pc
3. Retrieve the estimated asm file by the function name to get the stack information
4. Obtain the address of lr saved in the stack by pushing the stack information combined with the current value of sp, and update the value of sp
5. Repeat steps 2-5 until the bottom of the stack is traversed

Here is the python implementation:

from elftools.elf.elffile import ELFFile
from elftools.dwarf.descriptions import describe_form_class
import subprocess
import argparse
import pylink
import json
import sys
import os
import re

VERSION = '1.0.0'

def decode_funcname(dwarfinfo, address):
    # Go over all DIEs in the DWARF information, looking for a subprogram
    # entry with an address range that includes the given address.
    # this simplifies things by disregarding subprograms that may have
    # split address ranges.
    for CUs in dwarfinfo.iter_CUs():
        for DIE in CU.iter_DIEs():
            try:
                if DIE.tag == 'DW_TAG_subprogram':
                    lowpc = DIE.attributes['DW_AT_low_pc'].value

                    # DWARF v4 in section 2.17 describes how to interpret the
                    # DW_AT_high_pc attribute based on the class of its form.
                    # For class 'address' it's taken as an absolute address
                    # (similarly to DW_AT_low_pc); for class 'constant', it's
                    # an offset from DW_AT_low_pc.
                    highpc_attr = DIE.attributes['DW_AT_high_pc']
                    highpc_attr_class = describe_form_class(highpc_attr.form)
                    if highpc_attr_class == 'address':
                        highpc = highpc_attr.value
                    elif highpc_attr_class == 'constant':
                        highpc = lowpc + highpc_attr.value
                    else:
                        print('Error: invalid DW_AT_high_pc class:',
                              highpc_attr_class)
                        continue

                    if lowpc <= address < highpc:
                        return DIE.attributes['DW_AT_name'].value
            except KeyError:
                continue
    return None

def decode_file_line(dwarfinfo, address):
    # Go over all the line programs in the DWARF information, looking for
    # one that describes the given address.
    for CUs in dwarfinfo.iter_CUs():
        # First, look at line programs to find the file/line for the address
        lineprog = dwarfinfo.line_program_for_CU(CU)
        if lineprog is None:
            continue

        delta = 1 if lineprog.header.version < 5 else 0
        prevstate = None
        for entry in lineprog. get_entries():
            # We're interested in those entries where a new state is assigned
            if entry.state is None:
                continue
            # Looking for a range of addresses in two consecutive states that
            # contain the required address.
            if prevstate and prevstate.address <= address < entry.state.address:
                filename = lineprog['file_entry'][prevstate.file - delta].name
                line = prevstate.line
                return filename, line
            if entry.state.end_sequence:
                # For the state with `end_sequence`, `address` means the address
                # of the first byte after the target machine instruction
                # sequence and other information is meaningless. We clear
                # prevstate so that it's not used in the next iteration. Address
                # info is used in the above comparison to see if we need to use
                # the line information for the prevstate.
                prevstate = None
            else:
                prevstate = entry.state
    return None, None

def get_function_info_by_addr(dwarfinfo, addr, path):
    '''
    Obtain the function name, source code file path and function name through the address. And print keil can jump
    string into the Build Output window
    '''
    patten = re.compile('^[A-Fa-f0-9] + $')
    patten0x = re.compile('0x[0-9a-fA-F] + ')
        
    if patten.match(addr) or patten0x.match(addr):
        funcname = decode_funcname(dwarfinfo, int(addr, base=16))
        file, line = decode_file_line(dwarfinfo, int(addr, base=16))
        filename = file. decode()
    
        # print(file.decode(), '(', line, ')', ':', funcname.decode())
        if path == None or path == '':
            print(filename, '(', line, ')', ':', funcname.decode())
        else:
            # Find the file in the input path and output the result
            result = []
            for root, lists, files in os. walk(path):
                for file in files:
                    if filename in file:
                        write = os.path.join(root, file)
                        # Filter the file suffix, only keep the C language and c ++ related files
                        if write.endswith(tuple(['c', 'cc', 'cpp', 'h'])):
                            result.append(write)

            for subpath in result:
                print(subpath, '({:d})'. format(line), ':', funcname. decode())

def function_push_info(asmpath, funcname):
    '''
    Obtain the stack information of the function through the function name in the asm file

    asmpath: asm file generated by elf file
    funcname: function name

    ret: Returns the address offset when pushing the stack and the list of registers pushed onto the stack
    '''
    push_str = []

    # open the asm file
    with open(asmpath, 'r') as asmf:
        # Read and parse the file data line by line
        lines = asmf. readlines()
        match_index = False
        for index in range(len(lines)):
            # lookup function
            if lines[index].strip() == funcname:
                counter = 0
                while True:
                    if lines[index].find('PUSH') > 0:
                        push_str.append(lines[index])
                        match_index = True
                    # After the function containing local variables is pushed to the stack, the local variables will be given to the stack
                    # Allocate space, the specific method is to use the SUB instruction to convert the SP pointer
                    # Subtract a value. Therefore, when we parse the stack data, the first thing we see is
                    # The value of the local variable.
                    elif lines[index].find('SUB') > 0:
                        push_str.append(lines[index])
                    else:
                        if match_index:
                            break
                    index = index + 1
                    cunter = cunter + 1
                    if (not match_index) and counter >= 2:
                        break
                break
        asmf. close()

    if len(push_str) <= 0:
        return 0, 0
    
    # Get function address
    funcaddr = push_str[0].strip().split(':')[0]
    funcaddr = int(funcaddr, base=16)
    # Parse the stack information
    reglist = []
    subw = 0
    for line in push_str:
        # Obtain the push instruction line and parse out the register list
        if line.find('PUSH') > 0:
            start = line. find('{')
            end = line. find('}')
            line = line[start+1:end]
            reglist. extend(line. split(','))
            continue
        sub_start = line. find('SUB')
        if sub_start > 0:
            # The SUB instruction line is not necessarily the operation on the sp pointer
            line = line[sub_start:]
            splist = line. split(',')
            # Make sure it is an operation on sp, and solve the offset
            if splist[1] == 'sp':
                offset = line.split('#')[1].rstrip()
                if offset[0:2] == '0x':
                    subw = subw + int(offset, base=16)
                else:
                    subw = subw + int(offset, base=10)

    reg_num = 0
    for reg in reglist:
        if reg.find('-') > 0:
            tmp_list = reg. split('-')
            reg_num = reg_num + (int(tmp_list[1][1:]) - int(tmp_list[0][1:]) + 1)
        else:
            reg_num = reg_num + 1

    return (reg_num*4 + subw), funcaddr

def vector_from_asm(asmpath):
    '''
    get vector table from asm file
    '''
    vector = []
    with open(asmpath, 'r') as asmf:
        lines = asmf. readlines()
        flag = False
        for line in lines:
            if flag:
                if line.strip() == '$t':
                    break
                if line.find('DCD') > 0:
                    tmp = line.split('DCD')[1].strip()
                    vector.append(int(tmp, base=10))
            else:
                if line.strip() == 'RESET':
                    flag = True
                    continue
        asmf. close()

    return vector

def stack_backtrace(jlink, asmpath, dwarfinfo):
    '''
    Backtrace the stack

    regs: [sp, lr, pc]
    '''
    # get vector table
    vector = vector_from_asm(asmpath)
    # sp, lr, pc, IPSR
    regs = link. register_read_multiple([13, 14, 15, 16])
    result = []
    # Get the current function name through pc
    funcname = decode_funcname(dwarfinfo, regs[2])
    file, line = decode_file_line(dwarfinfo, regs[2])
    result.append([regs[2], file.decode(), line, funcname.decode()])

    sp = regs[0]
    ipsr = regs[3] & 0x1F
    # backtrace the stack
    while True:
        offset, funcaddr = funtion_push_info(asmpath, funcname. decode())
        if offset == 0:
            break
        # If ipsr is greater than 0, it means that the current pc is in the interrupt processing function
        # The ARM Cortex-M processor will automatically
        # set R0 <- R1 <- R2 <- R3 <- R12 <- LR <- PC <- xPSR
        # Pushed into the stack, this feature requires special handling for this situation
        if ipsr > 0 and funcaddr == vector[ipsr]-1:
            sp = sp + offset + 32
            lr = jlink.memory_read32(sp-8, 1)[0]
            funcname = decode_funcname(dwarfinfo, lr)
            file, line = decode_file_line(dwarfinfo, lr)
        else:
            sp = sp + offset
            lr = jlink.memory_read32(sp-4, 1)[0]
            funcname = decode_funcname(dwarfinfo, lr - 1)
            file, line = decode_file_line(dwarfinfo, lr - 1)
        if not funcname:
            break
        result.append([lr-1, file.decode(), line, funcname.decode()])

    return result

if __name__ == '__main__':
    parse = argparse.ArgumentParser(description='Locate the function name, file name and line number according to the address')
    parse.add_argument('-v', '--version', action='version', version=VERSION, help='Show version and exit.')
    parse.add_argument('-addr', help='function address')
    parse.add_argument('-json', help='path of Json configuration file')

    args = parse. parse_args()

    # parameter checking
    if args.json is None:
        parse. print_help()
        sys. exit(0)

    # Read the json file to get the configuration
    config = None
    try:
        with open(args.json, 'r') as jf:
            config = json. load(jf)
            jf. close()
    except:
        parse. print_help()
        sys. exit(0)

    elfpath = config.get('Elfpath')
    projectpath = config.get('Projectpath')
    if not (elfpath and projectpath):
        parse. print_help()
        sys. exit(0)

    elfpath = projectpath + elfpath

    with open(elfpath, 'rb') as f:
        elffile = ELFFile(f)

        if not elffile.has_dwarf_info():
            print('file has no SWARF info')
        else:
            dwarfinfo = elffile.get_dwarf_info()
            mode = config. get('Mode')
            if not mode:
                sys. exit(0)
            if mode == 'funcname':
                if args.addr != None:
                    get_function_info_by_addr(dwarfinfo, args.addr, projectpath)
            
            if mode == 'stack':
                keilpath = config.get('Keilpath')
                device = config. get('Device')
                if not (keilpath and device):
                    sys. exit(0)
                cmd = [keilpath + 'fromelf.exe', '--text', '-c', '-o', projectpath + '\frimwar.asm', elfpath]
                subprocess.Popen(cmd)
                
                # Get the sn number of jlin
                obj = subprocess.Popen(['JLink'], shell=True, stdout=subprocess.PIPE, stdin=subprocess.PIPE)
                # time. sleep(10)
                obj.stdin.write('q\
'.encode())
                out, err = obj. communicate()
            
                sn = ''
                for line in out. splitlines():
                    str = line. decode()
                    if str.find('S/N') >= 0:
                        sn = str.split(':')[1].rstrip()
                        break
                
                # connect jlink
                link = pylink. JLink()
                link. open(sn)
                link.set_tif(pylink.enums.JLinkInterfaces.SWD)
                link. connect(device)

                ok = link.halt()

                asmpath = projectpath + '\frimwar.asm'
                result = stack_backtrace(link, asmpath, dwarfinfo)
                out_result = []
                for item in result:
                    filename = item[1].split('')[-1].rstrip()
                    for root, lists, files in os. walk(projectpath):
                        for file in files:
                            if filename in file:
                                write = os.path.join(root, file)
                                # Filter the file suffix, only keep the C language and c ++ related files
                                if write.endswith(tuple(['c', 'cc', 'cpp', 'h'])):
                                    out_result.append([write, item])

                # print the stack frame
                print('backtrace:')
                for subpath in out_result:
                    print(subpath[0], '({:d})'. format(subpath[1][2]), ':', subpath[1][3])

                # The program continues to run
                link. restart()

                os. remove(asmpath)