Function call: Why does stack overflow occur?

Table of Contents

Why do we need a program stack?

Introduce function stack

How to construct a stack overflow?

How to use function inlining for performance optimization?

summary


We often encounter errors in the process of developing software. If you have searched for error information on Google, you should have visited the Stack Overflow website at some point. As the world’s largest programmer question and answer website, Stack Overflow’s name comes from a common error report, which is stack overflow.

Today, we will start with the function calls of the program, and talk about how the mutual calls between functions are implemented at the computer instruction level, and under what circumstances the stack overflow error will occur.

Why do we need the program stack?

As before, we start with a very simple C program function_example.c.

// function_example.c
#include <stdio.h>
int static add(int a, int b)
{
    return a + b;
}
 
 
int main()
{
    int x = 5;
    int y = 10;
    int u = add(x, y);
}

This program defines a simple function add, which accepts two parameters a and b, and returns a + b. In the main function, two variables x and y are defined, and then the add function is called to calculate u=x + y, and finally the value of u is printed out.

$ gcc -g -c function_example.c
$ objdump -d -M intel -S function_example.o

After we compile this program, objdump comes out. Let’s take a look at the corresponding assembly code.

int static add(int a, int b)
{
   0: 55 push rbp
   1: 48 89 e5 mov rbp,rsp
   4: 89 7d fc mov DWORD PTR [rbp-0x4],edi
   7: 89 75 f8 mov DWORD PTR [rbp-0x8],esi
    return a + b;
   a: 8b 55 fc mov edx,DWORD PTR [rbp-0x4]
   d: 8b 45 f8 mov eax,DWORD PTR [rbp-0x8]
  10: 01 d0 add eax,edx
}
  12: 5d pop rbp
  13: c3 ret
0000000000000014 <main>:
int main()
{
  14:55 push rbp
  15: 48 89 e5 mov rbp,rsp
  18: 48 83 ec 10 sub rsp,0x10
    int x = 5;
  1c: c7 45 fc 05 00 00 00 mov DWORD PTR [rbp-0x4],0x5
    int y = 10;
  23: c7 45 f8 0a 00 00 00 mov DWORD PTR [rbp-0x8],0xa
    int u = add(x, y);
  2a: 8b 55 f8 mov edx,DWORD PTR [rbp-0x8]
  2d: 8b 45 fc mov eax,DWORD PTR [rbp-0x4]
  30: 89 d6 mov esi,edx
  32: 89 c7 mov edi,eax
  34: e8 c7 ff ff ff call 0 <add>
  39: 89 45 f4 mov DWORD PTR [rbp-0xc], eax
  3c: b8 00 00 00 00 mov eax,0x0
}
  41: c9 leave
  42: c3 ret

It can be seen that in this code, the main function is not much different from the program execution we talked about in the previous section. It mainly replaces the jump instruction with the call instruction of the function call. What follows the call instruction is still the program address after the jump.

You should have no problem understanding this. Let’s look at an interesting part below.

Let’s look at the add function. It can be seen that after the add function is compiled, the code first executes a push instruction and a mov instruction; at the end of the function execution, a pop and a ret instruction are executed. The execution of these four instructions is actually performing the push and pop operations that we will talk about next.

Have you noticed that function calls are a bit similar to the if…else and for/while loops we talked about in the previous section. Both of them execute a jump instruction of a memory address during the original sequential execution of instructions, causing the instructions to jump from the original sequential execution process and start execution from the new jump position.

However, there is a difference between these two jumps. The if…else and for/while jumps will not come back after the jump. The instructions will be executed sequentially from the new address after the jump, just like Xu Zhimo wrote in “Farewell Cambridge”: “I wave my sleeves and don’t take away a single cloud” and continue to live a new life. As for the jump to a function call, after the instructions of the corresponding function have been executed, you have to return to the place where the function was called and continue to execute the instructions after the call, just like what He Zhizhang wrote in “Returning to Hometown”: “Young Master. When the little boy leaves home and the eldest brother returns, his local pronunciation has not changed and his hair on his temples has faded.” No matter how far we go, we will eventually come back.

So is there any way we can implement function calls without jumping back to the original starting point? It seems intuitively that there is such a solution. You can insert the called function instruction directly into the place where the function is called, replace the corresponding call instruction, and then when the compiler compiles the code, it directly replaces the function call with the corresponding instruction.

However, if you think about it carefully, you will find that there are some problems with this method. If function A calls function B, and then function B calls function A, we have to insert the instruction of B into A, and then insert the instruction of A into B, which will produce endless substitutions. It’s like two mirrors placed face to face. In any mirror, you can see infinite mirrors.

Infinite Mirror Effect, if function A calls B, and B then calls A, then the code will expand infinitely, image source

It seems that the method of inserting the instructions of the called function directly at the calling site does not work. Then let’s change our thinking. Can we record the address of the instruction that we want to jump back to execute later? Just like the PC register mentioned earlier, we can set up a special “program call register” to store the address of the instruction to be jumped back for execution. Wait until the function call ends, take the address from this register, jump to the recorded address, and continue execution.

But in multi-layer function calls, simply recording one address is not enough. After we call function A, A can also call function B, and B can also call function C. There is no limit to the number of these layer-by-layer calls. Before all function calls return, the return address of each call must be recorded, but the number of registers in our CPU is not many. The Intel i7 CPU we generally use only has 16 64-bit registers, and it cannot be stored once the number of layers of calls is increased.

Introduce function stack

Eventually, computer scientists came up with a better solution than recording the jump back address alone. We open up a space in the memory and use the stack as a data structure called LIFO (Last In First Out). The stack is like a ping pong ball. Every time the program calls a function, we write the address returned by the call on a ping pong ball and stuff it into the ball bucket. This operation is actually what we often call Pushing. If the function is finished executing, we will take out the top ping pong ball from the ball bucket. Obviously, this is pop out of the stack.

Get the ping pong ball popped off the stack, find the address above, jump the program there, and return to the next instruction after the function call. If function A calls function B before execution is completed, then we need to stuff a ping pong ball into the bucket before taking it out. When we take the ping pong ball from the top of the ball bucket, we must take the most recent one, which is the address after the function call at the bottom layer is completed. The bottom of the table tennis bucket is the stack bottom, and the position of the top table tennis ball is the stack top.

In a real program, what is pushed onto the stack is not only the return address after the function call is completed. For example, when function A calls B, it needs to transmit some parameter data. These parameter data will also be pushed onto the stack when the registers are not enough. All the memory space occupied by the entire function A is the Stack Frame of function A. Frame also means “photo frame” in Chinese, so every time I come here, I have a feeling that the memory space required by the entire function A is like being framed by such a “photo frame” and placed on the stack. in.

In the actual program stack layout, the top and bottom are upside down compared to our ping pong bucket. The bottom is at the top and the top is at the bottom. This layout is because the memory address at the bottom of the stack is fixed at the beginning. After the stack is stacked layer by layer, the memory address at the top of the stack gradually becomes smaller instead of larger.

Corresponding to the assembly code of the function add above, let’s take a closer look. When the main function calls the add function, the add function entry is at lines 0 to 1. You can see line 0 => push rbp; after the add function ends, it is at lines 12 to 13. , line 12 => pop rbp .

When we call the call instruction on line 34, we will push the address of the next instruction in the current PC register onto the stack, retaining the address of the instruction to be executed after the function call is completed. In line 0 of the add function, the push rbp instruction is pushing the stack. The rbp here is also called the stack frame pointer (Frame Pointer), which is a register that stores the current stack frame position. push rbp pushes the bottom address of the stack frame of the previously called function, that is, the main function, to the top of the stack.

Then, a command in line 1, mov rbp, rsp, copies the value of the stack pointer (Stack Pointer) rsp to rbp, and rsp will always point to the top of the stack. This command means that the address pointed to by the stack frame pointer rbp becomes the current top of the stack, which is the bottom address of the stack frame of the add function.

After the function add is executed, pop rbp on line 12 will be called to pop the current top of the stack. This part of the operation maintains our entire stack frame. Then, we can call the ret instruction on line 13. At this time, we need to pop the next instruction in the PC register that was pushed when the call was called, update it to the PC register, and return the control of the program to after popping the stack. top of the stack.

How to construct a stack overflow?

By introducing the stack, we can see that no matter how many layers of function calls there are, or calling function B in function A, and then calling A in function B, for such recursive calls, we only need to maintain rbp and rsp. Two registers that maintain the address of the top of the stack can manage jumps between different functions. However, the size of the stack is also limited. If there are too many function call levels and we push more content into the stack than it can store, the program will encounter a stack overflow error during execution. This is the famous “stack overflow”.

It is not difficult to create a stack overflow error. The simplest way is to use the Infiinite Mirror Effect we mentioned above, where function A calls itself without any termination conditions. Such an infinitely recursive program fills up the entire stack space in the process of continuously pushing onto the stack, and eventually encounters a stack overflow.

int a()
{
  return a();
}
 
 
int main()
{
  a();
  return 0;
}

In addition to infinite recursion, the recursion level is too deep, and variables that occupy a lot of memory (such as a huge array) are created in the stack space. These situations are likely to bring you stack overflow. I believe you understand how the stack works during the running of the program. When you encounter the stackoverflow error in the future, you will not be completely directionless.

How to use function inlining for performance optimization?

Above we mentioned a method to directly insert the instruction generated by an actually called function into the position to replace the corresponding function call instruction. Although this general function calling scheme has been rejected by us, this method can still work if no other functions are called in the called function.

In fact, this is a common scenario where the compiler performs automatic optimization, which we usually call Function inlining (Inline). As long as we add the corresponding parameter -O that allows the compiler to automatically optimize when compiling with GCC, the compiler will perform such instruction replacement if feasible.

Let’s look at a piece of code.

#include <stdio.h>
#include <time.h>
#include <stdlib.h>
 
int static add(int a, int b)
{
    return a + b;
}
 
int main()
{
    srand(time(NULL));
    int x = rand() % 5
    int y = rand() % 10;
    int u = add(x, y)
    printf("u = %d\\
", u)
}

In order to prevent the compiler from optimizing too much code, I slightly modified function_example.c, so that the parameters x and y became, generated by random numbers, and added a statement to print u through printf at the end of the code. .

$ gcc -g -c -O function_example_inline.c
$ objdump -d -M intel -S function_example_inline.o

The compiled assembly code of function_example_inline.c above does not compile the add function into a sequence of instructions separately, but directly replaces it with an add instruction when calling u = add(x, y).

return a + b;
  4c: 01 de add esi,ebx

In addition to relying on the compiler’s automatic optimization, you can also add the inline keyword where the function is defined to prompt the compiler to inline the function.

The optimization brought by inlining is that the number of instructions that the CPU needs to execute is reduced, the process of jumping according to the address is no longer needed, and the processes of pushing and popping the stack are also no longer needed.

However, inlining is not without cost. Inlining means that we fully expand the reusable program instructions at the place where it is called. If a function is called in many places, it will be expanded many times, and the space occupied by the entire program will become larger.

In this way, the function that is only called without calling other functions is generally called a leaf function (or leaf procedure).

Summary

Register name:

rbp – register base pointer (start of stack)
rsp – register stack pointer (current location in stack, growing downwards)

We talked about how the inter-function calls of a program are executed at the CPU instruction level. One thing that you must keep in mind is the new concept of Program Stack.

We can easily push and pop the stack to make the program transfer during different function calls. As for function inlining and stack overflow, one is an optimization solution we can often choose, and the other is a program bug we often encounter.

By adding the program stack, we are equivalent to adding a “memory” function in the instruction jumping process. After jumping to run a new instruction, we can then return to the jumping position, which can achieve richer and more complex Flexible instruction execution process. This also provides us with an abstraction of “function” in the process of program development, so that we can reuse code and instructions in the process of software development, instead of simply copying and pasting codes and instructions.