In-depth exploration of C++ polymorphism ① – Virtual function call chain

Foreword

I have recently benefited a great deal from two of Mr. Hou Jie’s books: his translation “In-depth Exploration of the C++ Object Model” (the Chinese edition of Stanley Lippman’s “Inside the C++ Object Model”) and “C++ Virtual and Polymorphism”.

To understand how polymorphism works, you need to understand how these pieces relate to each other: virtual functions, the virtual function table (vtable), the virtual function pointer (vptr), and the memory layout of the object.

  • In-depth exploration of C++ polymorphism ① – Virtual function call chain
  • In-depth exploration of C++ polymorphism ② – Inheritance
  • In-depth exploration of C++ polymorphism ③ – Virtual destructor

1. Overview

1.1. Concept

This chapter mainly explores C++ dynamic polymorphism. Let’s first understand some of its related concepts:

  • Polymorphism is an important C++ concept: a derived class can override a function declared in its base class, so the same call made through a base-class pointer or reference behaves differently depending on the actual type of the object. It is implemented with virtual functions and dynamic binding.

  • A virtual function is a member function declared with the virtual keyword so that derived classes can override it. For each virtual function, the compiler adds an entry holding the function’s address to the class’s virtual function table. A derived class gets its own table that starts with the same layout as its base class’s table: entries for overridden functions point to the derived versions, and entries for newly declared virtual functions are appended.

  • The virtual function table (vtable) is a table of virtual function addresses. Every class that declares or inherits a virtual function has one; a class without virtual functions has none. Each entry points to the implementation that objects of that class should run.

  • The virtual function pointer (vptr) is a hidden pointer stored in the memory of every object of such a class. When an object is created, its constructor initializes the vptr to point to the class’s virtual function table. When a virtual function is called, the generated code follows the vptr to the table, loads the function’s address from the corresponding entry, and calls it.

  • Dynamic binding means that the function to call is chosen at run time rather than at compile time. For a virtual call made through a pointer or reference, the compiler cannot know the object’s actual type, so it emits the vptr and vtable lookup described above instead of a direct call.

Part of the text source: ChatGPT

1.2. Example

The concepts are fairly abstract, so I’ll write a demo and use some pictures to make them easier to understand.

  • Source code.
/* g++ -std=c++11 test.cpp -o t && ./t */
#include <iostream>
#include <memory>

class Model {
   public:
    // Objects are deleted through a Model pointer, so the destructor must be virtual.
    virtual ~Model() = default;
    virtual void face() { std::cout << "model's face!" << std::endl; }
};

class Girl : public Model {
   public:
    void face() override { std::cout << "girl's face!" << std::endl; }
};

class Man : public Model {
   public:
    void face() override { std::cout << "man's face!" << std::endl; }
};

class Boy : public Model {
   public:
    void face() override { std::cout << "boy's face!" << std::endl; }
};

// The call is resolved at run time from the actual type of *m.
void take_photo(const std::unique_ptr<Model>& m) { m->face(); }

int main() {
    auto model = std::unique_ptr<Model>(new Model);
    auto girl = std::unique_ptr<Model>(new Girl);
    auto man = std::unique_ptr<Model>(new Man);
    auto boy = std::unique_ptr<Model>(new Boy);
    take_photo(model);
    take_photo(girl);
    take_photo(man);
    take_photo(boy);
    return 0;
}
  • Output.
model's face!
girl's face!
man's face!
boy's face!

2. Working environment

2.1. System

# cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)
# cat /proc/version
Linux version 3.10.0-1127.19.1.el7.x86_64 ([email protected])
(gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) )

2.2. Tools

This article uses the following tools to analyze how polymorphism works:

| Tool | Description |
|------|-------------|
| gdb | The GNU debugger, a tool for debugging programs. |
| -fdump-class-hierarchy | A GCC compiler option that writes class hierarchy information to a text file during compilation, including the inheritance relationships and the vtable layout of each class. Useful for debugging and for understanding the class relationships in your code. |
| c++filt | A tool that demangles C++ symbols, converting the names generated by the C++ compiler back into readable function, class, and variable names. |
| objdump | A tool for analyzing object files. It can display the contents of each section (code, data, symbol table, etc.) and disassemble the machine code, which helps in understanding how the program executes. |

Part of the text source: ChatGPT

3. Important data structures for polymorphism

While debugging the internal source code of dynamic_cast, I came across some data structures, __class_type_info and vtable_prefix, that helped me better understand how polymorphism works.

For debugging methods, please refer to: “(ubuntu) vscode + gdb debugging c++”

3.1. Type information

type_info is the data structure that holds the type information of a class and is used to obtain the type of an object at run time. For different inheritance scenarios, the polymorphism machinery derives several more specific type information classes from type_info.

  • Basic type information structure.
/* /usr/include/c++/4.8.2/typeinfo */
// The type_info class describes type information generated by an implementation.
class type_info {
 protected:
    const char* __name;
};

/* /usr/include/c++/4.8.2/cxxabi.h */
// Type information for a class.
class __class_type_info : public std::type_info {
 public:
    ...
};
  • Single inheritance type information structure.
/* /usr/include/c++/4.8.2/cxxabi.h */
// Type information for a class with a single non-virtual base.
class __si_class_type_info : public __class_type_info {
   public:
    const __class_type_info *__base_type;
    ...
};
  • Multiple inheritance or virtual inheritance type information structure.
/* /usr/include/c++/4.8.2/cxxabi.h */
// Helper class for __vmi_class_type.
class __base_class_type_info {
   public:
    const __class_type_info *__base_type; // Base class type.
#ifdef _GLIBCXX_LLP64
    long long __offset_flags; // Offset and info.
#else
    long __offset_flags; // Offset and info.
#endif
    ...
};

// Type information for a class with multiple and/or virtual bases.
class __vmi_class_type_info : public __class_type_info {
   public:
    unsigned int __flags; // Details about the class hierarchy.
    unsigned int __base_count; // Number of direct bases.

    // The array of bases uses the trailing array struct hack so this
    // class is not constructable with a normal constructor. It is
    // internally generated by the compiler.
    __base_class_type_info __base_info[1]; // Array of bases.
    ...
};

3.2. Virtual table description structure

vtable_prefix is the structure that describes the beginning (prefix) of a virtual function table. An object may contain several vptrs, each with its own virtual table and therefore its own vtable_prefix; each vptr of the object points to the origin member of the corresponding vtable_prefix.

  1. whole_object: I think top_offset would be a more fitting name. It is the offset from the position of the current vptr to the top of the complete object. Because an object may have several virtual tables, this offset makes it possible to get from any of its vptrs back to the start of the object in memory.
  2. whole_type: the type information of the class (the most derived type).
  3. origin: the position in the virtual table that the vptr points to.
/* /usr/src/debug/gcc-4.8.5-20150702/libstdc++-v3/libsupc++/tinfo.h */
// Initial part of a vtable, this structure is used with offsetof, so we don't
// have to keep alignments consistent manually.
struct vtable_prefix {
    // Offset to most derived object.
    ptrdiff_t whole_object;

    // Additional padding if necessary.
#ifdef _GLIBCXX_VTABLE_PADDING
    ptrdiff_t padding1;
#endif

    // Pointer to most derived type_info.
    const __class_type_info *whole_type;

    // Additional padding if necessary.
#ifdef _GLIBCXX_VTABLE_PADDING
    ptrdiff_t padding2;
#endif

    // What a class's vptr points to.
    const void *origin;
};

The memory layout of polymorphic objects with multiple inheritance, discussed in the next chapter, is a good reference for understanding the virtual table description structure.

4. Virtual function call chain

C++ polymorphism is a fairly complex feature. Going from easy to difficult, let’s first look at how a virtual function call works for an object of a polymorphic class with no inheritance relationship.

  • Call chain.
this -> vptr -> vtbl -> virtual function
  • Test source code.
// g++ -g -O0 -std=c++11 -fdump-class-hierarchy test_virtual.cpp -o t
#include <iostream>

class A {
   public:
    int m_a = 0;
    virtual void vfuncA1() {}
    virtual void vfuncA2() {}
};

int main(int argc, char** argv) {
    A* a = new A;
    a->vfuncA2();
    return 0;
}
  • Disassembly. The assembly code shows how the virtual function is called:
int main(int argc, char** argv) {
  ;...
    A* a = new A;
  ;...
  40071d: e8 8e 00 00 00 callq 4007b0 <_ZN1AC1Ev>
  ; Store the object pointer a (this) in the stack slot at -0x18(%rbp).
  400722: 48 89 5d e8 mov %rbx,-0x18(%rbp)
    a->vfuncA2();
  ; Load a (this). The vptr is stored in the first 8 bytes of the object.
  400726: 48 8b 45 e8 mov -0x18(%rbp),%rax
  ; Load the vptr, which points to the position in the virtual table where the virtual function addresses start.
  40072a: 48 8b 00 mov (%rax),%rax
  ; Add 0x8 to that position to reach the second entry, which stores the address of vfuncA2.
  40072d: 48 83 c0 08 add $0x8,%rax
  ; Load the address of the virtual function from that entry.
  400731: 48 8b 00 mov (%rax),%rax
  ; Pass the a pointer (this) as the argument of the virtual function in %rdi.
  400734: 48 8b 55 e8 mov -0x18(%rbp),%rdx
  400738: 48 89 d7 mov %rdx,%rdi
  ; Call the virtual function.
  40073b: ff d0 callq *%rax
    return 0;
  ;...
}
  1. Find the vptr. The first 8 bytes of object a hold the vptr, which points into the virtual table.
  2. Follow the vptr to the position in the virtual table where the virtual function addresses start.
  3. Add the function’s offset to that position (0x8 for vfuncA2, the second virtual function) to reach the entry that stores its address, then load the address from that entry.
  4. Write the a (this) pointer to the rdi register so that it is passed as the argument of the virtual function call.
  5. The call instruction calls the virtual function (A::vfuncA2(this)).

5. Memory layout

Which memory areas hold the virtual function table and the virtual functions themselves? We can use the objdump tool to find out.

Analyzing the test demo above shows that the virtual function table is in the literal constant (read-only data) area, .rodata, and the virtual functions are in the program code area, .text.

Reference: Memory distribution of program variables (Linux); In-depth exploration of C++ polymorphism ② – Inheritance; In-depth exploration of C++ polymorphism ③ – Virtual destructor

  • Data partition.
| Area | Description | Contents |
|------|-------------|----------|
| stack | Stack area | Temporary (local) variables |
| heap | Heap area | Space allocated with malloc |
| .data, .bss | Global data area | Global variables / static variables |
| .rodata | Literal constant area | Read-only data, constants, etc. |
| .text | Program code area | Program code |
  • Use the objdump tool.
# Compile test code.
g++ -std=c++11 test.cpp -o test

# Use objdump to export executable file information.
objdump -CdStT test > asm.log

# Get program code area information.
cat asm.log| grep '\.text'

0000000000400610 l d .text 0000000000000000 .text
0000000000400610 g F .text 0000000000000000 _start
00000000004007b0 w F .text 0000000000000020 A::A()
# Virtual function.
000000000040079c w F .text 000000000000000a A::vfuncA1()
00000000004007a6 w F .text 000000000000000a A::vfuncA2()
00000000004007d0 g F .text 0000000000000065 __libc_csu_init
00000000004007b0 w F .text 0000000000000020 A::A()
00000000004006fd g F .text 000000000000004c main

# Get literal constant area (.rodata) information.
cat asm.log| grep '\.rodata'
0000000000400860 l d .rodata 0000000000000000 .rodata
# Virtual table.
0000000000400880 w O .rodata 0000000000000020 vtable for A
00000000004008a0 w O .rodata 0000000000000003 typeinfo name for A
00000000004008b0 w O .rodata 0000000000000010 typeinfo for A

6. References

  • “In-Depth Exploration of the C++ Object Model”
  • “C++ Virtual and Polymorphism”
  • Polymorphism and its basic principles
  • Analysis of the implementation principles of C++ polymorphism
  • Discuss memory layout again
  • C++: Let’s talk about RTTI from a technical implementation perspective
  • c++ object memory layout
  • Memory layout of C++ objects (Part 1)
  • Memory layout of C++ objects (Part 2)
  • How to write assembly language in vscode and debug in terminal (nanny level)
  • C++Virtual Table Tables(VTT)
  • godbolt.org
  • What is the VTT for a class?