Foreword
I have recently benefited a lot from reading two books by Mr. Hou Jie: his translation of “In-depth Exploration of the C++ Object Model” and “C++ Virtual and Polymorphism”.
To understand how polymorphism works, you need to understand how these pieces relate to one another: the virtual function, the virtual function table, the virtual function pointer, and the memory layout of the object.
- In-Depth Exploration of C++ Polymorphism ① – Virtual Function Call Chain
- In-Depth Exploration of C++ Polymorphism ② – Inheritance
- In-Depth Exploration of C++ Polymorphism ③ – Virtual Destructor
1. Overview
1.1. Concept
This chapter focuses on dynamic polymorphism in C++. First, some related concepts:
- Polymorphism lets a derived class override functions of its base class, so that the same call made through a base class pointer or reference behaves differently depending on the actual type of the object. It is implemented with virtual functions and dynamic binding.
- A virtual function is a member function declared with the virtual keyword, which a derived class can override. For every class that declares or inherits virtual functions, the compiler creates a virtual function table holding the addresses of those functions.
- A virtual function table (vtable) is a per-class table of virtual function addresses, shared by all objects of that class; only polymorphic classes (classes with at least one virtual function) have one. In the single-inheritance case, a derived class gets its own table that starts from the base class's entries, replaces the entries of the functions it overrides, and appends entries for the new virtual functions it declares. In the Itanium C++ ABI used by GCC, the function pointers are preceded by an offset-to-top and a type_info pointer (see section 3.2).
- A virtual function pointer (vptr) is a hidden pointer stored in each object that points to its class's virtual function table. The constructor initializes it when the object is created.
- Dynamic binding means that the function to call is chosen at runtime rather than at compile time. For a virtual call through a pointer or reference, the compiler emits code that loads the object's vptr, reads the function address from the corresponding vtable slot, and calls it.
Part of the text source: ChatGPT
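To make these terms concrete, here is a minimal sketch that builds by hand what the compiler generates automatically: a per-class table of function pointers, a pointer to that table inside every object, and a call that goes through the table. All names (VTable, Shape, Square, Rect, area_of, ...) are hypothetical; a real compiler lays these structures out according to its ABI, not like this.

```cpp
#include <string>

struct Shape;  // forward declaration, used by the table's function pointer types

// "Virtual function table": one per class, an array of function pointers.
struct VTable {
    std::string (*name)(const Shape*);
    double (*area)(const Shape*);
};

// "Base class": every object starts with a pointer to its class's table.
struct Shape {
    const VTable* vptr;
};

// "Derived classes": the base subobject is the first member, so a Square*
// and a pointer to its base member refer to the same address.
struct Square { Shape base; double side; };
struct Rect   { Shape base; double w, h; };

// "Overrides": plain functions that receive the object as `self`.
inline std::string square_name(const Shape*) { return "square"; }
inline double square_area(const Shape* self) {
    const Square* sq = reinterpret_cast<const Square*>(self);
    return sq->side * sq->side;
}
inline std::string rect_name(const Shape*) { return "rect"; }
inline double rect_area(const Shape* self) {
    const Rect* r = reinterpret_cast<const Rect*>(self);
    return r->w * r->h;
}

// One shared, read-only table per "class".
const VTable square_vtable = {square_name, square_area};
const VTable rect_vtable   = {rect_name, rect_area};

// "Constructors": install the vptr, as a real constructor does.
inline Square make_square(double side) { return Square{Shape{&square_vtable}, side}; }
inline Rect make_rect(double w, double h) { return Rect{Shape{&rect_vtable}, w, h}; }

// "Dynamic binding": the caller only knows Shape*; the table picks the function.
inline double area_of(const Shape* s) { return s->vptr->area(s); }
inline std::string name_of(const Shape* s) { return s->vptr->name(s); }
```

With `Square sq = make_square(3)` and `Rect r = make_rect(2, 5)`, `area_of(&sq.base)` returns 9 and `area_of(&r.base)` returns 10, even though `area_of` only ever sees a `Shape*`; all squares share the same `vptr` value.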
1.2. Example
The concepts are fairly abstract, so here is a small demo to make them easier to understand~
- Source code.
```cpp
/* g++ -std=c++11 test.cpp -o t && ./t */
#include <iostream>
#include <memory>

class Model {
public:
    // The derived objects are deleted through std::unique_ptr<Model>,
    // so the base class needs a virtual destructor to avoid undefined behavior.
    virtual ~Model() {}
    virtual void face() { std::cout << "model's face!" << std::endl; }
};

class Girl : public Model {
public:
    virtual void face() { std::cout << "girl's face!" << std::endl; }
};

class Man : public Model {
public:
    virtual void face() { std::cout << "man's face!" << std::endl; }
};

class Boy : public Model {
public:
    virtual void face() { std::cout << "boy's face!" << std::endl; }
};

// The caller only sees a Model; the vptr of *m decides which face() runs.
void take_photo(const std::unique_ptr<Model>& m) {
    m->face();
}

int main() {
    auto model = std::unique_ptr<Model>(new Model);
    auto girl = std::unique_ptr<Model>(new Girl);
    auto man = std::unique_ptr<Model>(new Man);
    auto boy = std::unique_ptr<Model>(new Boy);

    take_photo(model);
    take_photo(girl);
    take_photo(man);
    take_photo(boy);
    return 0;
}
```
- operation result.
```
model's face!
girl's face!
man's face!
boy's face!
```
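Dynamic binding only happens when a virtual function is called through a pointer or reference. The sketch below is a trimmed variant of the demo in which face() returns a string instead of printing (a change made only so the result can be checked). It contrasts three calls on the same Girl object: through a reference, through a qualified Model::face() call, and through a by-value copy that slices the object. The helper names via_reference, via_qualified_call and via_value are hypothetical.

```cpp
#include <string>

class Model {
public:
    virtual ~Model() = default;
    virtual std::string face() const { return "model's face!"; }
};

class Girl : public Model {
public:
    std::string face() const override { return "girl's face!"; }
};

// Through a reference: dynamic binding, the object's vptr selects Girl::face.
inline std::string via_reference(const Model& m) { return m.face(); }

// Qualified call: static binding, always Model::face, no vtable lookup.
inline std::string via_qualified_call(const Model& m) { return m.Model::face(); }

// By value: the Girl part is sliced off; the copy is a plain Model object.
inline std::string via_value(Model m) { return m.face(); }
```

For a `Girl g`, only `via_reference(g)` yields "girl's face!"; the other two yield "model's face!" because the function is fixed at compile time (qualified call) or the copy really is a Model (slicing).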
2. Working environment
2.1. System
```
# cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)
# cat /proc/version
Linux version 3.10.0-1127.19.1.el7.x86_64 (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) )
```
2.2. Tools
This article uses the following tools to analyze how polymorphism works:
Tool | Description |
---|---|
gdb | The GNU debugger, a tool for debugging programs. |
-fdump-class-hierarchy | A GCC compiler option that writes each class's hierarchy and virtual table layout to a text file during compilation, so developers can view and analyze the relationships between classes. It is useful when debugging and studying the inheritance relationships in your code. GCC 8 and later replaced it with -fdump-lang-class. |
c++filt | A tool for demangling C++ symbols: it converts the symbols generated by the C++ compiler back into readable function, class, and variable names. |
objdump | A tool for analyzing object files. It displays the contents of each section (code, data, symbol table, etc.) and can disassemble the machine code to give a deeper view of how the program executes. |
Part of the text source: ChatGPT
3. Important data structures behind polymorphism
While debugging the internals of dynamic_cast, I came across some data structures, such as __class_type_info and vtable_prefix, that helped me better understand how polymorphism works.
For debugging methods, please refer to: “(ubuntu) vscode + gdb debugging c++”
3.1. Type information
type_info is the data structure that describes a type; it is what typeid returns, so a program can obtain an object's type information at runtime. Depending on the shape of the class hierarchy, the runtime uses different type information classes derived from type_info:
- Basic type information structure.
```cpp
/* /usr/include/c++/4.8.2/typeinfo */
// The type_info class describes type information generated by an implementation.
class type_info {
protected:
    const char* __name;
};

/* /usr/include/c++/4.8.2/cxxabi.h */
// Type information for a class.
class __class_type_info : public std::type_info {
public:
    ...
};
```
- Single inheritance type information structure.
```cpp
/* /usr/include/c++/4.8.2/cxxabi.h */
// Type information for a class with a single non-virtual base.
class __si_class_type_info : public __class_type_info {
public:
    const __class_type_info *__base_type;
    ...
};
```
- Multiple inheritance or virtual inheritance type information structure.
```cpp
/* /usr/include/c++/4.8.2/cxxabi.h */
// Helper class for __vmi_class_type.
class __base_class_type_info {
public:
    const __class_type_info *__base_type;  // Base class type.
#ifdef _GLIBCXX_LLP64
    long long __offset_flags;  // Offset and info.
#else
    long __offset_flags;  // Offset and info.
#endif
    ...
};

// Type information for a class with multiple and/or virtual bases.
class __vmi_class_type_info : public __class_type_info {
public:
    unsigned int __flags;       // Details about the class hierarchy.
    unsigned int __base_count;  // Number of direct bases.

    // The array of bases uses the trailing array struct hack so this
    // class is not constructable with a normal constructor. It is
    // internally generated by the compiler.
    __base_class_type_info __base_info[1];  // Array of bases.
    ...
};
```
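To see these structures at work, the sketch below casts the std::type_info returned by typeid to the libstdc++ classes shown above and reads their fields. It assumes GCC's libstdc++ and its <cxxabi.h>; other standard libraries may not expose these classes publicly. The classes Base, Single, Other, Multi and the helper functions are hypothetical.

```cpp
#include <cxxabi.h>   // libstdc++: abi::__si_class_type_info, abi::__vmi_class_type_info
#include <cstddef>
#include <typeinfo>

struct Base  { virtual ~Base() = default; };
struct Single : Base {};                  // one public, non-virtual base at offset 0
struct Other { virtual ~Other() = default; };
struct Multi : Base, Other {};            // two direct bases

// Direct base of a single-inheritance class, or nullptr if `ti` is not
// described by __si_class_type_info.
inline const std::type_info* single_base_of(const std::type_info& ti) {
    const auto* si = dynamic_cast<const abi::__si_class_type_info*>(&ti);
    return si ? si->__base_type : nullptr;
}

// Number of direct bases recorded in __vmi_class_type_info, or -1.
inline int direct_base_count(const std::type_info& ti) {
    const auto* vmi = dynamic_cast<const abi::__vmi_class_type_info*>(&ti);
    return vmi ? static_cast<int>(vmi->__base_count) : -1;
}

// Offset of the i-th direct (non-virtual) base inside the derived object.
// Precondition: `ti` is a __vmi_class_type_info and i < __base_count.
inline std::ptrdiff_t base_offset(const std::type_info& ti, unsigned i) {
    const auto* vmi = dynamic_cast<const abi::__vmi_class_type_info*>(&ti);
    return vmi->__base_info[i].__offset();
}
```

For Single the runtime uses __si_class_type_info, whose __base_type is the type_info of Base; for Multi it uses __vmi_class_type_info with two entries in __base_info, listed in declaration order, and the second entry's offset equals the distance from the start of a Multi object to its Other subobject.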
3.2. Virtual table description structure
vtable_prefix: the structure libsupc++ uses to describe the prefix of a virtual function table. An object may contain several vptrs, each pointing into a different virtual table with its own prefix; every vptr of the object points at the origin field of the corresponding vtable_prefix.
- whole_object: I think top_offset would be a more fitting name. It is the offset from the position of this vptr inside the object to the top of the most derived object. Because an object may have several virtual tables, adding this offset to the address of the subobject that holds the vptr leads back to the start of the complete object. It is 0 for the primary vptr and negative for the others.
- whole_type: the type information of the most derived class.
- origin: the location the vptr points to; the virtual function pointers start here.
```cpp
/* /usr/src/debug/gcc-4.8.5-20150702/libstdc++-v3/libsupc++/tinfo.h */
// Initial part of a vtable, this structure is used with offsetof, so we don't
// have to keep alignments consistent manually.
struct vtable_prefix {
  // Offset to most derived object.
  ptrdiff_t whole_object;

  // Additional padding if necessary.
#ifdef _GLIBCXX_VTABLE_PADDING
  ptrdiff_t padding1;
#endif

  // Pointer to most derived type_info.
  const __class_type_info *whole_type;

  // Additional padding if necessary.
#ifdef _GLIBCXX_VTABLE_PADDING
  ptrdiff_t padding2;
#endif

  // What a class's vptr points to.
  const void *origin;
};
```
The memory layout of multiply inherited polymorphic objects, covered in the next part of this series (② – Inheritance), makes the virtual table description structure easier to understand.
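The prefix can also be read directly from a live object. The sketch below assumes the Itanium C++ ABI layout used by GCC on x86-64 without _GLIBCXX_VTABLE_PADDING: the vptr points at origin, whole_type sits one word before it, and whole_object (offset-to-top) two words before it. Reading memory this way is outside standard C++ and only meant for exploration; Prefix, read_prefix, Left, Right and Both are hypothetical names.

```cpp
#include <cstddef>
#include <cstring>
#include <typeinfo>

// Mirrors vtable_prefix minus origin (assumes no padding fields).
struct Prefix {
    std::ptrdiff_t offset_to_top;       // vtable_prefix::whole_object
    const std::type_info* whole_type;   // vtable_prefix::whole_type
};

// Reads the prefix of the vtable that the vptr stored at `subobject` points to.
inline Prefix read_prefix(const void* subobject) {
    const void* vptr;
    std::memcpy(&vptr, subobject, sizeof vptr);   // the vptr is the first word
    const void* const* slots = static_cast<const void* const*>(vptr);
    Prefix p;
    std::memcpy(&p.offset_to_top, slots - 2, sizeof p.offset_to_top);
    p.whole_type = static_cast<const std::type_info*>(slots[-1]);
    return p;
}

struct Left  { virtual ~Left() = default;  long l = 1; };
struct Right { virtual ~Right() = default; long r = 2; };
struct Both : Left, Right {};   // two vptrs: one at offset 0, one in Right
```

For a Both object, the vptr at offset 0 has an offset-to-top of 0, while the vptr inside the Right subobject has a negative one (-16 on x86-64, because Left is a vptr plus a long); both prefixes point to the type_info of Both, which is what dynamic_cast<void*> relies on to find the complete object.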
4. Virtual function call chain
C++ polymorphism is a fairly complex feature. Going from easy to hard, let's first look at how a virtual function is called on a polymorphic class object that has no inheritance relationship.
- Call chain.
this -> vptr -> vtbl -> virtual function
- Test source code.
```cpp
// g++ -g -O0 -std=c++11 -fdump-class-hierarchy test_virtual.cpp -o t
#include <iostream>

class A {
public:
    int m_a = 0;
    virtual void vfuncA1() {}
    virtual void vfuncA2() {}
};

int main(int argc, char** argv) {
    A* a = new A;
    a->vfuncA2();
    return 0;
}
```
- Disassembly. Observe how the virtual function is called through the assembly code:
```
int main(int argc, char** argv) {
  ;...
  A* a = new A;
  ;...
  40071d: e8 8e 00 00 00    callq  4007b0 <_ZN1AC1Ev>
  ; Store the new object's address (a, the this pointer) in the local variable at -0x18(%rbp).
  400722: 48 89 5d e8       mov    %rbx,-0x18(%rbp)
  a->vfuncA2();
  ; Load a from the stack.
  400726: 48 8b 45 e8       mov    -0x18(%rbp),%rax
  ; Dereference a: the first 8 bytes of the object hold the vptr, which points at the first virtual function slot of A's vtable.
  40072a: 48 8b 00          mov    (%rax),%rax
  ; Advance by one 8-byte slot to the entry for vfuncA2 (slot 0 holds vfuncA1).
  40072d: 48 83 c0 08       add    $0x8,%rax
  ; Load the address of A::vfuncA2 stored in that slot.
  400731: 48 8b 00          mov    (%rax),%rax
  ; Pass a as the implicit this argument in the rdi register.
  400734: 48 8b 55 e8       mov    -0x18(%rbp),%rdx
  400738: 48 89 d7          mov    %rdx,%rdi
  ; Indirect call to the virtual function.
  40073b: ff d0             callq  *%rax
  return 0;
  ;...
}
```
- Find the virtual pointer: load a from the stack. The first 8 bytes of object a store the virtual pointer (vptr), which points into the virtual table.
- Follow the virtual pointer to the position in the virtual table where the virtual function addresses start (vtable_prefix.origin).
- Add the slot offset (8 bytes, because vfuncA2 is the second virtual function) and load the address stored in that slot, which is the corresponding virtual function.
- Write the a (this) pointer into the rdi register, which carries the first argument, so it is passed to the virtual function.
- The call instruction invokes the virtual function, i.e. A::vfuncA2(this).
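The same chain can be walked by hand in C++. This sketch assumes the Itanium C++ ABI on x86-64, as in the disassembly above: the vptr is the first word of the object, vfuncA1 and vfuncA2 occupy slots 0 and 1, and a member function can be called as a free function that takes this as its first argument. Calling through such a pointer is undefined behavior in standard C++ and is shown only to mirror the assembly. Unlike the test source, the virtual functions return an int so the result can be checked; call_through_vtable is a hypothetical helper.

```cpp
#include <cstring>

class A {
public:
    int m_a = 0;
    virtual int vfuncA1() { return 1; }         // vtable slot 0
    virtual int vfuncA2() { return m_a + 2; }   // vtable slot 1
};

// ABI assumption: a member function behaves like a free function whose
// first argument is `this` (passed in %rdi on x86-64).
using Slot = int (*)(A*);
static_assert(sizeof(Slot) == sizeof(const void*), "function and data pointers differ in size");

// this -> vptr -> vtbl[index] -> call, one step per part of the disassembly.
inline int call_through_vtable(A* self, int index) {
    const void* vptr;
    std::memcpy(&vptr, self, sizeof vptr);      // mov (%rax),%rax: load the vptr
    const void* const* vtbl = static_cast<const void* const*>(vptr);
    const void* entry = vtbl[index];            // add $0x8,%rax ; mov (%rax),%rax
    Slot fn;
    std::memcpy(&fn, &entry, sizeof fn);        // treat the slot as a function pointer
    return fn(self);                            // mov %rdx,%rdi ; callq *%rax
}
```

With `A a; a.m_a = 40;`, `call_through_vtable(&a, 1)` reaches A::vfuncA2 and returns 42, and index 0 reaches A::vfuncA1.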
5. Memory layout
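Which memory regions hold the virtual function table and the virtual functions? We can find out with the objdump tool.
Analyzing the test demo above shows that the virtual function table is in the read-only constant area (.rodata) and the virtual functions are in the program code area (.text). This matches the non-PIE build used here; toolchains that build position-independent executables by default usually place vtables in .data.rel.ro instead, which is made read-only after dynamic relocation when RELRO is enabled.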
References: Memory distribution of program variables (Linux); In-Depth Exploration of C++ Polymorphism ② – Inheritance; In-Depth Exploration of C++ Polymorphism ③ – Virtual Destructor
- Memory regions.
Area | Description | Contents |
---|---|---|
stack | Stack area | Local (temporary) variables |
heap | Heap area | Memory allocated with malloc/new |
.data, .bss | Global data area | Global variables/static variables |
.rodata | Read-only constant area | Read-only data, string literals, constants, etc. |
.text | Program code area | Program code (machine instructions) |
- Use the objdump tool.
```
# Compile the test code.
g++ -std=c++11 test.cpp -o test

# Use objdump to export the executable file information.
objdump -CdStT test > asm.log

# Get the program code area information.
cat asm.log | grep '\.text'
0000000000400610 l d .text 0000000000000000 .text
0000000000400610 g F .text 0000000000000000 _start
00000000004007b0 w F .text 0000000000000020 A::A()
# Virtual function.
000000000040079c w F .text 000000000000000a A::vfuncA1()
00000000004007a6 w F .text 000000000000000a A::vfuncA2()
00000000004007d0 g F .text 0000000000000065 __libc_csu_init
00000000004007b0 w F .text 0000000000000020 A::A()
00000000004006fd g F .text 000000000000004c main

# Get the read-only constant area information.
cat asm.log | grep '\.rodata'
0000000000400860 l d .rodata 0000000000000000 .rodata
# Virtual table.
0000000000400880 w O .rodata 0000000000000020 vtable for A
00000000004008a0 w O .rodata 0000000000000003 typeinfo name for A
00000000004008b0 w O .rodata 0000000000000010 typeinfo for A
```
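objdump shows where the symbols sit in the file. As a Linux-only runtime cross-check, the sketch below looks up the address of A's vtable and of A::vfuncA2 in /proc/self/maps and reports the permissions of the mappings that contain them. It assumes the Itanium C++ ABI vtable layout (vptr as the first word of the object, vfuncA2 in slot 1); mapping_perms, vtable_of and vfuncA2_code are hypothetical helpers. Note that /proc/self/maps only shows segment permissions, not section names.

```cpp
#include <cstdint>
#include <cstring>
#include <fstream>
#include <sstream>
#include <string>

// Returns the permission field (e.g. "r-xp") of the /proc/self/maps entry
// whose address range contains addr, or "" if no entry matches.
inline std::string mapping_perms(const void* addr) {
    const std::uintptr_t a = reinterpret_cast<std::uintptr_t>(addr);
    std::ifstream maps("/proc/self/maps");
    std::string line;
    while (std::getline(maps, line)) {
        std::istringstream in(line);
        std::uintptr_t lo = 0, hi = 0;
        char dash = 0;
        std::string perms;
        in >> std::hex >> lo >> dash >> hi >> perms;   // "start-end perms ..."
        if (in && a >= lo && a < hi) return perms;
    }
    return "";
}

class A {
public:
    virtual void vfuncA1() {}
    virtual void vfuncA2() {}
};

// Address that the vptr of obj points to (inside "vtable for A").
inline const void* vtable_of(const A& obj) {
    const void* vptr;
    std::memcpy(&vptr, &obj, sizeof vptr);
    return vptr;
}

// Address stored in vtable slot 1, i.e. the code of A::vfuncA2.
inline const void* vfuncA2_code(const A& obj) {
    return static_cast<const void* const*>(vtable_of(obj))[1];
}
```

The mapping containing A::vfuncA2 should carry the x (execute) permission. The mapping containing the vtable should be readable and, on a typical RELRO-enabled build, not writable; whether it is the same mapping as the code depends on how the linker groups sections into segments.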
6. References
- “In-Depth Exploration of the C++ Object Model”
- “C++ Virtual and Polymorphism”
- Polymorphism and its basic principles
- Analysis of the implementation principles of C++ polymorphism
- Memory layout revisited
- C++: Let’s talk about RTTI from a technical implementation perspective
- C++ object memory layout
- Memory layout of C++ objects (Part 1)
- Memory layout of C++ objects (Part 2)
- How to write assembly language in vscode and debug in terminal (step-by-step guide)
- C++ Virtual Table Tables (VTT)
- godbolt.org
- What is the VTT for a class?