Revealing the secrets of the Java class loading mechanism and parent delegation: knowing the why, dancing in the world of code

Article directory

- Loading
- Verification
- Loading
- Continue verification
  - Metadata validation
  - Bytecode verification
  - Symbol reference verification
- Preparatory phase
- Analysis
  - Symbol reference and direct reference
  - Static linking and dynamic linking
- Initialization
- Use and uninstall

I am Liao Zhiwei, a Java development engineer, high-quality creator in the Java field, CSDN blog expert, 51CTO expert blogger, Alibaba Cloud expert blogger, Tsinghua University Press contracted author, product soft article creator, technical article reviewer, questionnaire designer, personal community founder, and open source project contributor. I have run fifteen kilometers, climbed Hengshan Mountain on foot, and lost 20 pounds in three months. I am a ruthless person who likes to lie flat.

I have many years of front-line R&D and team management experience, and have studied the underlying source code of mainstream frameworks (Spring, Spring Boot, Spring MVC, Spring Cloud, MyBatis, Dubbo, ZooKeeper), the underlying architecture principles of message middleware (RabbitMQ, RocketMQ, Kafka), Redis caching, the MySQL relational database, Elasticsearch full-text search, the MongoDB non-relational database, Apache ShardingSphere database and table sharding with read-write separation, design patterns, domain-driven design (DDD), Kubernetes container orchestration, and more. I have taken high-concurrency projects from 0 to 1, using elastic scaling, load balancing, alarm tasks, and self-starting scripts; the largest stress test ran on 200 machines. I also have rich experience in project tuning.

After years of writing thousands of articles on CSDN, I have built solid writing skills. I have also signed a contract with Tsinghua University Press for four books, to be published next year. These include the basic, advanced, and architecture volumes of “Java Project Practice – In-depth understanding of common technologies of large Internet companies”, and “Decrypting the Programmer’s Thinking Code – Practice of Communication, Speech and Thinking”. The specific publishing plan will be adjusted according to the actual situation. I hope all readers can support me!

I hope that all readers will support bloggers who write articles with care. Times have changed now, information is exploding, and the alley is dark. Bloggers really need everyone’s help to continue to shine in this ocean, so hurry up. Move your little hands, click to follow, click to like, click to favorite, and even click to comment, these are the best support and encouragement for bloggers!

Blog homepage: I am Liao Zhiwei
Open source project: java_wxid
Bilibili: I am Liao Zhiwei
Personal Community: The boss behind the scenes
Personal WeChat ID: SeniorRD

Without further ado, let us move on to the main text.

Loading

After a Java source file is compiled, it becomes a class bytecode file stored on disk. To use it, the JVM reads the bytecode file through an IO stream. This step is loading.

When we write programs in Java, we use a variety of classes to implement functions. The class files cannot be used directly; a class loader must first load them into the JVM.

Class loaders load class files using the “parent delegation mechanism” (sometimes translated as “parental delegation”). When a custom class loader is asked for a class, it first checks whether it has already loaded that class. If it has, the class is not loaded again. If not, it obtains its parent loader, the application class loader, and calls the parent's loadClass method. The application class loader performs the same check and delegates to the extension class loader (called the platform class loader since JDK 9), which in turn delegates to the bootstrap (startup) class loader. The bootstrap class loader has no parent, so it is the first to actually try to load the class. If it cannot, the request sinks back down to the child loader, which tries next, and so on all the way to the bottom. If no loader can load the class, a ClassNotFoundException is thrown, telling us that the class cannot be found.
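The delegation order described above can be sketched with a toy model. This is only an illustration under stated assumptions: ToyLoader, its fields, and the class name com.example.Student are hypothetical and are not part of the JDK, and the real java.lang.ClassLoader throws ClassNotFoundException where this sketch returns null.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model of parent delegation; it is not java.lang.ClassLoader.
class ToyLoader {
    private final String name;
    private final ToyLoader parent;                        // null for the topmost loader
    private final Set<String> findable;                    // names this loader can locate itself
    private final Set<String> loaded = new HashSet<>();    // names it has already loaded

    ToyLoader(String name, ToyLoader parent, Set<String> findable) {
        this.name = name;
        this.parent = parent;
        this.findable = findable;
    }

    // Returns the name of the loader that owns the class, or null if nobody can load it
    // (the real API would throw ClassNotFoundException instead).
    String load(String className, List<String> trace) {
        trace.add(name);                                   // record every loader that is asked
        if (loaded.contains(className)) {
            return name;                                   // 1. already loaded here
        }
        if (parent != null) {
            String owner = parent.load(className, trace);  // 2. delegate upward first
            if (owner != null) {
                return owner;
            }
        }
        if (findable.contains(className)) {                // 3. parents failed: try it ourselves
            loaded.add(className);
            return name;
        }
        return null;
    }

    public static void main(String[] args) {
        ToyLoader boot = new ToyLoader("bootstrap", null, Collections.singleton("java.lang.Object"));
        ToyLoader ext = new ToyLoader("extension", boot, Collections.<String>emptySet());
        ToyLoader app = new ToyLoader("application", ext, Collections.singleton("com.example.Student"));

        List<String> trace = new ArrayList<>();
        System.out.println(app.load("java.lang.Object", trace) + " via " + trace);
        trace.clear();
        System.out.println(app.load("com.example.Student", trace) + " via " + trace);
        System.out.println(app.load("com.example.Missing", new ArrayList<>()));
    }
}
```

In this sketch every request climbs to the top loader before any loader tries to find the class itself, which is why the bootstrap-like loader ends up owning java.lang.Object even though the request started at the bottom.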

The advantage of this mechanism is that it prevents conflicts between classes with the same fully qualified name. For example, suppose you write a java.lang.Object class yourself, with the same package and class name as the Object class in the JDK. Without parent delegation, it would be unclear which class to use. With parent delegation, the request is always passed up to the parent class loaders first, so the JDK's own version is found and the core class cannot be replaced.

For example, when an application needs the java.lang.Object class, the application class loader does not load it itself but delegates the request upward. Since java.lang.Object is loaded by the bootstrap class loader, the application class loader simply uses that version. If your application needs its own class that shares a name with a library class, you can load it through a custom loader that overrides the delegation order. Note that this does not work for names in the java.* packages: ClassLoader.defineClass rejects them with a SecurityException, so a custom java.lang.Object cannot be loaded this way.
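A small check on a real JVM can make this concrete. The sketch below assumes a HotSpot-style JVM, where the bootstrap class loader is reported as null by getClassLoader(); the class name ObjectLoaderCheck is made up for the example.

```java
class ObjectLoaderCheck {
    // True when the application loader hands back the Object class that the bootstrap loader defined.
    static boolean appLoaderReusesBootstrapObject() {
        try {
            ClassLoader app = ClassLoader.getSystemClassLoader();
            Class<?> viaApp = Class.forName("java.lang.Object", false, app);
            // Assumption: the JVM reports the bootstrap loader as null (HotSpot does).
            return viaApp == Object.class && viaApp.getClassLoader() == null;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(appLoaderReusesBootstrapObject());
        // Walk upward from the loader of this demo class until the bootstrap loader (null) is reached.
        for (ClassLoader cl = ObjectLoaderCheck.class.getClassLoader(); cl != null; cl = cl.getParent()) {
            System.out.println(cl);
        }
    }
}
```

The loop prints the loaders between this class and the bootstrap loader; the exact names depend on the JDK version.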

Verification

The JVM does not run the file directly after reading it. It also needs to verify whether the loaded bytecode file complies with the JVM specification.

Verification starts with file format verification. This step matters because it confirms that the file meets the class file format and can be parsed and executed. It checks, among other things, the magic number and the major and minor version numbers.

The magic number is a special identifier at the start of a class file, 0xCAFEBABE, which acts like a file signature. Only files with the correct magic number are recognized and loaded by the JVM (Java Virtual Machine). This is similar to checking a file extension before opening a file to see whether it is a format we know.

The major and minor version numbers give the version of the class file format. If an older JVM is used to parse a class file of a newer version, the JVM rejects it with an UnsupportedClassVersionError. Therefore, verification checks whether the major and minor version numbers are within the range the JVM supports, to ensure that the file can be parsed and executed correctly.
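As a rough illustration of these two checks, the following sketch reads the first eight bytes of a class file: a 4-byte magic number, then a 2-byte minor version and a 2-byte major version. ClassHeaderCheck and the hand-built header are hypothetical; a real JVM performs many more checks than this.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

class ClassHeaderCheck {
    static final int MAGIC = 0xCAFEBABE;

    // Returns the major version, or -1 if the magic number is wrong or the data is too short.
    static int majorVersion(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (in.readInt() != MAGIC) {
                return -1;                     // not a class file
            }
            in.readUnsignedShort();            // minor version, ignored in this sketch
            return in.readUnsignedShort();     // major version
        } catch (IOException e) {
            return -1;                         // fewer than 8 bytes available
        }
    }

    public static void main(String[] args) {
        // Hand-built header: magic, minor 0, major 52 (the class file version of Java 8).
        byte[] java8Header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        System.out.println(majorVersion(java8Header));                          // 52
        System.out.println(majorVersion(new byte[] {1, 2, 3, 4, 0, 0, 0, 52})); // -1
    }
}
```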

Simply put, if the magic number and major and minor version numbers of a class file meet the requirements, then it can be correctly recognized and loaded by the JVM, and the verification passes. Otherwise, it cannot be loaded and executed.

For example, we can compare a class file to a schoolbag. Before school starts every day, we will check whether the items in the schoolbag are complete, including pens, paper, books, water bottles, etc. If we find that a necessary item is missing from our schoolbag, such as a pen, then we will not be able to write normally during class, which will affect our learning. Similarly, if a class file lacks the necessary magic number and major and minor version numbers, it will not be correctly recognized and executed by the JVM, and will not work properly.
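To see the JVM itself refuse such a file, one option is to hand arbitrary bytes to ClassLoader.defineClass. The sketch below is a minimal demonstration, not a recommended practice; BadMagicDemo and RawLoader are hypothetical names, and the byte array is deliberate garbage without a valid magic number.

```java
class BadMagicDemo {
    // Minimal loader that exposes the protected defineClass method.
    static class RawLoader extends ClassLoader {
        Class<?> define(byte[] bytes) {
            // A null name lets the JVM take the class name from the bytes themselves.
            return defineClass(null, bytes, 0, bytes.length);
        }
    }

    // Reports whether the JVM accepted the bytes as a class.
    static String tryDefine(byte[] bytes) {
        try {
            new RawLoader().define(bytes);
            return "defined";
        } catch (ClassFormatError e) {
            return "ClassFormatError";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryDefine(new byte[] {0, 1, 2, 3, 4, 5, 6, 7}));
    }
}
```

The bytes fail the file format check, so no class is created and the loader reports a ClassFormatError.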

Loading

Once the file format has been verified, the JVM converts the static binary content of the class file into the runtime data structures of the method area. The static storage format is restructured during this conversion, which brings us back to the loading stage.

After the class data is in the method area, a Class object for the class is generated in Java heap memory, which serves as the entry point for all access to the class data in the method area. For example, the Object class is the parent class of all classes and is accessed by all of them, so it also needs such an entry point. When the Object class is loaded, a java.lang.Class object representing it is stored in Java heap memory so that other classes can access it.

For example, if we have a class named Student, this class defines some basic information about the student, such as name and age. When we call this class in the program, the Java virtual machine creates a Class object of the Student class in the heap memory as the access entrance to the class, making it convenient for other classes to access and call the class when the program is running.
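The idea of one Class object acting as the shared entry point can be checked with a short sketch. The Student class here is a hypothetical stand-in for the one described above.

```java
class ClassObjectDemo {
    // Hypothetical Student class with the fields mentioned in the text.
    static class Student {
        String name;
        int age;
    }

    // True when every instance leads back to the same Class object.
    static boolean sharesOneClassObject() {
        Student first = new Student();
        Student second = new Student();
        Class<?> entry = Student.class;
        return first.getClass() == entry && second.getClass() == entry;
    }

    public static void main(String[] args) {
        System.out.println(sharesOneClassObject());
        System.out.println(Student.class.getName());
    }
}
```

However many Student instances exist, they all share the single Class object that was created when the class was loaded.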

Continue verification

Metadata validation

Metadata validation performs semantic analysis on the information described by the bytecode to check that it conforms to the Java language specification.

For example, if a non-abstract class does not implement all the abstract methods required by its parent class or interfaces, metadata validation will report the error. As another example, if a class inherits from a class modified by final, metadata validation will detect the violation.

Metadata validation also checks the inheritance relationships of a class, for example, whether the class has a legal parent class (every class except java.lang.Object must have one) and whether it implements the interfaces it declares. In addition, it verifies that fields and methods do not conflict with those of the parent class, for example by overriding a final method.

Bytecode verification

Through data flow and control flow analysis, it is determined that the program semantics are legal and logical, for example: whether the data type of the operand stack and the instruction code sequence can work together, whether the type conversion in the method is valid, etc.

To better understand bytecode verification, we can divide it into two parts: data flow analysis and control flow analysis.

Data flow analysis focuses on how values are used by the instructions in a method body. Every use must respect the type of the value: for example, an instruction that expects an int on the operand stack must not find an object reference there, and a local variable that holds an int must not be read as a reference. Array elements and objects follow the same rule, and objects must be properly created before they are used.

Control flow analysis focuses on the execution paths through the code. It checks that branches, loops, and exception handlers are well formed, for example that a jump instruction never targets a location outside the method body or the middle of another instruction. It does not detect run-time problems such as infinite loops, null pointer exceptions, or array out-of-bounds accesses; those can only surface while the program runs.

Bytecode verification also checks that type conversions in a method are valid, for example that a value assigned to a variable is compatible with the declared type. These checks protect the safety of the JVM. A class that passes bytecode verification can be run on the Java virtual machine, although passing does not prove that the program is free of bugs.

For example, a method body that stores an int value of 100 into a local variable and then uses that variable as a String reference is illegal, and bytecode verification rejects the class with a VerifyError before the code runs. Source code with this mistake is normally rejected by the compiler first, so the verifier mainly guards against class files that were generated incorrectly or tampered with. By contrast, converting a string that does not contain a legal integer, for example with Integer.parseInt, is not a verification error: the bytecode is legal, and the failure appears at run time as a NumberFormatException.

Symbol reference verification

Symbol reference verification ensures that the resolution action can be executed correctly. The JVM looks up the classes, fields, and methods named by the symbolic references and confirms that they exist and that their access permissions allow the current class to use them. This process is like looking up the coordinates of a location on a map and determining whether the location is reachable and whether there is a feasible path.

For example, if the symbolic reference names a class, the JVM confirms that the class exists and that it can be accessed by the current class. If the symbolic reference names a method, the JVM confirms that a method with that name and descriptor exists in the given class and that it is accessible. If a check fails, the JVM throws an error such as NoSuchFieldError, NoSuchMethodError, or IllegalAccessError.

Once symbol reference verification succeeds, the symbolic reference can be resolved into a direct reference. This is like travelling to a destination based on its coordinates on a map: you need to work out the actual route. To resolve a symbolic reference into a direct reference, the JVM finds the specific location of the class or method, loading the class into memory first if it has not been loaded yet.

Preparatory phase

In the process of writing code, we often need to define some static variables to save some public data or status. These static variables belong to classes, not objects, so when the class is loaded into memory, memory space needs to be allocated for these variables and default values assigned to them. This process is called allocating memory space, also called the preparation phase.

Suppose we have a class called Student, which defines a static variable int age and a static variable String name. During the preparation phase, the JVM allocates memory space for these two variables and assigns them default values. For variables of type int, the default value is 0; for variables of reference type, the default value is null. Therefore, after the preparation phase, the value of age is 0 and the value of name is null.
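The default values can be observed directly. In the sketch below, the nested Student class is a hypothetical version of the one described above; because its static fields have no initializers, the values read at run time are still the defaults assigned during preparation.

```java
class PreparationDefaults {
    static class Student {
        static int age;       // no initializer: keeps the default 0
        static String name;   // no initializer: keeps the default null
    }

    public static void main(String[] args) {
        System.out.println(Student.age);   // 0
        System.out.println(Student.name);  // null
    }
}
```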

In addition, if a static variable is declared final and initialized with a compile-time constant, it does not receive a default value during the preparation phase. Instead it is assigned its declared value directly, because a constant must be initialized where it is defined and cannot be modified afterwards. For example, we can define a constant PI with a value of 3.14 as follows:

public static final double PI = 3.14;

During the preparation phase, the JVM will allocate memory space for PI and set its value to 3.14, so there is no need to initialize it with the default value.
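One visible consequence of this special treatment is that reading such a constant does not even trigger initialization of the class that declares it, while reading an ordinary static variable does. The sketch below illustrates this; ConstantDemo, Constants, counter, and LOG are hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.List;

class ConstantDemo {
    // Records when the nested class is initialized.
    static final List<String> LOG = new ArrayList<>();

    static class Constants {
        static final double PI = 3.14;   // compile-time constant
        static int counter = 1;          // ordinary static variable
        static {
            LOG.add("Constants initialized");
        }
    }

    static double readPi() {
        return Constants.PI;             // the constant value is used directly
    }

    static int readCounter() {
        return Constants.counter;        // first active use: triggers initialization
    }

    public static void main(String[] args) {
        System.out.println(readPi() + " " + LOG);       // 3.14 []
        System.out.println(readCounter() + " " + LOG);  // 1 [Constants initialized]
    }
}
```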

Finally, let’s take another example. Suppose we have a class called Config, which defines a static variable Map config. During the preparation phase, the JVM allocates memory space for the config variable and sets its value to null. If we want to assign a value to the config variable while the program is running, we can initialize it in a static code block. The code is as follows:

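import java.util.HashMap;
import java.util.Map;

public class Config {
    // Set to null during the preparation phase
    public static Map<String, String> config;

    // Runs once, during class initialization
    static {
        config = new HashMap<>();
        config.put("name", "Zhang San");
        config.put("age", "18");
    }
}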

In the static code block, we create a HashMap instance and assign it to the config variable, whose memory was already allocated during the preparation phase. After the class has been initialized, we can use the config variable to hold configuration information while the program is running.

During the preparation phase, the JVM allocates memory space for the static variables of the class and sets default values. If it is a constant, there is no need to set a default value, but assign it directly. We can use static code blocks to initialize static variables and use these variables to save some public data or status while the program is running.

Analysis

In the JVM, resolution (the stage this article calls analysis, sometimes also translated as parsing) is the process of converting the symbolic references in the constant pool into direct references. Through resolution, the program can locate exactly the classes, methods, and fields it needs, so that it can execute correctly.

Symbol reference and direct reference

A symbolic reference identifies a target such as a class, field, or method by name rather than by memory address. When the compiler compiles a class, it cannot know where the target will end up in memory, because the referenced classes may be loaded at different locations, or not loaded at all yet. Therefore, the compiler records symbolic references in the constant pool of the class file so that they can be resolved to direct references during linking.

A direct reference locates the target in memory. It can be a pointer, an offset, or a handle, and it depends on the memory layout of the particular JVM. Once a symbolic reference has been resolved, the running program accesses the method or data through the direct reference.

For example, suppose a class uses a static variable count. At compile time, the compiler cannot determine where count will be located in memory, so it records a symbolic reference to it in the constant pool. At run time, the JVM resolves the symbolic reference to count into a direct reference so that the program can correctly access the value of the variable.
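Resolution happens inside the JVM and cannot be observed directly from Java code, but the java.lang.invoke API offers a loose analogy: a method is first described symbolically (owner class, name, and type) and then looked up to obtain a handle that points at the actual target. The sketch below is only that analogy, not the JVM's internal resolution procedure; ResolutionAnalogy is a hypothetical name.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class ResolutionAnalogy {
    static int lengthViaHandle(String s) {
        try {
            // "Symbolic" description: owner class, member name, and type (no arguments, returns int).
            MethodHandle length = MethodHandles.lookup()
                    .findVirtual(String.class, "length", MethodType.methodType(int.class));
            // After the lookup, the handle refers directly to String.length().
            return (int) length.invoke(s);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthViaHandle("hello"));  // 5
    }
}
```

If the name or type did not match an existing method, the lookup would fail, much as resolution fails for a symbolic reference whose target does not exist.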

Static linking and dynamic linking

Static linking refers to the process of linking all the code and data needed in the program into an executable file during compilation. During the static linking process, the compiler copies the functions and variables in the static library into the executable file to form a complete executable file. When the program is running, all code and data exist in memory, and the program no longer needs to rely on external library files or dynamic link libraries.

The disadvantage of static linking is that the executable file is larger and difficult to maintain and update. For example, if you need to modify a function in the program, you need to recompile the entire program and then relink it to generate a new executable file. Such an operation is cumbersome and takes up a lot of time and resources.

Dynamic linking is a linking method relative to static linking. During the process of dynamic linking, the required code and data are added to the address space of the process only when the program is running, and different modules are combined together so that they can call each other. The advantage of dynamic linking is that it reduces the size of the program and also facilitates program updates and maintenance. Common dynamic link libraries (DLLs) are loaded using dynamic linking.

The process of dynamic linking is completed while the program is running. It replaces symbolic references with direct references so that the program can access methods and data correctly. In the JVM, each stack frame holds a reference to the runtime constant pool of the class that owns the current method; this reference is called the dynamic link, and it is what allows a symbolic method reference to be converted into a direct reference during execution.

For example, suppose a method in the program needs to call a library function. With static linking, the linker copies the code of the library function into the executable file. With dynamic linking, the library function is loaded into memory when the program runs, and the call is bound to its code address at that point.

In short, the parsing phase is a necessary step in program execution. By converting between symbolic references and direct references, programs can correctly access methods and data. Static linking and dynamic linking are different linking methods, and each method has its advantages and disadvantages. In actual development, it is necessary to choose a suitable link method according to the situation so that the program can be better run and maintained.
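One common way to relate this distinction to Java code, offered here as an interpretation rather than something the text above states, is to compare static methods with overridable instance methods. The target of a static call is fixed by the declared type, while the target of a virtual call is chosen from the runtime type of the object. The classes in this sketch are hypothetical.

```java
class DispatchDemo {
    static class Animal {
        static String kind() { return "animal"; }   // static: bound by the declared type
        String sound() { return "..."; }            // virtual: bound by the runtime type
    }

    static class Dog extends Animal {
        static String kind() { return "dog"; }      // hides Animal.kind, does not override it
        @Override
        String sound() { return "woof"; }
    }

    static String describe(Animal a) {
        // Calling a static method through an instance is legal but discouraged; it is done
        // here only to show that the target depends on the declared type Animal.
        return a.kind() + "/" + a.sound();
    }

    public static void main(String[] args) {
        System.out.println(describe(new Dog()));   // animal/woof
    }
}
```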

Initialization

Initialization is the stage in which the static variables of a class are assigned their declared initial values before the class is used; their memory was already allocated during the preparation phase. In Java, this happens during class loading. Initialization covers both the initializers of static member variables and static code blocks.

The initialization of static member variables can be assigned directly during declaration, or the assignment operation can be completed in a static code block. For example, if you declare a static variable a in a class, you can directly assign it a value of 12 when declaring it, as shown below:

public static int a = 12;

During the preparation phase, memory is allocated for the static variable a and it is given the default value 0. During the initialization phase, the value of a is changed to 12, as the code requires. (If a were also declared final, it would be a compile-time constant and would already receive the value 12 during the preparation phase, as described earlier.)
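The two-step assignment can be made visible by reading a static variable before its own initializer has run. In this sketch, InitOrderDemo and its fields are hypothetical; static initializers run in textual order, so the first field is computed while the second still holds its preparation default.

```java
class InitOrderDemo {
    static int early = peek();   // runs first and reads 'later' before its initializer has run
    static int later = 42;       // assigned during initialization, after 'early'

    static int peek() {
        return later;
    }

    public static void main(String[] args) {
        System.out.println(early);  // 0
        System.out.println(later);  // 42
    }
}
```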

Static code blocks are also part of the initialization phase and contain code that needs to run when the class is initialized. The code in a static code block is executed only once. For example, you can declare a static code block that outputs a piece of information as follows:

static {
    System.out.println("Initialize static code block");
}

The static code block will be executed when the class is loaded and the corresponding information will be output.
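The "only once" behavior can be confirmed with a counter. In this sketch, InitOnceDemo, Greeter, and touch are hypothetical names; the static block runs on the first active use of Greeter and never again.

```java
class InitOnceDemo {
    static int runs = 0;   // counts executions of Greeter's static block

    static class Greeter {
        static {
            runs++;        // executed during initialization of Greeter
        }

        static void touch() { }
    }

    static int touchTwice() {
        Greeter.touch();   // first active use triggers initialization
        Greeter.touch();   // already initialized: the static block does not run again
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(runs);          // 0
        System.out.println(touchTwice());  // 1
    }
}
```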

A static variable in a class can also be an object that is instantiated when the class is loaded. For example, a static variable user is declared in a class and can be instantiated in a static code block as follows:

public static User user;

static {
    user = new User();
}

The code in the static code block will be executed when the class is loaded to instantiate the user object.

In short, initialization is a very important stage in Java. It assigns the declared initial values to static variables and executes the static code blocks of the class, ensuring that the class is ready for normal use.

Use and uninstall

The last step of the Java class loading mechanism is use and unloading. This step is the end of the whole life cycle of a class. Once the Java virtual machine has initialized a loaded class, the application can use it. When the class is no longer used by the application, the Java Virtual Machine may unload it to free up memory space.

“Use” here means that when an application needs a class, the Java virtual machine first checks whether the class has been loaded. If it has, it is handed to the application directly; if not, it goes through the loading process first and is then handed to the application. For example, when an application needs an instance of the java.util.Date class, the Java virtual machine first checks whether the class has been loaded. If not, it loads the class according to the loading process and then makes it available to the application.
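A quick way to see that a class is loaded only once is to request it twice and compare the results. The sketch below uses the java.util.Date example from the text; UseDemo is a hypothetical name.

```java
class UseDemo {
    // True when repeated requests return the very same Class object.
    static boolean loadedOnce() {
        try {
            Class<?> first = Class.forName("java.util.Date");
            Class<?> second = Class.forName("java.util.Date");   // answered from the already loaded class
            return first == second && first == java.util.Date.class;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(loadedOnce());
    }
}
```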

“Uninstalling”, more commonly called unloading, means that the Java virtual machine removes a class that the application no longer uses. A class can only be unloaded when no instances of it remain, its Class object is no longer referenced anywhere, and the class loader that loaded it has itself become unreachable. When these conditions hold, the garbage collector may remove the class from memory to free up space.

It is worth noting that unloading is not guaranteed and depends on the situation. For example, classes loaded by the bootstrap class loader, such as java.lang.Object, stay reachable for the lifetime of the JVM and are not unloaded. In addition, the Java virtual machine does not necessarily unload a class at the moment it becomes eligible; it does so according to the conditions and strategy of its garbage collector, to ensure the normal operation and stability of the program.

If you need to reprint or move this article, you are very welcome to send me a private message~

Blogger’s life insights and goals

You cannot stop on the road of program development. If you stop, you will easily be eliminated. If you cannot endure the hardship of self-discipline, you will suffer from mediocrity. Only continuous ability can bring continuous self-confidence. I am a very ordinary programmer. Among the people, apart from my innate beauty, I am only 180cm tall. Even a person like me has been writing blog posts silently for many years.

There is an old saying: before you become awesome, you have to persevere like a fool. I hope that through a large body of work, accumulated time, personal charm, luck, and timing, you can build your own technical influence.

My heart is ups and downs, sometimes I am excited, sometimes I am pensive. I hope that I can become a comprehensive talent with superb skills in technology, business and management. I want to be the chief designer of the product architecture route, the leader of the team, the mainstay of the technical team, and a practical expert in corporate strategy and capital planning.

The realization of this goal requires unremitting efforts and continuous growth, but I must work hard to pursue it. Because I know that only by becoming such a talent can I continue to advance in my career and bring real value to the development of the company. In this ever-changing era, I must always be ready to face challenges, keep learning and exploring new areas in order to keep moving forward. I firmly believe that as long as I keep working hard, I will definitely achieve my goals.