[Java Object] Understand the true face of Java objects and pointer compression in one article

Article directory

Introduction to versions and tools
Java object structure
- Object header
  - Mark word
    - Mark word analysis
    - Lock Record
  - Class pointer (class metadata pointer)
- Instance data
- Alignment padding
  - Why alignment padding is needed
Analysis of common Java data type objects
- ArrayList
- Long
- String
- Byte
- Boolean
Others
- Pointer compression
  - Prerequisite knowledge: Why 32-bit operating systems support up to 4G of memory
  - From 32-bit operating system to 64-bit operating system
  - Pointer compression: use 4-byte pointers while obtaining larger memory
    - How to enable pointer compression
    - Implementation principle
Thinking
- Why the mark word data field is not fixed and changes dynamically
- Mark word is a field that changes dynamically. When the lock is acquired, where is the hash code and other fields stored?
Personal profile

Introduction to versions and tools

JDK version: JDK 8
Java object layout analysis tool (Maven dependency):

<dependency>
    <groupId>org.openjdk.jol</groupId>
    <artifactId>jol-core</artifactId>
    <version>0.17</version>
</dependency>

Java object structure

A Java object consists of three parts: the object header, the instance data, and the alignment padding. The object header is further divided into the mark word and the class pointer (class metadata pointer).


jol-core is part of the Java Object Layout (JOL) library, a tool for analyzing the memory layout of Java objects. JOL allows us to deeply understand the internal structure of Java objects, including field offsets, sizes and layouts, as well as object header information, etc. This is useful for performance optimization and debugging, especially when we need to understand the layout of objects in memory.
How to print Java object information using jol-core

package concurrency;

import org.openjdk.jol.info.ClassLayout;
import org.openjdk.jol.vm.VM;

public class Test {
    static final A MUTEX = new A();

    public static void main(String[] args) {
        // Print JVM information
        System.out.println(VM.current().details());

        // The identity hash code is computed lazily: it is generated and stored
        // in the object header the first time hashCode() is called
        System.out.println(MUTEX.hashCode());
        System.out.println(ClassLayout.parseInstance(MUTEX).toPrintable());

        // Layout while the lock is held
        synchronized (MUTEX) {
            System.out.println(ClassLayout.parseInstance(MUTEX).toPrintable());
        }

        // Layout after the lock is released
        System.out.println(ClassLayout.parseInstance(MUTEX).toPrintable());
    }
}

class A {
    int a = 2;
}

//output
# VM mode: 64 bits
# Compressed references (oops): 3-bit shift
# Compressed class pointers: 3-bit shift
# Object alignment: 8 bytes
# ref, bool, byte, char, shrt, int, flt, lng, dbl
# Field sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8
# Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8
# Array base offsets: 16, 16, 16, 16, 16, 16, 16, 16, 16

1407343478 // Object hashCode

concurrency.A object internals:
OFF SZ TYPE DESCRIPTION VALUE
  0 8 (object header: mark) 0x00000053e25b7601 (hash: 0x53e25b76; age: 0)
  8 4 (object header: class) 0xf800c143
 12 4 int A.a 2
Instance size: 16 bytes
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total

// On a 64-bit JVM the mark word occupies 8 bytes
// The class pointer occupies 4 bytes here because pointer compression is enabled (without it, it would occupy 8 bytes on a 64-bit JVM)
// The int instance field occupies 4 bytes
// 16 bytes in total, already a multiple of the default 8-byte alignment, so no padding is required

concurrency.A object internals:
OFF SZ TYPE DESCRIPTION VALUE
  0 8 (object header: mark) 0x00000096d75ff7e8 (thin lock: 0x00000096d75ff7e8)
  8 4 (object header: class) 0xf800c143
 12 4 int A.a 2
Instance size: 16 bytes
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total

concurrency.A object internals:
OFF SZ TYPE DESCRIPTION VALUE
  0 8 (object header: mark) 0x00000053e25b7601 (hash: 0x53e25b76; age: 0)
  8 4 (object header: class) 0xf800c143
 12 4 int A.a 2
Instance size: 16 bytes
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total

Object header

The object header consists of two parts: the mark word and the class pointer.

Mark word

The mark word records the runtime state of the Java object, such as its lock state, whether it is biased, the thread holding the lock, the identity hash code, and the GC generational age. It occupies 4 bytes in a 32-bit JVM and 8 bytes in a 64-bit JVM. Its contents depend on the lock state:

[Figure: mark word field layout for each lock state]

Mark word analysis

Supplementary knowledge:
Big-endian: the high-order byte of the data is stored at the low address, and the low-order byte at the high address.
Little-endian: the high-order byte of the data is stored at the high address, and the low-order byte at the low address.
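To make the two byte orders concrete, here is a small sketch that serializes the mark word value from the example above in both orders using the standard `java.nio.ByteBuffer`. The class name `EndianDemo` and the helper `bytesOf` are made up for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class EndianDemo {
    /** Serializes a 64-bit value in the given byte order and renders it as hex, lowest address first. */
    static String bytesOf(long value, ByteOrder order) {
        byte[] bytes = ByteBuffer.allocate(Long.BYTES).order(order).putLong(value).array();
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        long markWord = 0x00000053e25b7601L; // mark word value from the example above

        // prints "00 00 00 53 e2 5b 76 01": high-order byte at the lowest address
        System.out.println(bytesOf(markWord, ByteOrder.BIG_ENDIAN));
        // prints "01 76 5b e2 53 00 00 00": low-order byte at the lowest address
        System.out.println(bytesOf(markWord, ByteOrder.LITTLE_ENDIAN));
        // Byte order of the machine this runs on (little-endian on x86)
        System.out.println(ByteOrder.nativeOrder());
    }
}
```

On a little-endian machine the lock bits therefore sit in the first byte of the object in memory, even though they are written last in the hexadecimal value.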

// Mark word analysis for the example above (64-bit JVM)
0x00000053e25b7601 (hash: 0x53e25b76; age: 0)

Hexadecimal number: 0x00000053e25b7601
Binary number: 0000 0000 0000 0000 0000 0000 0101 0011
               1110 0010 0101 1011 0111 0110 0000 0001

Lock flag (lowest 2 bits): 01, no lock
Biased-lock flag (next 1 bit): 0, not biased
Generational age (next 4 bits): 0000, age: 0
hashCode (31 bits): 101 0011 1110 0010 0101 1011 0111 0110 = hash: 0x53e25b76 = decimal: 1407343478

0x00000096d75ff7e8 (thin lock: 0x00000096d75ff7e8)

Hexadecimal number: 0x00000096d75ff7e8
Binary number: 0000 0000 0000 0000 0000 0000 1001 0110
               1101 0111 0101 1111 1111 0111 1110 1000

Lock flag (lowest 2 bits): 00, lightweight lock
Pointer to the Lock Record on the owning thread's stack (upper 62 bits):
0000 0000 0000 0000 0000 0000 1001 0110 1101 0111 0101 1111 1111 0111 1110 10
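The manual decoding above can also be done with a few shifts and masks. The sketch below assumes the 64-bit HotSpot mark word layout used by JDK 8 (lock flag in bits 0-1, biased-lock flag in bit 2, age in bits 3-6, identity hash in the 31 bits starting at bit 8). The class `MarkWordDecoder` and its method names are hypothetical and not part of JOL or the JDK.

```java
final class MarkWordDecoder {
    /** Lowest 2 bits: 01 = unlocked or biased, 00 = lightweight lock, 10 = heavyweight lock, 11 = GC mark. */
    static int lockBits(long mark) {
        return (int) (mark & 0b11);
    }

    /** Bit 2: set when the object is biased or biasable. */
    static boolean biased(long mark) {
        return ((mark >>> 2) & 1) == 1;
    }

    /** Bits 3-6: GC generational age. Only meaningful when the header is not displaced by a lock. */
    static int age(long mark) {
        return (int) ((mark >>> 3) & 0xF);
    }

    /** 31 bits starting at bit 8: identity hash code. Only meaningful in the unlocked, unbiased state. */
    static int hash(long mark) {
        return (int) ((mark >>> 8) & 0x7FFFFFFFL);
    }

    /** For a lightweight lock: the Lock Record address is the mark word with the 2 lock bits cleared. */
    static long lockRecordPointer(long mark) {
        return mark & ~0b11L;
    }

    public static void main(String[] args) {
        long unlocked = 0x00000053e25b7601L; // first and third dumps in the example
        long thinLock = 0x00000096d75ff7e8L; // dump taken inside the synchronized block

        System.out.println(lockBits(unlocked));  // prints 1 (binary 01, no lock)
        System.out.println(biased(unlocked));    // prints false
        System.out.println(age(unlocked));       // prints 0
        System.out.println(hash(unlocked));      // prints 1407343478

        System.out.println(lockBits(thinLock));  // prints 0 (binary 00, lightweight lock)
        System.out.println(Long.toHexString(lockRecordPointer(thinLock))); // prints 96d75ff7e8
    }
}
```

Other JDK versions or VMs may lay out the mark word differently, so treat the bit positions as specific to the setup in this article.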

Lock Record

The lock record holds the original value of the object’s mark word and also contains the metadata necessary to identify which object is locked.

Class pointer (class metadata pointer)

The class pointer points to the instanceKlass instance in the method area (Metaspace in JDK 8); the virtual machine uses this pointer to determine which class the object is an instance of. It occupies 4 bytes in a 32-bit JVM. In a 64-bit JVM it occupies 8 bytes, or 4 bytes when pointer compression is enabled.

Instance data

Stores the values of the object's fields, including fields inherited from superclasses.

Alignment padding

The size of Java objects is aligned to 8 bytes by default. When the size is not a multiple of 8, alignment padding needs to be performed. For example, 14 bytes needs to be padded to 16 bytes.
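The rounding rule can be written as a one-line bit trick. This is a minimal sketch, assuming the alignment is a power of two (as 8 and 16 are); `Alignment.alignUp` is a name made up for this example.

```java
final class Alignment {
    /** Rounds size up to the next multiple of alignment; alignment must be a power of two. */
    static long alignUp(long size, long alignment) {
        // Adding (alignment - 1) pushes size past the next boundary unless it is already on one;
        // the mask -alignment then clears the low bits.
        return (size + alignment - 1) & -alignment;
    }

    public static void main(String[] args) {
        System.out.println(alignUp(14, 8)); // prints 16: 2 bytes of padding
        System.out.println(alignUp(16, 8)); // prints 16: already aligned, no padding
        System.out.println(alignUp(17, 8)); // prints 24: 7 bytes of padding
    }
}
```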

Why alignment padding is needed

Alignment padding trades space for time: aligned data can be accessed more efficiently by the CPU.

Example:
A CPU reads memory in fixed-size units rather than byte by byte. The machine word is generally 4 bytes on 32-bit systems and 8 bytes on 64-bit systems, and the cache line (Cache Line), the smallest storage unit of the processor cache, is typically 64 bytes. An aligned field never straddles two of these units, so it can be read in a single access; a misaligned field may need two reads plus extra work to combine the halves.

Pointer compression techniques also rely on Java object byte alignment.

Analysis of common Java data type objects

ArrayList

java.util.ArrayList object internals:
OFF SZ TYPE DESCRIPTION VALUE
  0 8 (object header: mark) 0x0000000000000001 (non-biasable; age: 0)
  8 4 (object header: class) 0xf8002f39
 12 4 int AbstractList.modCount 3
 16 4 int ArrayList.size 3
 20 4 java.lang.Object[] ArrayList.elementData [(object), (object), (object), null, null, null, null, null, null, null, null, null, null, null, null, null ]
Instance size: 24 bytes
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total

Long

java.lang.Long object internals:
OFF SZ TYPE DESCRIPTION VALUE
  0 8 (object header: mark) 0x0000000000000001 (non-biasable; age: 0)
  8 4 (object header: class) 0xf80022c0
 12 4 (alignment/padding gap)
 16 8 long Long.value 1
Instance size: 24 bytes
Space losses: 4 bytes internal + 0 bytes external = 4 bytes total

String

java.lang.String object internals:
OFF SZ TYPE DESCRIPTION VALUE
  0 8 (object header: mark) 0x0000000000000001 (non-biasable; age: 0)
  8 4 (object header: class) 0xf80002da
 12 4 char[] String.value [S, t, r, i, n, g]
 16 4 int String.hash 0
 20 4 (object alignment gap)
Instance size: 24 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

Byte

java.lang.Byte object internals:
OFF SZ TYPE DESCRIPTION VALUE
  0 8 (object header: mark) 0x0000000000000005 (biasable; age: 0)
  8 4 (object header: class) 0xf80021eb
 12 1 byte Byte.value 1
 13 3 (object alignment gap)
Instance size: 16 bytes
Space losses: 0 bytes internal + 3 bytes external = 3 bytes total

Boolean

java.lang.Boolean object internals:
OFF SZ TYPE DESCRIPTION VALUE
  0 8 (object header: mark) 0x0000000000000005 (biasable; age: 0)
  8 4 (object header: class) 0xf8002097
 12 1 boolean Boolean.value true
 13 3 (object alignment gap)
Instance size: 16 bytes
Space losses: 0 bytes internal + 3 bytes external = 3 bytes total
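The instance sizes in the dumps above can be reproduced with a simplified model: a 12-byte header (8-byte mark word plus 4-byte compressed class pointer), each field aligned to its own size, and the total rounded up to 8 bytes. This is only a sketch under those assumptions. HotSpot's real field layout logic also reorders fields, which this model ignores, and `LayoutSketch` is a hypothetical name.

```java
final class LayoutSketch {
    static final int HEADER_BYTES = 12;     // 8-byte mark word + 4-byte compressed class pointer
    static final int OBJECT_ALIGNMENT = 8;  // default object alignment

    static int alignUp(int value, int alignment) {
        return (value + alignment - 1) & -alignment;
    }

    /** Instance size for fields of the given sizes (1, 2, 4 or 8 bytes), placed in the order given. */
    static int instanceSize(int... fieldSizes) {
        int offset = HEADER_BYTES;
        for (int size : fieldSizes) {
            // Align the field to its own size (this may leave an internal gap), then place it.
            offset = alignUp(offset, size) + size;
        }
        // Pad the end of the object up to the object alignment (external gap).
        return alignUp(offset, OBJECT_ALIGNMENT);
    }

    public static void main(String[] args) {
        System.out.println(instanceSize(4, 4, 4)); // ArrayList: int, int, reference -> prints 24
        System.out.println(instanceSize(8));       // Long: 4-byte gap before the long -> prints 24
        System.out.println(instanceSize(4, 4));    // String: reference, int, 4-byte tail gap -> prints 24
        System.out.println(instanceSize(1));       // Byte or Boolean: 3-byte tail gap -> prints 16
    }
}
```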

Others

Pointer compression

Prerequisite knowledge: Why 32-bit operating systems support up to 4G of memory

First consider a memory of 8 bytes, that is, 64 one-bit cells.

If we need to address every one of those cells individually, we need 2^6 addresses, that is, 6-bit addresses (a "6-bit operating system").

Applying the same calculation to a 32-bit operating system:
2^32 bit = 2^29 byte = 2^19 KB = 2^9 MB = 2^-1 GB = 0.5 GB

The result is 0.5 GB, so why is a 32-bit CPU said to support up to 4 GB of memory?

Because memory is byte-addressable: the CPU treats 8 bits (1 byte) as one group, so the smallest addressable unit is 1 byte, and 2^32 * 1 byte = 4 GB.

//In fact, the size of memory that can be used is determined by two aspects: hardware and operating system. The operating system refers to the virtual address level, and the hardware refers to the address bus.
// Other references: https://www.zhihu.com/question/22594254/answer/42967413
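The arithmetic above is easy to check in code. The helper below is a hypothetical illustration (`AddressSpace.addressableBytes` is not a JDK method): it computes how many bytes can be reached with a given number of address bits when each address names a unit of a given width.

```java
final class AddressSpace {
    /** Bytes reachable with addressBits address bits when each address names one unit of unitBits bits. */
    static long addressableBytes(int addressBits, int unitBits) {
        return (1L << addressBits) * unitBits / 8;
    }

    public static void main(String[] args) {
        // 6 address bits, one bit per address: the 8-byte memory from the text
        System.out.println(addressableBytes(6, 1));   // prints 8
        // 32 address bits, one bit per address: 0.5 GB
        System.out.println(addressableBytes(32, 1));  // prints 536870912
        // 32 address bits, one byte per address: 4 GB
        System.out.println(addressableBytes(32, 8));  // prints 4294967296
    }
}
```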

From 32-bit operating system to 64-bit operating system

As shown above, a 32-bit operating system can use at most 4 GB of memory. As the programs we develop became more and more complex, that was no longer enough, and we entered the era of 64-bit operating systems. The addressable memory rises to 4 GB * 2^32 in theory, but pointers also grow to 8 bytes. The longer pointers bring new problems:

1. Increased GC overhead: 64-bit object references occupy more heap space, leaving less room for other data, so the heap fills up sooner and GC runs more frequently.
2. Lower cache hit rate: because 64-bit object references are larger, fewer oops (ordinary object pointers) fit in the CPU cache, which reduces cache efficiency.

Pointer compression: use 4-byte pointers while obtaining larger memory

How to enable pointer compression

-XX:+UseCompressedOops // Object pointer compression
-XX:+UseCompressedClassPointers // Class metadata pointer compression

// Enabled in the above example
# Compressed references (oops): 3-bit shift
# Compressed class pointers: 3-bit shift

// On a 64-bit JVM the class pointer occupies 4 bytes
concurrency.A object internals:
OFF SZ TYPE DESCRIPTION VALUE
  0 8 (object header: mark) 0x00000053e25b7601 (hash: 0x53e25b76; age: 0)
  8 4 (object header: class) 0xf800c143
 12 4 int A.a 2
Instance size: 16 bytes
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total
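Besides reading the JOL banner, the flag values can be queried from inside a running program. The sketch below assumes a HotSpot-based JVM that exposes `com.sun.management.HotSpotDiagnosticMXBean`; on other VMs, or for flags the VM does not know, the hypothetical helper `vmOption` simply reports "unavailable".

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

final class CompressionFlags {
    /** Returns the current value of a HotSpot -XX option, or "unavailable" if this JVM does not expose it. */
    static String vmOption(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption(name).getValue();
        } catch (RuntimeException e) {
            // Unknown option name, or a VM without the HotSpot diagnostic bean
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        System.out.println("UseCompressedOops          = " + vmOption("UseCompressedOops"));
        System.out.println("UseCompressedClassPointers = " + vmOption("UseCompressedClassPointers"));
        System.out.println("ObjectAlignmentInBytes     = " + vmOption("ObjectAlignmentInBytes"));
    }
}
```

Running it once with default settings and once with `-XX:-UseCompressedOops` should show the first flag flip; the exact defaults depend on the JDK version and heap size.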

Implementation principle

// Java objects in the JVM are aligned to 8 bytes by default, so compressed pointers can address at most 32 GB of heap (4 GB * 2^3). Above 32 GB, pointer compression is disabled.
-XX:ObjectAlignmentInBytes=8

In the case of 8-byte alignment, the last three bits of the address are always 0:
  8 = 1000
 16 = 10000
 24 = 11000
 32 = 100000
 40 = 101000
 48 = 110000
 56 = 111000
 64 = 1000000
 72 = 1001000
 
Therefore, when a reference is stored in a Java object, the address is shifted right by three bits to drop the three zeros. When the reference is read back, the stored value is shifted left by three bits to restore them. A 4-byte pointer can thus represent 2^32 * 2^3 memory addresses; each address points to 1 byte, for a total of 32 GB of memory.
(This is also why articles often say the Java heap should not exceed 32 GB: 4-byte pointers with 8-byte alignment cannot address more than 32 GB, so pointer compression is turned off above that size unless the object alignment is raised to extend the addressable range.)

With 16-byte alignment: the maximum heap is 64 GB (4 GB * 2^4); pointer compression is disabled above 64 GB
 16 = 10000
 32 = 100000
 48 = 110000
 64 = 1000000
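The shift-based scheme described above can be sketched in a few lines. This assumes a zero-based heap, so the compressed value is just the address shifted right; HotSpot may additionally add a heap base address, which is left out here. The class and method names are hypothetical.

```java
final class CompressedOopSketch {
    /** Compresses an aligned address into 32 bits by dropping the low zero bits. */
    static int encode(long address, int shift) {
        return (int) (address >>> shift);
    }

    /** Restores the full address: treat the 32 bits as unsigned, then shift the zeros back in. */
    static long decode(int narrow, int shift) {
        return (narrow & 0xFFFFFFFFL) << shift;
    }

    /** Largest heap a 32-bit compressed pointer can cover for a given object alignment. */
    static long maxHeapBytes(int alignmentBytes) {
        return (1L << 32) * alignmentBytes;
    }

    public static void main(String[] args) {
        long address = 0x7FFFFFFF8L;          // last 8-byte-aligned address below 32 GB
        int narrow = encode(address, 3);      // fits in 4 bytes
        System.out.println(Long.toHexString(decode(narrow, 3))); // prints 7fffffff8

        System.out.println(maxHeapBytes(8));  // prints 34359738368 (32 GB)
        System.out.println(maxHeapBytes(16)); // prints 68719476736 (64 GB)
    }
}
```

The round trip is lossless only because the low three bits of an 8-byte-aligned address are always zero.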

Thinking

Why the mark word data field is not fixed and changes dynamically

It supports object lock concurrency and lock optimization without increasing the memory footprint of the object.

Mark word is a field that changes dynamically. When the lock is acquired, where is the hash code and other fields stored?

In the HotSpot VM, an object that is biased has not had its identity hash code computed. Once the identity hash code has been computed, the object can no longer be biased, and locking it goes directly to a lightweight lock. If the object is currently biased and its identity hash code is requested, the lock inflates to a heavyweight lock. For lightweight locks, the hash code is kept in the Lock Record (as part of the saved mark word); for heavyweight locks, it is kept in the ObjectMonitor object.
Note: The hash codes discussed here are identity hash codes only. A hash code produced by a user-defined hashCode() method is not placed in the object header. (The identity hash code is the value returned by a non-overridden java.lang.Object.hashCode() or by java.lang.System.identityHashCode(Object).)
Reference: Big R's answer: https://www.zhihu.com/question/52116998/answer/133400077
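The difference between the identity hash code and an overridden hashCode() can be seen with plain JDK calls. `IdentityHashDemo` and its nested `Point` class are made up for this example.

```java
final class IdentityHashDemo {
    /** A value class with its own hashCode(); that result is never stored in the object header. */
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Point && ((Point) o).x == x && ((Point) o).y == y;
        }

        @Override
        public int hashCode() {
            return 31 * x + y;
        }
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2);
        System.out.println(p.hashCode());               // prints 33, computed from the fields
        System.out.println(System.identityHashCode(p)); // identity hash chosen by the VM, varies per run

        Object plain = new Object();
        // Without an override, hashCode() is the identity hash code
        System.out.println(plain.hashCode() == System.identityHashCode(plain)); // prints true
    }
}
```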

Personal profile

Hello, I am Lorin, a Java back-end developer! Motto: Technology has the power to make the world a better place.
