Shallow reading of JDK17 source code – StringBuilder

Foreword

I am a software engineering master's student (class of 2024) with zero offers so far. Nothing here is guaranteed to be correct; this article is only my personal understanding.

Updating…

Table of Contents

  • Foreword
  • 1. Basic information & construction method
  • 2. append()
  • 3. Expansion

1. Basic information & construction method

StringBuilder sb = new StringBuilder();

First look at the default constructor:

@IntrinsicCandidate
public StringBuilder() {
    super(16);
}
// public final class StringBuilder extends AbstractStringBuilder

The default constructor directly calls the constructor of the parent class and passes in the default capacity, 16. StringBuilder extends the abstract class AbstractStringBuilder. The other implementation of this abstract class is StringBuffer, the thread-safe counterpart of StringBuilder.
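The default capacity can be checked through the public API. This is a minimal sketch; the class name DefaultCapacityDemo and the helper defaultCapacity() are made up for illustration, while capacity() and length() are the real StringBuilder methods.

```java
class DefaultCapacityDemo {
    // Hypothetical helper: capacity of a builder created with the no-argument constructor.
    static int defaultCapacity() {
        return new StringBuilder().capacity();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        System.out.println("capacity = " + sb.capacity()); // 16 characters reserved
        System.out.println("length   = " + sb.length());   // 0 characters stored
    }
}
```

The capacity is the reserved room, while length() stays 0 until something is appended.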

Take a look at the properties of the abstract class, including:

  • value: The stored value is a byte array
  • coder: encoding type, represented by a static constant
    • @Native static final byte LATIN1 = 0;
    • @Native static final byte UTF16 = 1;
  • count: the length of the string
  • EMPTYVALUE: a shared empty array that the abstract class's no-argument constructor assigns to value. StringBuilder's own constructors do not call that no-argument constructor; the source comment says it is necessary for serialization of subclasses. (What it is actually used for, the author doesn't really understand, hee hee)

abstract class AbstractStringBuilder implements Appendable, CharSequence {
    byte[] value;   // the stored characters
    byte coder;     // LATIN1 (0) or UTF16 (1)
    int count;      // number of characters in use
    private static final byte[] EMPTYVALUE = new byte[0];
}

Now let's look at the constructor that takes a parameter. A common usage is to pass in a string directly at initialization:

StringBuilder str = new StringBuilder("SWPU");

Similarly, it calls the corresponding parameterized constructor of the parent class:

@IntrinsicCandidate
public StringBuilder(String str) {
    super(str);
}

// In AbstractStringBuilder:
AbstractStringBuilder(String str) {
    // Length of the incoming string
    int length = str.length();
    // Reserve 16 extra characters; if that would overflow, use the maximum value
    int capacity = (length < Integer.MAX_VALUE - 16)
            ? length + 16 : Integer.MAX_VALUE;
    // coder() returns the encoding of the incoming string; if string compression is disabled, it is UTF16
    final byte initCoder = str.coder();
    // The builder starts with the same encoding as the incoming str
    coder = initCoder;
    // For UTF16, allocate a byte[] twice the size, because each UTF16 char occupies 2 bytes
    value = (initCoder == LATIN1)
            ? new byte[capacity] : StringUTF16.newBytesFor(capacity);
    // Call append() to copy the initial content
    append(str);
}
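The "length + 16" rule can be observed from outside. In this small sketch the class and helper names are hypothetical; note that capacity() reports characters, not bytes, so a UTF16 builder reports length + 16 as well, even though the source above shows its byte array is twice that size.

```java
class StringConstructorCapacityDemo {
    // Hypothetical helper: capacity of a builder initialized from a string.
    static int capacityFor(String initial) {
        return new StringBuilder(initial).capacity();
    }

    public static void main(String[] args) {
        // 4 LATIN1 characters: 4 + 16
        System.out.println(capacityFor("SWPU"));             // 20
        // 6 characters, two of them outside LATIN1: 6 + 16
        System.out.println(capacityFor("SWPU\u4e2d\u6587")); // 22
    }
}
```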

2. append()

/**
 * append()
 */
public AbstractStringBuilder append(String str) {
    // If str is null (not an empty string), append the 4-character string "null"
    if (str == null) {
        return appendNull();
    }
    // Length of the string to append
    int len = str.length();
    // Make sure the capacity is sufficient for the appended string
    ensureCapacityInternal(count + len);
    // Copy the string's content into value, starting at index count
    putStringAt(count, str);
    count += len;
    return this;
}
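The null branch is easy to confirm with the public API. A minimal sketch follows; the class name AppendNullDemo and its helper are made up for illustration.

```java
class AppendNullDemo {
    // Hypothetical helper: append a null String reference and return the result.
    static String appendNullTo(String initial) {
        StringBuilder sb = new StringBuilder(initial);
        String missing = null;
        sb.append(missing); // resolves to append(String), which takes the appendNull() branch
        return sb.toString();
    }

    public static void main(String[] args) {
        String result = appendNullTo("SWPU");
        System.out.println(result);          // SWPUnull
        System.out.println(result.length()); // 8 = 4 + 4
    }
}
```

The variable is typed as String so that the call is unambiguous; a bare `append(null)` would not compile because several overloads match.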

What is the ensureCapacityInternal() method here for?
The value field is a byte array. When value is initialized, room for 16 extra characters is reserved, so value.length is normally larger than what count needs. The count field records the number of characters actually stored, not the size of the array.
Verify it:

swpu: encoding is LATIN1, value.length = 20 = 4 + 16
swpu2: encoding is UTF16, value.length = 44 = (6 + 16) * 2 = (6 + 16) << 1
Therefore, every operation involving capacity shifts by coder. Because LATIN1 = 0, the shift does not change the size in that case, and because UTF16 = 1, it doubles or halves it. (This code is written really silky)
For ease of understanding, use LATIN1 encoding as an example. Then the logic of the following method is easy to follow.
First, get oldCapacity, the current capacity in characters, which includes the 16 reserved positions. The parameter minimumCapacity = count + str.length() is the capacity required after appending the new string. If it exceeds the current capacity, the array is expanded; otherwise nothing happens.

private void ensureCapacityInternal(int minimumCapacity) {
    // overflow-conscious code
    // LATIN1: 0; UTF16: 1
    // For UTF16, the capacity in characters is value.length / 2
    int oldCapacity = value.length >> coder;
    // Determine whether expansion is required
    if (minimumCapacity - oldCapacity > 0) {
        value = Arrays.copyOf(value,
                newCapacity(minimumCapacity) << coder);
    }
}
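The shift trick can be replayed outside the JDK. The sketch below is only a model of the arithmetic: the class CoderShiftDemo and its methods are hypothetical, and the two constants simply mirror the LATIN1 = 0 and UTF16 = 1 values quoted earlier.

```java
class CoderShiftDemo {
    // Assumed to mirror the coder constants quoted in the article.
    static final int LATIN1 = 0;
    static final int UTF16 = 1;

    // Capacity in characters for a byte array of the given length.
    static int capacityInChars(int byteLength, int coder) {
        return byteLength >> coder;
    }

    // Number of bytes needed to hold the given number of characters.
    static int bytesFor(int chars, int coder) {
        return chars << coder;
    }

    // Same comparison as in ensureCapacityInternal(), written against the model.
    static boolean needsGrowth(int byteLength, int coder, int minimumCapacity) {
        return minimumCapacity - capacityInChars(byteLength, coder) > 0;
    }

    public static void main(String[] args) {
        System.out.println(capacityInChars(20, LATIN1)); // 20
        System.out.println(capacityInChars(44, UTF16));  // 22
        System.out.println(bytesFor(22, UTF16));         // 44
        System.out.println(needsGrowth(20, LATIN1, 24)); // true
        System.out.println(needsGrowth(20, LATIN1, 10)); // false
    }
}
```

Writing the check as a subtraction compared against zero, rather than `minimumCapacity > oldCapacity`, is what the "overflow-conscious" comment refers to: a requested capacity that overflowed to a negative int is then handled by the growth path instead of being silently skipped.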

After the expansion is completed, or if no expansion is required, putStringAt(count, str) is executed:

private void putStringAt(int index, String str) {
    putStringAt(index, str, 0, str.length());
}

// Parameters: start index in the builder, source string, start offset and end position in the string
private void putStringAt(int index, String str, int off, int end) {
    if (getCoder() != str.coder()) {
        // The builder is LATIN1 but the string is UTF16: convert the existing content
        // to 2-byte UTF16 storage (UTF16 can represent every LATIN1 character)
        inflate();
    }
    str.getBytes(value, off, index, coder, end - off);
}

// In java.lang.String:
void getBytes(byte[] dst, int srcPos, int dstBegin, byte coder, int length) {
    // When both sides use the same coder, System.arraycopy() copies the bytes into dst
    if (coder() == coder) {
        System.arraycopy(value, srcPos << coder, dst, dstBegin << coder, length << coder);
    } else {    // this.coder == LATIN1 && coder == UTF16
        StringLatin1.inflate(value, srcPos, dst, dstBegin, length);
    }
}

It can be seen that, when the encodings match, append() ultimately calls System.arraycopy(), which is also the method that Arrays.copyOf() is built on.

After that, the count field is updated. It holds the real number of characters currently stored in value, which is separate from value.length. The append() method then ends.
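The difference between the stored length and the reserved capacity, including the case where a LATIN1 builder receives a non-LATIN1 character, can be observed as follows. This is a sketch using only public methods; the class name and helper are hypothetical, and the internal switch to UTF16 itself is not visible through the public API.

```java
class LengthVersusCapacityDemo {
    // Hypothetical helper: a builder that starts with LATIN1 content
    // and then receives one character outside LATIN1.
    static StringBuilder latin1ThenUtf16() {
        StringBuilder sb = new StringBuilder("SWPU");
        sb.append("\u4e2d");
        return sb;
    }

    public static void main(String[] args) {
        StringBuilder sb = latin1ThenUtf16();
        System.out.println(sb.length());   // 5: characters actually stored
        System.out.println(sb.capacity()); // 20: still 4 + 16, no expansion was needed
        System.out.println(sb.toString().equals("SWPU\u4e2d")); // true: old content survived
    }
}
```

The capacity stays at 20 because it is counted in characters; only the internal byte array changes size when the content is converted.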

3. Expansion

One step of the append() method checks the capacity. If the required capacity does not exceed the reserved size n + 16, where n is the length of the string used for initialization, nothing happens. So what happens if the capacity is exceeded?
Back to the method ensureCapacityInternal():

private void ensureCapacityInternal(int minimumCapacity) {
    // overflow-conscious code
    int oldCapacity = value.length >> coder;
    if (minimumCapacity - oldCapacity > 0) {
        value = Arrays.copyOf(value,
                newCapacity(minimumCapacity) << coder);
    }
}

What happens if, say, we now append a string of length 20?

StringBuilder swpu = new StringBuilder("SWPU");
swpu.append("SWPU_SWPU_SWPU_SWPU!");


The answer is that the capacity grows from 20 to 42.
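This result can be reproduced with capacity(). A minimal sketch; the class name GrowthDemo and the helper are made up for illustration.

```java
class GrowthDemo {
    // Hypothetical helper: returns { capacity before the append, capacity after the append }.
    static int[] capacityBeforeAndAfter() {
        StringBuilder swpu = new StringBuilder("SWPU");
        int before = swpu.capacity();        // 4 + 16
        swpu.append("SWPU_SWPU_SWPU_SWPU!"); // 20 more characters, 24 needed in total
        return new int[] { before, swpu.capacity() };
    }

    public static void main(String[] args) {
        int[] caps = capacityBeforeAndAfter();
        System.out.println(caps[0] + " -> " + caps[1]); // 20 -> 42
    }
}
```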
The expansion method is as follows:

private int newCapacity(int minCapacity) {
    // 20 (current array length)
    int oldLength = value.length;
    // 4 + 20 = 24 (swpu + ...)
    int newLength = minCapacity << coder;
    // 4 (the part that exceeds the current array)
    int growth = newLength - oldLength;
    // newLength(20, 4, 22) = 42
    int length = ArraysSupport.newLength(oldLength, growth, oldLength + (2 << coder));
    if (length == Integer.MAX_VALUE) {
        throw new OutOfMemoryError("Required length exceeds implementation limit");
    }
    return length >> coder;
}

The following line of code is worth noting:

ArraysSupport.newLength(oldLength, growth, oldLength + (2 << coder));

// oldLength: original array length
// minGrowth: minimum growth required
// prefGrowth: preferred growth
// In the example above: 20, 4, 22
public static int newLength(int oldLength, int minGrowth, int prefGrowth) {
    ...
    // 20 + max(4, 22) = 42
    int prefLength = oldLength + Math.max(minGrowth, prefGrowth);
    ...
}

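Therefore, as long as the required growth is small, each expansion uses original capacity * 2 + 2 as the new capacity. The variable minGrowth covers the opposite case: when a single append needs more room than the preferred growth (minGrowth > prefGrowth), Math.max picks minGrowth and the new capacity is exactly the required one. It is also the value the method falls back on when the preferred length would overflow or exceed the maximum array size.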
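Both cases can be modeled in a few lines and compared with a real StringBuilder. The sketch below is a simplified model for a LATIN1 builder only: the class NewCapacitySketch and its methods are hypothetical, and the overflow handling of the real newLength() is deliberately left out.

```java
class NewCapacitySketch {
    // Simplified model of the growth rule for coder == LATIN1 (0).
    // The overflow and maximum-array-size paths of the real code are omitted.
    static int grow(int oldCapacity, int minCapacity) {
        int minGrowth = minCapacity - oldCapacity; // what the append strictly needs
        int prefGrowth = oldCapacity + 2;          // preferred: grow to oldCapacity * 2 + 2
        return oldCapacity + Math.max(minGrowth, prefGrowth);
    }

    // Capacity of a real builder after appending appendedLength LATIN1 characters.
    static int realCapacityAfterAppend(String initial, int appendedLength) {
        StringBuilder sb = new StringBuilder(initial);
        sb.append("x".repeat(appendedLength));
        return sb.capacity();
    }

    public static void main(String[] args) {
        // Small append: the preferred growth wins, 20 * 2 + 2
        System.out.println(grow(20, 24));                        // 42
        System.out.println(realCapacityAfterAppend("SWPU", 20)); // 42
        // Large append: minGrowth wins, the capacity is exactly what is required
        System.out.println(grow(20, 134));                        // 134
        System.out.println(realCapacityAfterAppend("SWPU", 130)); // 134
    }
}
```

The doubling keeps repeated small appends cheap on average, while minGrowth guarantees that one large append still fits after a single reallocation.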