Go｜map underlying implementation, expansion rules, features

Article directory

- 1. Hash table
- 2. The underlying implementation of Go map
- - hmap
  - bmap
  - mapextra with overflow bucket
- 3. Expansion rules
- - Double the capacity
  - Equivalent expansion
- 4. Other features
- - map traverses unordered
  - map is not thread safe

1. Hash table

The hash table is used to store key-value pairs, and the key-value pairs are hashed into buckets through the hash function.

Go uses AND operation, the number of buckets is m, then the number is [0, m-1], and the hash value of the key is ANDed with m-1. To ensure that all buckets will be selected, m must be an integer power of 2. In this way, the binary representation of m must have only one bit as 1, and the binary representation of m-1 must have all bits below this bit as 1. The expansion rules below have detailed examples.

m=4 (00000100)
m-1 (00000011)

If the number of buckets is not an integer power of 2, some buckets may never be selected:

m=5 (00000101)
m-1 (00000100)

Then [1, 3] is bound to be an empty bucket.

Load factor = count / number of buckets

2. The underlying implementation of Go map

hmap

Golang’s map uses a hash table as the underlying implementation. map is actually a pointer to the hmap structure.

type hmap struct {<!-- -->
  count int // number of stored key-value pairs
  flags uint8 // status flag (whether it is being written, etc.)
  B uint8 // number of buckets 2^B
  noverflow uint16 // number of overflow buckets used
  hash0 uint32 // Random number seed for generating hash

  buckets unsafe.Pointer // Bucket array pointer, the size of the array is 2^B (buckets)
  oldbuckets unsafe.Pointer // During the expansion phase, it is used to record the addresses of the overflow buckets used by the old buckets
  nevacuate uintptr // Record the number of the old bucket to be migrated in the next stage of gradual expansion
  extra *mapextra // points to the mapextra structure and records information related to the overflow bucket
}

$[External link picture transfer failed, the source site may have an anti-leeching mechanism, it is recommended to save the picture Save it and upload it directly (img-drFDj0hv-1679739284510) (https://secure2.wostatic.cn/static/4N4E41mv9McHYy744nwBJp/image.png?auth_key=1679738766-tQHx2taUSL4V6Jxmmunk4-0-456333c\fd46)3$

bmap

buckets is a pointer to the hash table node bmap, that is, a bucket. In Go, a bucket can hold up to 8 keys.

The lower 8 bits of the hash value are used to locate the bucket, and the upper 8 bits are used to locate the tophash.

type bmap struct {<!-- -->
    tophash [bucketCnt]uint8
    // An array with len of 8, used to quickly locate whether the key is in this bmap
    // A bucket has a maximum of 8 slots. If the tophash value of the key is in the tophash, it means that the key is in this bucket
}

The bmap structure above is a static structure, and runtime.bmap will be expanded into the following structure during compilation:

type bmap struct{<!-- -->
    topbits[8]uint8
    keys [8]keytype
    values [8] valuetype
    pad uintptr // used for memory alignment, may not be needed
    overflow uintptr // when the 8 keys of the bucket are full
    // overflow points to the next overflow bucket bmap,
    // overflow is uintptr instead of *bmap type, to ensure that bmap does not contain pointers at all, in order to reduce gc, the overflow bucket is stored in the extra field
}

$[External link picture transfer failed, the source site may have an anti-leeching mechanism, it is recommended to save the picture Save it and upload it directly (img-65lmSojX-1679739284512)(https://secure2.wostatic.cn/static/iteHi1eLngSy2XYQPoAXMr/image.png?auth_key=1679738766-n5hnHgS6Xm5MpXv97B1XK2-0-7c355ac43d98ecced41) <ul> <li> tophash: It is an array with a length of 8. When a key with the same low bit of the hash value is stored in the current bucket, the high 8 bits of the hash value will be stored in the array to facilitate subsequent matching. <center></center> <ul> <li>The tophash field not only stores the high 8 bits of the key hash value, but also stores some status values to indicate the status of the current bucket unit, and these status values are all smaller than minTopHash. In order to avoid confusion that the high 8-bit value of the key hash value is equal to these status values, when the high 8-bit value of the key hash value is less than minTopHash, its value is automatically added to minTopHash as the key’s tophash.</li> </ul> </li> </ul> <pre>emptyRest = 0 // indicates that this bucket unit is empty, and the higher index units are also empty emptyOne = 1 // indicates that the bucket unit is empty evacuatedX = 2 // Used to represent the expansion and migration to the first half of the new bucket evacuatedY = 3 // It is used to indicate the expansion and migration to the second half interval of the new bucket evacuatedEmpty = 4 // used to indicate that this unit has been evacuated minTopHash = 5 // The value of the dividing line between the key's tophash value and the bucket status value. If it is less than this value, it must represent the state of the bucket unit. If it is greater than this value, it must be the tophash value corresponding to the key. func tophash(hash uintptr) uint8 { top := uint8(hash >> (goarch.PtrSize*8 - 8)) if top < minTopHash { top + = minTopHash } return top } </pre> <ul> <li> 8 key-value pairs can be placed in a bucket, but in order to make the memory arrangement more compact, 8 keys are put together, and 8 values are put together, and there are 8 tophashes in front of the 8 keys, each tophash is the upper 8 bits of the corresponding hash value. <blockquote> When the key and value types are different, the key and value occupy different byte sizes. Using the key/value format may waste memory space due to memory alignment. </blockquote> </li> <li> overflow: Points to an overflow bucket. The layout of the overflow bucket is the same as that of the regular bucket, which is introduced to reduce the number of expansions (that is, the zipper method of hash collisions). When a bucket is full and there is an overflow bucket available, an overflow bucket will be chained behind the bucket to continue storing in it. </li> </ul> <h3>mapextra and overflow bucket</h3> If the number of buckets to be allocated in the hash table is greater than **** 2 4 2^4 24**power, it is considered that the probability of using the overflow bucket is relatively high, and it will pre-allocate 2 ( B ? 4 ) 2^{(B-4)} 2(B? 4) overflow buckets for backup **, these overflow buckets and regular buckets are continuous in memory, only the former 2 B 2^B 2B are used as regular buckets. <code>hmap</code> has <code>extra</code> field at the end, which points to the <code>mapextra</code> structure, which records information related to the overflow bucket. <pre>type mapextra struct { overflow *[]*bmap // record the address of the used overflow bucket oldoverflow *[]*bmap // The address of the overflow bucket used by the old bucket during the expansion phase nextOverflow *bmap // point to the address of the next free overflow bucket } </pre> As shown in the figure below, the number of allocated buckets is 2 5 = 32 2^5 = 32 25=32, then the number of spare overflow buckets is 2 ( 5 ? 4 ) = 2 2^{(5-4)} = 2 2(5?4)=2. <ul> <li>At this time, the <code>bmap</code> bucket numbered 2 is full, and <code>overflow</code> points to the address of the next overflow bucket, which points to number 32 here.</li> <li><code>noverflow</code> in <code>hmap</code> indicates the number of overflow buckets used, which is 1 here. The <code>extra</code> field points to the <code>mapextra</code> structure of the record overflow bucket.</li> <li><code>nextOverflow</code> in <code>mapextra</code> points to the next free overflow bucket number 33.</li> </ul> <img src=$

3. Expansion rules

Use gradual expansion when expanding the map.

Since the expansion of the map needs to relocate the original key/value to a new memory address, if the map stores hundreds of millions of key-values, one-time relocation will cause a relatively large delay, so the expansion of the Go map adopts A method called **”progressive”, the original key will not be relocated at one time, only 2 buckets at most will be relocated each time. Only when a key is inserted, modified, or deleted, it will try to relocate the buckets**. First check whether oldbuckets has been relocated, specifically check whether oldbuckets is nil.

Double expansion

count/(2^B) > 6.5: When the load factor exceeds 6.5, the double expansion will be triggered.

As shown in the figure below, it turns out that B = 0, and there is only one bucket. When it is full, it triggers double expansion. B = 1, buckets points to two new buckets, and oldbuckets points to the old bucket. nevacuate indicates that the old bucket numbered 0 will be migrated next. The key-value pairs of the old bucket will be gradually distributed to the two new buckets. Delete oldbuckets until all the key-value pairs in the old bucket are relocated.

$[External link picture transfer failed, the source site may have an anti-leeching mechanism, it is recommended to save the picture Save it and upload it directly (img-05Tud7sh-1679739284517) (https://secure2.wostatic.cn/static/jsghxQoZiXG2E65sKoDhpX/image.png?auth_key=1679737351-tdYJmjNTVkf2dsrG5UmArP-0-63b65d57ea4798cf86e) During the migration process, use the AND algorithm <code>hash & amp; (m-1)</code> to migrate the old bucket to the new bucket, and use the hash value of the old bucket and the number of expanded buckets m- The value of 1 is ANDed ( & amp; ), and the result depends on the position. If the number of old buckets is 4, then the number of new buckets will be 8. If a hash value selects the old bucket No. 0, then the binary lower two bits of the hash value must be 0. <blockquote> Old barrel m-1 = 3 = 00000011, select the old barrel No. 0, indicating that the hash value is xxxxxx00, 00000011 & xxxxxx00 = 0 </blockquote> So there are only two results for selecting a new bucket, depending on whether the third bit of the hash value is 0 or 1. <blockquote> New bucket m-1 = 7 = 00000111, AND operation with the original hash value, if the third bit is 0, then it is 0, and if the third bit is 1, then it is 00000100 = 4. </blockquote> <h3>Equivalent expansion</h3> Although the load factor limit is not exceeded, if too many overflow buckets are used, it will trigger Equivalent expansion, create new buckets with the same number of old buckets, and then migrate the original key-value pairs to the new buckets middle. If the number of regular buckets is less than or equal to 2 15 2^{15} 215 , the number of overflow buckets used is greater than the number of regular buckets 2 B 2^B 2B is too much. <code>B <= 15, nooverflow >= 2^B</code> If the number of regular buckets is greater than 2 15 2^{15} 215 , use an overflow bucket larger than 2 15 2^{15} 215 is too much. <code>B > 15, nooverflow >= 2^15</code> It usually happens when many key-value pairs are deleted, which will increase the number of overflow buckets, but the load factor is not high. For the same number of key-value pairs, migrating to a new bucket will rearrange the loose key-value pairs to make them more compact, thereby ensuring faster access. This is the meaning of equivalent expansion. <h2>4. Other features</h2> <h3>map traverses unordered</h3> When using range to traverse the map multiple times, the order of the output key and value may be different. This is intentionally by the designers of the Go language to remind developers that the underlying implementation of Go does not guarantee the stability of the map traversal order, please do not rely on range traversal result order. There are 2 main reasons: <ul> <li>When traversing the map, it does not traverse from the fixed bucket 0. Each time it traverses, it will start from a bucket with a random value sequence number, and then from it a random cell start traversing</li> <li>When traversing the map, it traverses the buckets in order, and at the same time traverses the cells in the bucket and its overflow bucket on demand. However, map will relocate keys after capacity expansion, which causes the key that originally fell into one bucket to fall into other buckets after relocation. From this perspective, traversing the map The results cannot be in the original order.</li> </ul> The map itself is unordered, and the order will be randomized when traversing. If you want to traverse the map sequentially, you need to sort the map keys first, and then traverse the map in the order of the keys. <h3>map is not thread-safe</h3> Go officials believe that Go map should be more suitable for typical usage scenarios (safe access from multiple goroutines is not required), rather than for a small number of situations (concurrent access), which will cause most programs to pay the cost of locking (performance), which determines Not supported, if the map is read and written concurrently, an error will be reported directly. The official recommendation is to use a read-write lock on the map, an anonymous structure (struct) body, including a native and an embedded read-write lock <code>sync.RWMutex</code>: <pre>var counter = struct{ sync.RWMutex m map[string]int }{m: make(map[string]int)} counter. RLock() n := counter.m[$
When the amount of data in the map is very large, only one lock will be inefficient, and the logic of the partition is complicated when it is locked. sync.Map supported by Go1.9, which supports concurrent reading and writing of map. The “space for time” mechanism is adopted, and two data structures are redundant, namely: read and dirty, to reduce the impact of locking on performance.

type Map struct { mu Mutex read atomic. Value // readOnly dirty map[interface{}]*entry misses int }

It is specially designed for append-only scenarios, that is, it is suitable for scenes with more reading and less writing. If multiple writes, the performance will drop sharply.