Article directory
-
- 1. Hash table
- 2. The underlying implementation of Go map
-
- hmap
- bmap
- mapextra with overflow bucket
- 3. Expansion rules
-
- Double the capacity
- Equivalent expansion
- 4. Other features
-
- map traverses unordered
- map is not thread safe
1. Hash table
The hash table is used to store key-value pairs, and the key-value pairs are hashed into buckets through the hash function.
Go uses AND operation, the number of buckets is m, then the number is [0, m-1], and the hash value of the key is ANDed with m-1. To ensure that all buckets will be selected, m must be an integer power of 2. In this way, the binary representation of m must have only one bit as 1, and the binary representation of m-1 must have all bits below this bit as 1. The expansion rules below have detailed examples.
- m=4 (00000100)
- m-1 (00000011)
If the number of buckets is not an integer power of 2, some buckets may never be selected:
- m=5 (00000101)
- m-1 (00000100)
Then [1, 3] is bound to be an empty bucket.
Load factor = count / number of buckets
2. The underlying implementation of Go map
hmap
Golang’s map uses a hash table as the underlying implementation. map is actually a pointer to the hmap structure.
type hmap struct {<!-- --> count int // number of stored key-value pairs flags uint8 // status flag (whether it is being written, etc.) B uint8 // number of buckets 2^B noverflow uint16 // number of overflow buckets used hash0 uint32 // Random number seed for generating hash buckets unsafe.Pointer // Bucket array pointer, the size of the array is 2^B (buckets) oldbuckets unsafe.Pointer // During the expansion phase, it is used to record the addresses of the overflow buckets used by the old buckets nevacuate uintptr // Record the number of the old bucket to be migrated in the next stage of gradual expansion extra *mapextra // points to the mapextra structure and records information related to the overflow bucket }
bmap
buckets
is a pointer to the hash table node bmap
, that is, a bucket. In Go, a bucket can hold up to 8 keys.
The lower 8 bits of the hash value are used to locate the bucket, and the upper 8 bits are used to locate the tophash.
type bmap struct {<!-- --> tophash [bucketCnt]uint8 // An array with len of 8, used to quickly locate whether the key is in this bmap // A bucket has a maximum of 8 slots. If the tophash value of the key is in the tophash, it means that the key is in this bucket }
The bmap structure above is a static structure, and runtime.bmap
will be expanded into the following structure during compilation:
type bmap struct{<!-- --> topbits[8]uint8 keys [8]keytype values [8] valuetype pad uintptr // used for memory alignment, may not be needed overflow uintptr // when the 8 keys of the bucket are full // overflow points to the next overflow bucket bmap, // overflow is uintptr instead of *bmap type, to ensure that bmap does not contain pointers at all, in order to reduce gc, the overflow bucket is stored in the extra field }
3. Expansion rules
Use gradual expansion when expanding the map.
Since the expansion of the map needs to relocate the original key/value to a new memory address, if the map stores hundreds of millions of key-values, one-time relocation will cause a relatively large delay, so the expansion of the Go map adopts A method called **”progressive”, the original key will not be relocated at one time, only 2 buckets at most will be relocated each time. Only when a key is inserted, modified, or deleted, it will try to relocate the buckets**. First check whether oldbuckets has been relocated, specifically check whether oldbuckets is nil.
Double expansion
count/(2^B) > 6.5: When the load factor exceeds 6.5, the double expansion will be triggered.
As shown in the figure below, it turns out that B = 0, and there is only one bucket. When it is full, it triggers double expansion. B = 1, buckets
points to two new buckets, and oldbuckets
points to the old bucket. nevacuate
indicates that the old bucket numbered 0 will be migrated next. The key-value pairs of the old bucket will be gradually distributed to the two new buckets. Delete oldbuckets until all the key-value pairs in the old bucket are relocated.
When the amount of data in the map is very large, only one lock will be inefficient, and the logic of the partition is complicated when it is locked. sync.Map
supported by Go1.9, which supports concurrent reading and writing of map. The “space for time” mechanism is adopted, and two data structures are redundant, namely: read and dirty, to reduce the impact of locking on performance.
type Map struct {<!-- --> mu Mutex read atomic. Value // readOnly dirty map[interface{<!-- -->}]*entry misses int }
It is specially designed for append-only
scenarios, that is, it is suitable for scenes with more reading and less writing. If multiple writes, the performance will drop sharply.