Caching in postgreSQL

1. Introduction to cache

?As shown in the figure below, when a postgreSQL process reads a tuple, it needs to obtain the basic information of the table (such as the oid, index information and statistical information of the table) and the schema information of the tuple. This information is recorded in in multiple system tables. Usually the schema information of a table changes very rarely after being set. Therefore, when operating on multiple tuples of the same table, it is obviously unnecessary to read the tuples of the system table every time to build the schema information. , which will also reduce the efficiency of tuple operations. In order to reduce access to system tables, two caches are set up in the local memory area domain of each process, one is used to store tuples of system tables, and the other is The first type is used to store the basic information of the table, so that the process can quickly construct the basic information of the table and the schema information of the tuple. In order for other backend processes to be able to detect when a certain process changes the system table, a mechanism for maintaining cache consistency is required, which is PG’s InvalidMessage mechanism.

How the user table is managed, reference: https://zhuanlan.zhihu.com/p/623283855

2. SysCache

syscache is mainly used to cache tuples of recently used system tables. From the perspective of code implementation, syscache is a catcache array. The length of the array is the number of system tables. Each system table uniquely corresponds to an element of the catcache array.

  • catcache data structure
typedef struct catcache
{
int id; /* catcache id */
int cc_nbuckets; /* # of hash buckets in this cache */
TupleDesc cc_tupdesc; /* tuple descriptor (copied from reldesc) */
dlist_head *cc_bucket; /* hash buckets */
CCHashFN cc_hashfunc[CATCACHE_MAXKEYS]; /* hash function for each key */
CCFastEqualFN cc_fastequal[CATCACHE_MAXKEYS]; /* fast equal function for
* each key */
int cc_keyno[CATCACHE_MAXKEYS]; /* AttrNumber of each key */
dlist_head cc_lists; /* list of CatCList structs */
int cc_ntup; /* # of tuples currently in this cache */
int cc_nkeys; /* # of keys (1..CATCACHE_MAXKEYS) */
const char *cc_relname; /* name of relation the tuples come from */
Oid cc_reloid; /* OID of relation the tuples come from */
Oid cc_indexoid; /* OID of index matching cache keys */
bool cc_relisshared; /* is relation shared across databases? */
slist_node cc_next; /* list link */
ScanKeyData cc_skey[CATCACHE_MAXKEYS]; /* precomputed key info for heap
* scans */

/*
* Keep these at the end, so that compiling catcache.c with CATCACHE_STATS
* doesn't break ABI for other modules
*/
#ifdef CATCACHE_STATS
long cc_searches; /* total # searches against this cache */
long cc_hits; /* # of matches against existing entry */
long cc_neg_hits; /* # of matches against negative entry */
long cc_newloads; /* # of successful loads of new entry */

/*
* cc_searches - (cc_hits + cc_neg_hits + cc_newloads) is number of failed
* searches, each of which will result in loading a negative entry
*/
long cc_invals; /* # of entries invalidated from cache */
long cc_lsearches; /* total # list-searches */
long cc_lhits; /* # of matches against existing lists */
#endif
}CatCache;
2.1 syscache initialization

When the postgres process is initialized, syscache will be initialized, and key information for finding system table tuples will be written into the elements of the catcache array.

The data structures involved are as follows:

  • cacheinfo: stores the catcache description information of all system tables

    struct cachedesc
    {
    Oid reloid; /* OID of the relation being cached */
    Oid indoid; /* OID of index relation for this cache */
    int reloidattr; /* attr number of rel OID reference, or 0 */
    int nkeys; /* # of keys needed for cache lookup */
    int key[4]; /* attribute numbers of key attrs */
    int nbuckets; /* number of hash buckets for this cache */
    };
    
    static const struct cachedesc cacheinfo[] = {
    {AggregateRelationId, /* AGGFNOID */
    AggregateFnoidIndexId,
    \t\t1,
    {
    Anum_pg_aggregate_aggfnoid,
    0,
    0,
    0
    },
    16
    },
    ...
    \t
    }
    
  • catcacheheader: catcache uses the cc_next field to form a one-way linked list, and the header is recorded using this structure.

    typedef struct catcacheheader
    {
    slist_head ch_caches; /* head of list of CatCache structs */
    int ch_ntup; /* # of tuples in all caches */
    } CatCacheHeader;
    

Initialization phase 1: Initialize the catcache array using cacheinfo

typedef struct catcache
{
...
TupleDesc cc_tupdesc; /* tuple descriptor (copied from reldesc) */
int cc_nbuckets; /* # of hash buckets in this cache */
dlist_head *cc_bucket; /* hash buckets */
int cc_keyno[CATCACHE_MAXKEYS]; /* AttrNumber of each key */
int cc_nkeys; /* # of keys (1..CATCACHE_MAXKEYS) */
Oid cc_reloid; /* OID of relation the tuples come from */
Oid cc_indexoid; /* OID of index matching cache keys */
...
}

Initialization phase 2: Fill in the tuple description information (cc_tupdesc), system table name (cc_relname) and search keyword related fields in catcache according to the corresponding system table

typedef struct catcache
{
...
CCHashFN cc_hashfunc[CATCACHE_MAXKEYS]; /* hash function for each key */
CCFastEqualFN cc_fastequal[CATCACHE_MAXKEYS]; /* fast equal function for each key */
const char *cc_relname; /* name of relation the tuples come from */
bool cc_relisshared; /* is relation shared across databases? */
ScanKeyData cc_skey[CATCACHE_MAXKEYS]; /* precomputed key info for heap scans */
...
}
2.2 Organization of cache tuples in catcache

The cc_bucket array in each catcache element is a Hash bucket array, and the key value of the tuple can be mapped to the subscript of the cc_bucket array through the hash function. Each hash bucket is organized into a doubly linked list Dllist, in which the nodes are of type Dlelem. Dlelem is a wrapped cache tuple, and its dle_val field points to a cache tuple in the form of CatCTup.

The cache tuples in CatCache will first be packaged into CatCTup form, then packaged into Dlelem form, and finally added to the hash bucket list where they are located.

typedef struct dlist_node dlist_node;
struct dlist_node
{
dlist_node *prev;
dlist_node *next;
};
typedef struct catctup
{
int ct_magic; /* for identifying CatCTup entries */
#define CT_MAGIC 0x57261502
uint32 hash_value; /* hash value for this tuple's keys */
Datum keys[CATCACHE_MAXKEYS];
dlist_node cache_elem; /* list member of per-bucket list */
int refcount; /* number of active references */
bool dead; /* dead but not yet removed? Mark for deletion*/
bool negative; /* negative cache entry? represents a tuple that does not actually exist*/
HeapTupleData tuple; /* tuple management header */
struct catclist *c_list; /* containing CatCList, or NULL if none */

CatCache *my_cache; /* link to owning catcache */
} CatCTup;
2.3 Find tuples in catcache

There are two ways to find tuples in catcache: exact match and partial match.

  1. exact match

? Exact matching is implemented by the SearchCatCache function:

HeapTuple
SearchCatCache(CatCache *cache,
Datum v1,
Datum v2,
Datum v3,
Datum v4);
  • First, traverse the catcacheheader linked list and find the catcache element corresponding to the system table according to the system table name or oid.
  • Find the tuple key value and hash it, and find the corresponding hash bucket subscript of catcache in the cc_bucket array based on the hash value.
  • Traverse the hash bucket linked list, find the Dlelem that meets the requirements, and force the dle_val in its structure to the CatCTup type. The HeapTupleData in CatCTup is the head of the tuple to be found.
  • Move the Dlelem to the head of the hash bucket list, and increase the cc_hits of catcache by 1.
  • If no tuple satisfying the conditions is found in the hash bucket list, you need to further scan the physical system table:
    • If a tuple is found in the physical system table, the tuple is packaged into Dlelem and added to the head of the hash bucket list;
    • Otherwise, it means that the tuple does not exist, construct a “negative tuple”, wrap it, and add it to the head of the hash bucket list.

? 2. Partial match

Partial matching is implemented by SearchCatCacheList:

SearchCatCacheList(CatCache *cache,
int nkeys,
Datum v1,
Datum v2,
Datum v3)

This function returns a CatCList data structure, and all returned results are managed in a linked list.

typedef struct catclist
{
int cl_magic; /* for identifying CatCList entries */
#define CL_MAGIC 0x52765103

uint32 hash_value; /* hash value for lookup keys */

dlist_node cache_elem; /* list member of per-catcache list */

/*
* Lookup keys for the entry, with the first nkeys elements being valid.
* All by-reference are allocated separately.
*/
Datum keys[CATCACHE_MAXKEYS];

int refcount; /* number of active references */
bool dead; /* dead but not yet removed? */
bool ordered; /* members listed in index order? */
short nkeys; /* number of lookup keys specified */
int n_members; /* number of member tuples */
CatCache *my_cache; /* link to owning catcache */
CatCTup *members[FLEXIBLE_ARRAY_MEMBER]; /* members */
} CatCList;

Search process:

3. RelCache

RelCache stores not tuples, but RelationData data. Each RelationData structure represents the schema information of a table. This information is constructed from the information in the system table tuples.

typedef struct RelationData
{
RelFileNode rd_node; /* relation physical identifier */
struct SMgrRelationData *rd_smgr; /* file handle of the table */
. ...
Form_pg_class rd_rel; /* Table information in the corresponding tuple in the pg_class system table */
TupleDesc rd_att; /* Tuple descriptor of the table, describing each attribute of the table */
Oid rd_id; /* relation's object id */
List *rd_indexlist; /* list of OIDs of indexes on relation */
Bitmapset *rd_indexattr; /* identifies columns used in indexes */
Oid rd_oidindex; /* OID of unique index on OID, if any */
...
Form_pg_index rd_index; /* pg_index tuple describing this index */
...
} RelationData;

Since the RelationData data structure is unchanged, a hash table is used to maintain this structure. This hash table is also the most widely used hash data structure within PG. Its performance and stability have been honed in PostgreSQL’s nearly thirty years of experience. The implementation of this hash table is also an industrial-grade data structure worth learning.

Dynamic hash table introduction reference: https://zhmin.github.io/posts/postgresql-dynamic-hash/

3.1 relcache initialization

Initialization phase 1: Call the RelationCacheInitialize function to initialize and create a hash table.

Initialization phase 2: Add the necessary system tables and system table index patterns to RelCache, including pg_class, pg_attribute, pg_proc, and pg_type.

3.2 Operation of relcache
  1. Insert into newly opened table

    When opening a new table, RelationData needs to be added to RelCache. This operation is implemented through the macro RelationCacheInsert.

  2. Find hash table

    Searching the hash table is implemented through the macro definition RelationIdCacheLookup and calling the function hash_search.

    relation_open
    RelationIdGetRelation
    RelationIdCacheLookup(relationId, rd);
    RelationBuildDesc -- no reldesc in the cache, RelationBuildDesc() build one and add it.
    RelationBuildTupleDesc
    ....
    RelationCacheInsert(relation);
    
  3. Delete from hash table
    Deleting elements from the hash table is achieved through the macro definition RelationCacheDelete.

    RelationClearRelation
    RelationCacheDelete