[原文] PostgreSQL auto-generated UUID, sequence or identity column for primary key?

Sometimes clients ask me about the best option to automatically generate primary keys. In this article, I’ll explore these options and make recommendations.

Why should the primary key be automatically generated?

Every table needs a primary key. In a relational database, it is important to be able to identify individual table rows. If you want to know why, search the Internet for thousands of questions asking for help in removing duplicate entries from a table.

It is recommended that you choose a primary key that is not only unique, but never changes during the lifetime of the table row. This is because foreign key constraints often refer to primary keys, and changing a primary key referenced elsewhere would cause troublesome or unnecessary work.

Now, sometimes a table has a natural primary key, such as social security numbers for citizens of a country. But usually, there is no such attribute and you have to generate an artificial primary key. Some people even argue that artificial primary keys should be used even if there are natural primary keys, but I won’t engage in that holy war.

PostgreSQL automatically generates primary key skills

There are two basic techniques:

Generate key with sequence

A sequence is a database object whose sole purpose is to generate a unique number in life. It does this using an internal counter it increments.

Sequences are highly optimized for concurrent access, they will never emit the same number twice. Nonetheless, accessing a sequence from many concurrent SQL statements can become a bottleneck, so there is a CACHE option that allows sequences to distribute multiple values to database sessions at once.

Sequences do not follow the normal transaction rules: if the transaction is rolled back, the sequence does not reset its counters. This is required for good performance and should not pose a problem. If you’re looking for a way to generate a sequence of numbers without gaps, sequences aren’t the right choice and you’ll have to resort to less efficient and more complex techniques.

To get the next value from a sequence, you can use the nextval function like this:

SELECT nextval(‘sequence_name’);
See the documentation for other functions that operate on sequences.

Generate UUID s

A UUID (Universally Unique Identifier) is a 128-bit number generated using an algorithm that effectively guarantees uniqueness. There are several normalization algorithms. In PostgreSQL, there are many functions for generating UUIDs:

  • The uuid-ossp extension provides functionality to generate UUIDs. Note that due to the hyphen in the name, you must quote the extension ( CREATE EXTENSION “uuid-ossp”.
  • Starting with PostgreSQL v13, you can generate version 4 (random) UUIDs using the core function gen_random_uuid().

Note that you should always use the PostgreSQL datatype uuid for UUID. Don’t try to convert them to strings, or numeric will waste space and reduce performance.

Define the auto-generated primary key

There are four ways to define columns with automatically generated values:

Use DEFAULT terms

You can use this method with sequences and UUIDs. Here are some examples:

CREATE TABLE has_integer_pkey ( id bigint DEFAULT nextval('integer_id_seq') PRIMARY KEY, ... );

CREATE TABLE has_uuid_pkey ( id uuid DEFAULT gen_random_uuid() PRIMARY KEY, ... );

log in to copy

DEFAULT PostgreSQL will use this value whenever an INSERT statement does not explicitly insert the column.

Using the serial and bigserial pseudotypes

This method is a shortcut for defining a sequence and setting the DEFAULT clause, as described above. Using this approach, you can define a table as follows:

CREATE TABLE uses_serial ( id bigserial PRIMARY KEY, ... );

This is equivalent to the following:`

CREATE TABLE uses_serial (
id bigint PRIMARY KEY,
…
);

CREATE SEQUENCE uses\_serial\_id_seq
OWNED BY uses_serial.id;

ALTER TABLE uses_serial ALTER id
SET DEFAULT nextval(uses\_serial\_id\_seq’); CREATE TABLE uses\_serial (
id bigint PRIMARY KEY,
…
);

CREATE SEQUENCE uses\_serial\_id_seq
OWNED BY uses_serial.id;

ALTER TABLE uses_serial ALTER id
SET DEFAULT nextval('uses\_serial\_id_seq');

The “OWNED BY” clause adds a dependency between the column and the sequence, so dropping a column automatically drops the sequence.

Using serial will create an integer column, while bigserial will create a bigint column.

Using identity columns

This is another way of using sequences, as PostgreSQL uses sequences “behind the scenes” to implement identity columns.

CREATE TABLE uses_identity (
   id bigint GENERATED ALWAYS AS IDENTITY
             PRIMARY KEY,
   ...
);

There’s also “GENERATED BY DEFAULT AS IDENTITY”, which is the same, except that you won’t get an error if you try to explicitly insert a value for a column (much like the DEFAULT clause). Read more below!

You can specify sequence options for identity columns:

CREATE TABLE uses_identity ( id bigint GENERATED ALWAYS AS IDENTITY (MINVALUE 0 START WITH 0 CACHE 20) PRIMARY KEY, ... );

log in to copy

Use BEFORE INSERT trigger

This is similar to the DEFAULT value, but it allows you to unconditionally override the user-inserted value with a generated value. The biggest disadvantage of triggers is the performance impact.

Should I use integer(serial) or bigint(bigserial) as the automatically generated primary key?

You should always use bigint.

Indeed, an integer takes four bytes, while a bigint takes eight. but:

  • If you have a small table, where integer is enough, four wasted bytes won’t matter much. Also, not every table you design will be small!
  • If you have a large table, you may exceed the maximum integer value of 2147483647. Note that this can also happen if your table contains fewer than this number of rows: you may delete rows, and some sequence values may be “lost” by the transaction being rolled back. Now, it’s quite complicated to change a primary key column from integer to bigint in a large table in an active database without causing excessive downtime, so you should save yourself the pain.

With a bigint, you’re sure to never exceed the maximum of 9223372036854775807: even inserting 10000 rows per second without any pauses, you’d have almost 30 million years to hit the limit.

Should I use bigserial or identity columns for my autogenerated primary keys?

Identity columns should be used unless older PostgreSQL versions must be supported.

Identity columns were introduced in PostgreSQL v11 and they have two advantages over bigserial:

They conform to the SQL standard, while bigserial is a proprietary PostgreSQL syntax. This will make your code more portable.
If you use GENERATED ALWAYS AS IDENTITY, you will get an error message if you try to override the generated value by explicitly interpolating a number. This avoids the common problem of manually entered values colliding with values generated later, leading to surprising application errors.
Therefore, unless it is necessary to support PostgreSQL v10 or lower, there is no reason to use bigserial.

Should I use bigint or uuid auto-generated primary key?

My recommendation is to use sequences unless you are using database sharding or have some other reason to generate primary keys in a “decentralized” way (outside of a single database).

The real difference

The advantages of bigint are obvious:

bigint uses only 8 bytes, while uuid uses 16 bytes
Getting a value from a sequence is cheaper than computing a UUID
A disadvantage of using a sequence is that it is a single object in a single database. So if you use sharding (distribute data across multiple databases), you cannot use sequences. In this case, UUID is an obvious choice. (You can use sequences defined with INCREMENT greater than 1 and a different START value, but this can cause problems when you add additional shards.)

Of course, you’ll also prefer UUIDs if your primary keys are not auto-generated by the database, but created in an application distributed across multiple application servers.

Virtual difference

Some argue that UUIDs are better because they distribute writes over different pages of the primary key index. This should reduce contention and result in a more balanced or less fragmented index. The first is correct, but this can actually be a disadvantage since it requires the entire index to be cached for good performance. The second is definitely wrong, because B-tree indexes are always balanced. Also, a change in PostgreSQL v11 ensures that monotonically increasing values will populate the index more efficiently than random insertions (but subsequent deletions will of course lead to fragmentation). In short, any such advantages are either negligible or non-existent, and they are balanced by the fact that uuids use twice the storage space, which would make the index larger, causing more writes and taking up More caching.

Baseline bigint and uuid

My colleague Kaarel did a small performance test a while ago and found that this uuid is slower than bigint larger joins.

I decided to run a small insert-only benchmark against these two tables:

`CREATE UNLOGGED TABLE test_bigint (
id bigint GENERATED ALWAYS AS IDENTITY (CACHE 200) PRIMARY KEY
);

CREATE UNLOGGED TABLE test_uuid ( id uuid DEFAULT gen_random_uuid() PRIMARY KEY );

I performed the benchmark on my laptop (SSD, 8 cores) using a pgbench custom script with 6 concurrent clients repeatedly running 1000 transactions preparing INSERT statements for five minutes:

INSERT INTO test_bigint /* or test_uuid */ DEFAULT VALUES;

Performance comparison between bigint and uuid automatically generated primary key

column1 bigint uuid
Inserts per second 107090 74947
Index growth per row 30.5 bytes 41.7 bytes

Using bigint clearly wins, but the difference is not huge.

Conclusion

Both sequence-generated numbers and UUIDs can be used as auto-generated primary keys.

Unless you need to generate primary keys outside of a single database, use identity columns and make sure all primary key columns are of type bigint.

Original address:
https://www.modb.pro/db/98437
https://www.cybertec-postgresql.com/en/uuid-serial-or-identity-columns-for-postgresql-auto-generated-primary-keys/