How to use local temporary tables of GaussDB (DWS) for data processing?

This article is shared from the Huawei Cloud Community “GaussDB (DWS) Temporary Table Series – Local Temporary Table”, author: acydy.

GaussDB (DWS) supports three forms of temporary tables from version 8.2.1: local temporary tables, volatile temporary tables, and global temporary tables. This article first introduces the local temporary table function of DWS.

Features of local temporary tables: table definitions and data are session-related, and other sessions cannot see the local temporary tables created by this session. The metadata will be persisted to the system table, and RETRY can be supported if the cluster node fails abnormally.

1. Grammar and usage

CREATE [LOCAL] { TEMPORARY | TEMP } TABLE [ IF NOT EXISTS ] table_name

({ column_name data_type [ compress_mode ] [ COLLATE collation ] [ column_constraint [ ... ] ]

| table_constraint

| LIKE source_table [ like_option [...] ] }

[, ... ])

[ WITH ( {storage_parameter = value} [, ... ] ) ]

[ ON COMMIT { PRESERVE ROWS | DELETE ROWS } ]

When creating a table, you need to specify TEMP or TEMPORARY, which means creating a local temporary table.

  • ON COMMIT { PRESERVE ROWS | DELETE ROWS }

The ON COMMIT option determines whether to create a temporary table in a transaction. When the transaction is committed, the subsequent operations of this temporary table will be executed.

  • PRESERVE ROWS (default value): No operations are performed on the temporary table when submitting, and the temporary table and its table data remain unchanged. This type is recommended.
  • DELETE ROWS: Delete the data in the temporary table when submitting.

Other parts are the same as ordinary tables.

gaussdb=# create temp table tmp1(a int,b int);

NOTICE: The 'DISTRIBUTE BY' clause is not specified. Using round-robin as the distribution mode by default.

HINT: Please use 'DISTRIBUTE BY' clause to specify suitable data distribution column.

CREATE TABLE

Temporary tables can have the same name as non-temporary tables. If they have the same name, the priority of temporary tables is higher than that of non-temporary tables.

gaussdb=# create temp table tmp1(a int,b int);

NOTICE: The 'DISTRIBUTE BY' clause is not specified. Using round-robin as the distribution mode by default.

HINT: Please use 'DISTRIBUTE BY' clause to specify suitable data distribution column.

CREATE TABLE

gaussdb=# insert into tmp1 values(1,1);

INSERT 0 1

gaussdb=# create table tmp1(a int,b int);

NOTICE: The 'DISTRIBUTE BY' clause is not specified. Using round-robin as the distribution mode by default.

HINT: Please use 'DISTRIBUTE BY' clause to specify suitable data distribution column.

CREATE TABLE

gaussdb=# select *from tmp1;

a | b

--- + ---

1 | 1

(1 row)

gaussdb=# select *from public.tmp1;

a | b

--- + ---

(0 rows)

Views: Views created based on temporary tables are temporary views.

postgres=# create view tmp_v1 as select *from tmp1;

NOTICE: view "tmp_v1" will be a temporary view

CREATE VIEW

2. Usage scenarios

1. Complex business logic is split using local temporary tables

If the business SQL statement is too complex, you can use a local temporary table to cache the intermediate results of execution, thereby splitting the complex business logic into multiple simpler statements. The statistical information of simple statements is more prepared, and the split business is easier to maintain.

2. Support exceptions in CN nodes.

GaussDB (DWS) is a distributed architecture database. There are multiple Coordinators (CN), and the relationships are equal. The client can connect to any CN. The CN stores the metadata information of the table. When executing DDL, metadata will be synchronized on all DNs to ensure data consistency. If an abnormality occurs in a CN, operations such as creating a table and deleting a table will fail, causing the entire job to fail.

In this scenario, local temporary tables can be used. Local temporary tables are only visible in the current session. When performing operations such as creation, ALTER, and deletion of local temporary tables, only the metadata will be modified in the current CN. This will not be affected by abnormalities in other CN nodes and ensure business continuity.

3. Principle

The metadata difference between temporary tables and ordinary tables is that temporary tables are not visible in other sessions, so they are built in a schema that only belongs to the current session. When this session creates a temporary table for the first time, the session’s schema will be created at the same time. The temporary schema is different for each session.

Session 1:

gaussdb=# create temp table tmp1(a int,b int);

NOTICE: The 'DISTRIBUTE BY' clause is not specified. Using round-robin as the distribution mode by default.

HINT: Please use 'DISTRIBUTE BY' clause to specify suitable data distribution column.

CREATE TABLE

gaussdb=# \d + tmp1

Table "pg_temp_coordinator1_65_3_140257888512760.tmp1"

Column | Type | Modifiers | Storage | Stats target | Description

-------- + --------- + ----------- + --------- + --------- ----- + -------------

a | integer | | plain | |

b | integer | | plain | |

Has OIDs: no

Distribute By: ROUND ROBIN

Location Nodes: ALL DATANODES

Options: orientation=row, compression=no:

Session 2, the tmp1 table cannot be queried.

gaussdb=# select * from tmp1;

ERROR: relation "tmp1" does not exist

LINE 1: select * from tmp1;

^

Naming rules for temporary schema: pg_temp_Coordinator name_timelineID_global auto-increment ID_threadID

Coordinator name: CN name, which isolates schemas created by different CNs.

timelineID: will increase after the node is restarted and is used to determine whether this schema is invalid.

Global auto-increment ID: auto-increment ID on a single CN. The same CN does not need to have different session auto-increment IDs.

Metadata: The relpersistence flag of the local temporary table is ‘t’.

gaussdb=# select relname, relpersistence from pg_class where relname = 'tmp1';

relname | relpersistence

--------- + ----------------

tmp1 | t

Data cleaning:

  1. Session exited normally

    When the session exits normally, the table definition and data of the local temporary table will be deleted. The original data is no longer accessible.

  2. When the session exits abnormally or the current CN or a certain DN node is abnormal.

    When an exception occurs, the node’s metadata and data will not be deleted immediately. GaussDB (DWS) relies on the component gs_clean tool to automatically and regularly clean local temporary tables. Ensure that the data is cleaned after a period of time to prevent continued expansion of space.

CN Retry

When the CN Retry function is turned on, the temporary table data will be logged. To ensure data consistency, it is not recommended to switch the CN Retry switch state when using the temporary table. Keep the CN Retry switch always on or off in the session using the temporary table.

When CN Retry is turned on, the DN node restarts abnormally, and the data in the temporary table is guaranteed not to be lost. After the DN is restarted, previous sessions can still be accessed.

If you want the temporary table not to be logged:

set max_query_retry_times = 0;

4. Use constraints

  1. If the upper-layer application uses the connection pool mechanism to connect to GaussDB (DWS), when using a temporary table, it is strongly recommended to actively delete the temporary table before returning the connection to the connection pool to avoid data anomalies caused by the connection not being disconnected. Or use the DISCARD TEMP command to clear the temporary table information of the session.
  2. Local temporary tables are ignored during expansion.
  3. gs_dump local temporary tables are not supported.

Click to follow and learn about Huawei Cloud’s new technologies as soon as possible~

The knowledge points of the article match the official knowledge files, and you can further learn relevant knowledge. Java skill treeUsing JDBC to operate the databaseDatabase operation 137760 people are learning the system