KaiwuDB kernel parsing – SQL query life cycle

1. Overview

The KaiwuDB kernel analysis series is divided into two parts. This article is the first part of the series. It mainly covers network protocols to SQL executors, explaining how KaiwuDB executes SQL queries, including the execution paths of various components of the system (network protocols, SQL session management, Parser, execution plan and optimization, executor, KV engine, transaction management, etc.), the purpose is to provide a high-level unified view of the structure of various components and the relationships between them.

The following figure is an overview of KaiwuDB SQL query execution. On the left is the gateway node, which is responsible for receiving SQL queries from SQL Client, generating query plans (logical plans and physical plans), constructing the FlowSpec required by the distributed execution engine, and sending it to the node where the queried data is located. Each node will construct and execute the operators in the physical plan based on the received FlowSpec, and then return the data to the gateway node through the network. Next we will discuss in detail how each component works.

2. PostgreSQL Wire Protocol

SQL queries are sent to KaiwuDB via the Postgres Wire protocol (the Postgres protocol is used for compatibility with existing client drivers and applications). This component implements functional interfaces related to the Postgres wire protocol. After the user connects, they will first be authenticated. After the authentication is passed, a loop will be initialized to continuously read SQL statements, execute them and return the results (by encapsulating golang’s net.Conn).

This protocol is message-oriented (PostgreSQL message type, see executor section): During the life cycle of the current connection, one or more messages containing SQL statements will be read and passed to the SQL executor for execution; once the SQL After the statement is executed and the results are generated, they are serialized and returned to the client.

PostgreSQL Wire Protocol Server is initialized when KaiwuDB starts. The detailed initialization process is shown in the figure below. First, KaiwuDB’s start command will call startServeSQL through the server.Start method to initialize ServConn. ServConn is responsible for parsing the SQL client’s request, checking the security of the connection and processing the connection parameters, and then calling the pgServer.ServConn method to process the SQL Statement.

3. SQL Executor

KaiwuDB uses one port to handle the three protocols pg/http/grpc at the same time. In the start phase, KaiwuDB instantiates pgServer to handle requests for the Postgres wire protocol. pgServer will instantiate KaiwuDB’s SQL executor to process user queries.

The SQL executor will act as a producer (Producer), continue to read user input and call parser to parse SQL into statement, and the parsed result will be saved in the Statement buffer (StmtBuf). At the same time, a go routine will be created as a consumer to process the SQL statements in StmtBuf in sequence.

The specific processing flow of the SQL executor is shown in the figure below. First, it will create a new goroutine by calling processCommandsAsync through serverImpl to authenticate the client connection and process the commands in the Statement buffer (StmtBuf) (the consumer mentioned above).

Its return value is a channel used to indicate whether the goroutine has ended. Note that any error messages in this channel will be ignored, as the details of any errors that occurred during this period have already been notified to the SQL client. processCommandsAsync will also perform authentication work. If authentication fails, this goroutine will end and cancelConn will be called to close the entire connection.

Next, serverImpl will initialize a for loop to receive input from the SQL client until the connection is closed or an error occurs. The for loop here acts as a producer and first verifies the permissions of the SQL Client. After passing it, it will call different methods according to the type of message sent by the client.

For message type definitions, please refer to>>https://www.postgresql.org/docs/9.4/protocol-message-formats.html

The following chapters will take the handleSimpleQuery method as an example to illustrate the process of SQL executor processing SQL.

1. SQL reception and parsing

The purpose of handleSimpleQuery is to handle simple SQL. First, it reads a string from the PostgreSQL Wire Protocol’s cache. If the read is successful, the string will be sent to KaiwuDB’s SQL parser.

KaiwuDB’s parser was originally copied from PostgresSQL and gradually enhanced to support more SQL syntaxes. The output of the SQL parser is an array of ASTs (Abstract Syntax Trees), one for each SQL statement. The nodes of AST are composed of tree.Statement structures defined in pkg/sql/sem/tree:

Go
type Statement interface {
  fmt.Stringer
  NodeFormatter
  StatementType() StatementType
  // StatementTag is a short string identifying the type of statement
  // (usually a single verb). This is different than the Stringer output,
  // which is the actual statement (including args).
  // TODO(dt): Currently tags are always pg-compatible in the future it
  // might make sense to pass a tag format specifier.
  StatementTag() string
}

KaiwuDB implements the use of subclasses of tree.Statement to abstract each clause of the SQL statement. For example, the tree.SelectClause structure abstracts the SELECT clause in SQL, including the From and Where clauses of SELECT. At the same time, many parts of the AST tree will contain one or more tree.Expr structures, used to represent arithmetic expressions such as l_extendedprice * (1 – l_discount).

Go
type SelectClause struct {
  Distinct bool
  DistinctOn DistinctOn
  Exprs SelectExprs
  From From From
  Where *Where
  GroupBy GroupBy
  Having *Where
  Window Window
  TableSelect bool
}

type BinaryExpr struct {
  Operator BinaryOperator
  Left, Right Expr

  typeAnnotation
  fn *BinOp
}

After the SQL parsing is successful, it will be added to the Statement bufer and wait for processing by the executor.

Let’s take Q7 in TPCH as an example to show the structure of AST in KaiwuDB.

Q7 of TPCH is used to query the total value of goods transported between two specific countries (here France and Germany) within a certain period of time (1995-01-01 to 1996-12-31).

SQL
SELECT
    supp_nation,
    cust_nation,
    l_year, sum(volume) AS revenue
FROM (
    SELECT
        n1.n_name AS supp_nation,
        n2.n_name AS cust_nation,
        extract(year FROM l_shipdate) AS l_year,
        l_extendedprice * (1 - l_discount) AS volume
    FROM
        supplier,
        line item,
        orders,
        customer,
        nation n1,
        nation n2
    WHERE
        s_suppkey = l_suppkey
        AND o_orderkey = l_orderkey
        AND c_custkey = o_custkey
        AND s_nationkey = n1.n_nationkey
        AND c_nationkey = n2.n_nationkey
        AND (
            (n1.n_name = 'FRANCE' AND n2.n_name = 'GERMANY')
            or (n1.n_name = 'GERMANY' AND n2.n_name = 'FRANCE')
        )
        AND l_shipdate BETWEEN DATE '1995-01-01' AND DATE '1996-12-31'
    ) AS shipping
GROUP BY
    supp_nation,
    cust_nation,
    l_year
ORDER BY
    supp_nation,
    cust_nation,
    l_year;

The figure below shows the AST of TPCH Q7 in KaiwuDB (simplified for the convenience of presentation). The white part in each node represents a null value (nil), indicating that the current node does not contain the structure; the purple part represents that the structure can be expanded, that is, pointing to the next child node; the green part represents a leaf node.

Because Q7 is a SELECT statement, its root node is tree.Select, a subclass of tree.Statement. Each Statement in Q7 is abstracted into a specific node. For example, ORDER BY is represented as tree.OrderBy, WHERE clause is represented as tree.Where, etc.

2. SQL statement execution

As mentioned above, the consumer goroutine of the SQL executor will continuously read the AST of SQL from the Statement buffer and execute it. This function is implemented by calling execCmd of connExecutor through sqlServer.ServeConn. The main process is shown in the figure below.

The execCmd method will continuously read the contents of stmtBuf and execute it. When each SQL client connection is initialized, KaiwuDB will initialize a finite state machine (FSM) used to execute SQL. It has 5 states:

  • stateNoTxn – used to handle BEGIN statements or implicit transactions (start a new transaction for it)

  • stateOpen – used to execute ordinary SQL statements

  • stateAborted – used to handle ROLLBACK statements

  • stateCommitWait – used to handle COMMIT statements

  • stateInternalError – used to handle various errors, such as eventNonRetriableErr, stateInternalError, etc.

After the SQL instructions of stmtBuf are added to the state machine, they will be processed separately according to their types. Ordinary SQL is defined as the ExecStmt type and will be run by calling the execStmt method of connExecutor, as shown in the figure above.

The execStmt method will first determine the status of the current state machine. If it is a BEGIN statement, it will execute the execStmtInNoTxnState method of connExecutor to create a new transaction; if it is ordinary SQL, it will call the execStmtInOpenState of connExecutor to run.

The figure below shows the process of creating new things by execStmtInNoTxnState. If the current statement is BEGIN, a new thing is started. If it is any one of tree.CommitTransaction, tree.ReleaseSavepoint, tree.RollbackTransaction, tree.SetTransaction or tree.Savepoint, an error will be reported; in other cases (that is, ordinary SQL statements), the state of the state machine will be converted to stateOpen and Create an implicit transaction.

After the transaction is successfully created and the state of the state machine is converted to stateOpen, SQL will enter the execution phase. This part will be introduced in detail in subsequent articles.

The knowledge points of the article match the official knowledge files, and you can further learn relevant knowledge. MySQL entry skill treeSQL advanced skillsCTE and recursive query 77482 people are learning the system