Implementing database and table sharding with dynamic-datasource + shardingsphere-jdbc

Project background

Building on an existing Spring Boot project that uses dynamic-datasource for database read/write splitting, this article adds database and table sharding (sub-database and sub-table).

1. Framework version

1. Key technology stack versions

 <properties>
        <java.version>8</java.version>
        <snakeyaml.version>1.33</snakeyaml.version>
    </properties>
    <dependencies>
        <!-- druid connection pool -->
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>druid-spring-boot-starter</artifactId>
            <version>1.2.20</version>
        </dependency>
        <!--dynamic-datasource-->
        <dependency>
            <groupId>com.baomidou</groupId>
            <artifactId>dynamic-datasource-spring-boot-starter</artifactId>
            <version>3.6.1</version>
        </dependency>
        <!-- shardingsphere-jdbc -->
        <dependency>
            <groupId>org.apache.shardingsphere</groupId>
            <artifactId>shardingsphere-jdbc-core</artifactId>
            <version>5.4.1</version>
        </dependency>
    </dependencies>

2. Supplementary instructions

1. snakeyaml version

  • Version 1.33 or above is required
  • In Spring Boot the version can be pinned to 1.33 through the snakeyaml.version property, as in the properties block above

2. Choosing between shardingsphere-jdbc-core-spring-boot-starter and shardingsphere-jdbc-core

  • Since version 5.2.1 the configuration method has changed: a separate Spring Boot starter is no longer provided, and shardingsphere-jdbc-core is used uniformly.
  • shardingsphere-jdbc-core-spring-boot-starter: the sharding configuration is written directly in application.yml
  • shardingsphere-jdbc-core: a dedicated driver class is used, and the sharding configuration is written in a separate YAML file
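As an illustrative contrast, the two artifacts differ mainly in where the sharding rules live. The fragments below are sketches, not complete configurations; the spring.shardingsphere prefix belongs to the retired starter:

```yaml
# Old style (shardingsphere-jdbc-core-spring-boot-starter, dropped after 5.2.1):
# sharding rules are written inline in application.yml
spring:
  shardingsphere:
    rules:
      sharding:
        # ... sharding rules here ...
---
# New style (shardingsphere-jdbc-core): application.yml only points at a
# separate YAML file via the ShardingSphereDriver
spring:
  datasource:
    driver-class-name: org.apache.shardingsphere.driver.ShardingSphereDriver
    url: jdbc:shardingsphere:classpath:sharding-sphere-config.yaml
```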

3. Latest version Maven repository link

  • shardingsphere-jdbc-core Maven Central
  • dynamic-datasource-spring-boot-starter Maven Central
  • druid-spring-boot-starter Maven Central

2. Configuration file

1. spring boot configuration file: application.yml

spring:
  datasource:
    dynamic:
      primary: master
      strict: false
      datasource:
        master:
          url: jdbc:mysql://localhost:3306/dirk?serverTimezone=UTC&useSSL=false&useUnicode=true&characterEncoding=UTF-8
          username: root
          password: root
          driver-class-name: com.mysql.cj.jdbc.Driver
        slave_1:
          url: jdbc:mysql://localhost:3306/dirk?serverTimezone=UTC&useSSL=false&useUnicode=true&characterEncoding=UTF-8
          username: root
          password: root
          driver-class-name: com.mysql.cj.jdbc.Driver
        sharding:
          url: jdbc:shardingsphere:classpath:sharding-sphere-config.yaml
          driver-class-name: org.apache.shardingsphere.driver.ShardingSphereDriver

2. shardingsphere-jdbc configuration file: sharding-sphere-config.yaml

mode:
  type: Standalone
  repository:
    type: JDBC

dataSources:
  write:
    dataSourceClassName: com.alibaba.druid.pool.DruidDataSource
    driverClassName: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://localhost:3306/dirk?serverTimezone=UTC&useSSL=false&useUnicode=true&characterEncoding=UTF-8
    username: root
    password: root
  read:
    dataSourceClassName: com.alibaba.druid.pool.DruidDataSource
    driverClassName: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://localhost:3306/dirk?serverTimezone=UTC&useSSL=false&useUnicode=true&characterEncoding=UTF-8
    username: root
    password: root

rules:
  - !SHARDING
    autoTables:
      sharding_test:
        actualDataSources: write
        keyGenerateStrategy:
          column: id
          keyGeneratorName: snowflake
        shardingStrategy:
          standard:
            shardingColumn: id
            shardingAlgorithmName: auto_mod

    shardingAlgorithms:
      auto_mod:
        type: MOD
        props:
          sharding-count: 3

    keyGenerators:
      snowflake:
        type: SNOWFLAKE

  - !READWRITE_SPLITTING
    dataSources:
      write:
        writeDataSourceName: write
        readDataSourceNames:
          - read
        loadBalancerName: round_robin
    loadBalancers:
      round_robin:
        type: ROUND_ROBIN

props:
  sql-show: true
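With the MOD algorithm and sharding-count: 3 configured above, rows of the logical table sharding_test are routed to actual tables sharding_test_0 through sharding_test_2 by the sharding value modulo 3. A minimal sketch of that routing arithmetic (illustrative only, not ShardingSphere's internal implementation; the suffix pattern shown is the default naming convention for auto tables):

```java
public class ModRoutingSketch {
    static final long SHARDING_COUNT = 3; // matches sharding-count: 3 above

    // Illustrative: MOD routes a row to <logical-table>_<value % count>.
    static String route(long id) {
        return "sharding_test_" + Math.floorMod(id, SHARDING_COUNT);
    }

    public static void main(String[] args) {
        for (long id : new long[] {0, 1, 2, 3, 7}) {
            System.out.println(id + " -> " + route(id));
        }
    }
}
```

Reads on a sharding-key equality condition are routed the same way, so a query with id = 7 only touches one actual table.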

3. Configuration details

1. Data source configuration

(1) Configure DataSource Driver

Method 1: Use the Spring Boot default datasource configuration
spring:
  datasource:
    #Configure DataSource Driver
    driver-class-name: org.apache.shardingsphere.driver.ShardingSphereDriver
    #Specify YAML configuration file
    url: jdbc:shardingsphere:classpath:xxx.yaml
Method 2: Use dynamic-datasource to configure the database
spring:
  datasource:
    dynamic:
      primary: master
      strict: false
      datasource:
        master:
          url: jdbc:mysql://localhost:3306/dirk?serverTimezone=UTC&useSSL=false&useUnicode=true&characterEncoding=UTF-8
          username: root
          password: root
          driver-class-name: com.mysql.cj.jdbc.Driver
        slave_1:
          url: jdbc:mysql://localhost:3306/dirk?serverTimezone=UTC&useSSL=false&useUnicode=true&characterEncoding=UTF-8
          username: root
          password: root
          driver-class-name: com.mysql.cj.jdbc.Driver
        sharding:
          url: jdbc:shardingsphere:classpath:sharding-sphere-config.yaml
          driver-class-name: org.apache.shardingsphere.driver.ShardingSphereDriver

(2) Configure Yaml

Background information

ShardingSphere-JDBC supports all database JDBC drivers and connection pools.

In the example here the database driver is MySQL and the connection pool is Druid; both can be replaced with other database drivers and connection pools. When using ShardingSphere-JDBC, the property names of the JDBC pool depend on the definition of the respective pool and are not defined by ShardingSphere; for the related handling, see the class org.apache.shardingsphere.infra.datasource.pool.creator.DataSourcePoolCreator. For example, for Alibaba Druid 1.2.9, using url instead of jdbcUrl as in the example below is expected behavior.

Parameter explanation
dataSources: # Data source configuration, multiple <data-source-name> can be configured
  <data_source_name>: # Data source name
    dataSourceClassName: # Complete class name of data source
    driverClassName: # Database driver class name, subject to the configuration of the database connection pool itself
    jdbcUrl: # Database URL connection, subject to the configuration of the database connection pool itself
    username: # Database username, subject to the configuration of the database connection pool itself
    password: # Database password, subject to the configuration of the database connection pool itself
    #...Other properties of the database connection pool
Configuration example
dataSources:
  write:
    dataSourceClassName: com.alibaba.druid.pool.DruidDataSource
    driverClassName: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://localhost:3306/dirk?serverTimezone=UTC&useSSL=false&useUnicode=true&characterEncoding=UTF-8
    username: root
    password: root
  read:
    dataSourceClassName: com.alibaba.druid.pool.DruidDataSource
    driverClassName: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://localhost:3306/dirk?serverTimezone=UTC&useSSL=false&useUnicode=true&characterEncoding=UTF-8
    username: root
    password: root

2. Rule configuration

(1) Basic concepts

Logic table

The logical name of a group of horizontally split databases (tables) with the same structure; it is the logical identifier of the table in SQL. Example: order data is split into 10 tables based on the last digit of the primary key, namely t_order_0 to t_order_9; their logical table name is t_order.

Actual table

A physical table that actually exists in the horizontally split database, i.e. t_order_0 to t_order_9 in the previous example.

Binding table

Refers to a group of sharded tables with consistent sharding rules. When binding tables are used in multi-table join queries, the join must be on the sharding key; otherwise Cartesian-product joins or cross-database joins occur, hurting query efficiency. For example, if the t_order table and the t_order_item table are both sharded by order_id and joined on order_id, the two tables can be bound to each other. Multi-table joins between bound tables will not produce Cartesian products, and join efficiency improves greatly. For example, if the SQL is:

SELECT i.* FROM t_order o JOIN t_order_item i ON o.order_id=i.order_id WHERE o.order_id in (10, 11);

When no binding-table relationship is configured, and assuming the sharding key order_id routes the value 10 to shard 0 and the value 11 to shard 1, the routed SQL will be 4 statements, presented as a Cartesian product:

SELECT i.* FROM t_order_0 o JOIN t_order_item_0 i ON o.order_id=i.order_id WHERE o.order_id in (10, 11);

SELECT i.* FROM t_order_0 o JOIN t_order_item_1 i ON o.order_id=i.order_id WHERE o.order_id in (10, 11);

SELECT i.* FROM t_order_1 o JOIN t_order_item_0 i ON o.order_id=i.order_id WHERE o.order_id in (10, 11);

SELECT i.* FROM t_order_1 o JOIN t_order_item_1 i ON o.order_id=i.order_id WHERE o.order_id in (10, 11);

After the binding-table relationship is configured and order_id is used for the join, the routed SQL will be 2 statements:

SELECT i.* FROM t_order_0 o JOIN t_order_item_0 i ON o.order_id=i.order_id WHERE o.order_id in (10, 11);

SELECT i.* FROM t_order_1 o JOIN t_order_item_1 i ON o.order_id=i.order_id WHERE o.order_id in (10, 11);

Here the t_order table, which carries the sharding condition, is used by ShardingSphere as the primary table of the whole binding. All routing calculations use only the primary table's strategy, so the sharding calculation for t_order_item uses t_order's conditions.
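The bound routing above can be sketched as arithmetic. Assuming, purely for illustration, that order_id is routed by mod 2 (value 10 to shard 0, value 11 to shard 1): collect the shards the sharding values hit, then emit one join per shard against same-suffix tables, rather than joining every t_order shard with every t_order_item shard.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class BindingRouteSketch {
    // Illustrative only: order_id % 2 picks the table suffix, and the binding
    // pins t_order_item_N to the same suffix as t_order_N.
    static List<String> routeBound(long[] orderIds) {
        TreeSet<Long> suffixes = new TreeSet<>();
        for (long id : orderIds) {
            suffixes.add(id % 2); // which shards the IN-list values hit
        }
        List<String> sqls = new ArrayList<>();
        for (long s : suffixes) {
            sqls.add("SELECT i.* FROM t_order_" + s + " o JOIN t_order_item_" + s
                    + " i ON o.order_id=i.order_id WHERE o.order_id in (10, 11)");
        }
        return sqls;
    }

    public static void main(String[] args) {
        // With binding configured: 2 routed statements, not the 2 x 2 Cartesian product of 4.
        routeBound(new long[] {10, 11}).forEach(System.out::println);
    }
}
```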

Note: the tables in a binding group need to be configured with the logical table name as prefix combined with consistent sharding suffixes, for example:

rules:
- !SHARDING
  tables:
    t_order:
      actualDataNodes: ds_${0..1}.t_order_${0..1}
    t_order_item:
      actualDataNodes: ds_${0..1}.t_order_item_${0..1}

Broadcast table

Refers to tables that exist in all data sources, with completely consistent structure and data in every database. Suitable for scenarios where the data volume is small but joins with huge tables are needed, such as dictionary tables.

Single table

Refers to tables of which only a single copy exists across all sharded data sources. Suitable for tables with small data volume that do not need sharding.

Note: Single tables that meet the following conditions will be automatically loaded:

  • Single tables that appear explicitly in rule configuration, such as data encryption or data masking rules
  • A single table created by users executing DDL statements through ShardingSphere

ShardingSphere will not automatically load the remaining single tables that do not meet the above conditions. Users can configure single table rules for management as needed.

Sharding key

The database column used for horizontal splitting of databases (tables). Example: if the order table is sharded by taking the last digit of the order primary key modulo a count, the order primary key is the sharding column. If the SQL contains no sharding column, full routing is performed and performance is poor. Besides single sharding columns, Apache ShardingSphere also supports sharding by multiple columns.

Sharding algorithm

Algorithm for sharding data; sharding via =, >=, <=, >, <, BETWEEN and IN is supported. A sharding algorithm can be implemented by developers themselves, or one of Apache ShardingSphere's built-in sharding-algorithm syntactic sugars can be used, which is very flexible.

Automated sharding algorithm

Sharding-algorithm syntactic sugar used to conveniently host all data nodes; users do not need to care about the physical distribution of the actual tables. It includes implementations of common sharding algorithms such as modulo, hash, range, and time.

Customized sharding algorithm

Provides an interface for application developers to implement sharding algorithms closely tied to their business, while letting them manage the physical distribution of actual tables themselves. Custom sharding algorithms are divided into:

  • Standard Sharding Algorithm

Used for sharding scenarios where a single column is the sharding key with the operators =, IN, BETWEEN AND, >, <, >= and <=.

  • Composite Sharding Algorithm

It is used to handle sharding scenarios where multiple keys are used as sharding keys. The logic of multiple sharding keys is complex, and application developers need to handle the complexity themselves.

  • Hint sharding algorithm

Used to handle sharding scenarios where the sharding value is supplied via Hint.

Sharding strategy

Consists of a sharding key and a sharding algorithm. The sharding algorithm is extracted separately because of its independence; what actually performs sharding is the combination of sharding key plus sharding algorithm, i.e. the sharding strategy.

Forced sharding routing

For scenarios where the sharding value is determined not by the SQL itself but by external conditions, a SQL Hint can be used to inject the sharding value. Example: sharding databases by an employee's login primary key when that column does not exist in the database tables. SQL Hints can be used through the Java API or through SQL comments; see Forced Sharding Routing for details.

Row expression

Row expressions address two main problems: simplification and unification of configuration. In cumbersome data sharding rule configuration, as the number of data nodes grows, large amounts of repeated configuration make the configuration itself hard to maintain; row expressions can effectively reduce the data-node configuration workload.

For common sharding algorithms, implementing them in Java code does not lend itself to unified configuration management. Writing sharding algorithms as row expressions keeps the rule configuration stored together, which is easier to browse and maintain.

As a string, a row expression consists of two parts: the Type Name of the SPI implementation at the beginning of the string, and the expression itself. Take <GROOVY>t_order_${1..3} as an example: the substring GROOVY, wrapped in <> to mark it, is the Type Name of the SPI implementation used by this row expression, and t_order_${1..3} is the expression part. When a row expression does not specify a Type Name, as in t_order_${1..3}, it defaults to the GROOVY implementation of the InlineExpressionParser SPI to parse the expression.

The following sections describe the syntax rules implemented by GROOVY.

Using row expressions is very intuitive: simply mark the row expression in the configuration with ${ expression } or $->{ expression }. Currently two kinds of configuration are supported: data nodes and sharding algorithms. The content of a row expression uses Groovy syntax, and row expressions support all operations that Groovy supports. For example:

${begin..end} represents the range
${[unit1, unit2, unit_x]} represents the enumeration value

If a row contains multiple ${ expression } or $->{ expression } placeholders, the final result of the whole expression is the Cartesian combination of the results of the sub-expressions.

For example, the following line expression:

${['online', 'offline']}_table${1..3}

It will eventually be parsed into:

online_table1, online_table2, online_table3, offline_table1, offline_table2, offline_table3
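The Cartesian expansion can be sketched in plain Java. ShardingSphere actually parses row expressions with Groovy; this only illustrates the combination rule for ${['online', 'offline']}_table${1..3}:

```java
import java.util.ArrayList;
import java.util.List;

public class RowExpressionSketch {
    // Every value of the first sub-expression (the enumeration) is combined
    // with every value of the second (the range) in order.
    static List<String> expand(List<String> prefixes, int begin, int end) {
        List<String> result = new ArrayList<>();
        for (String p : prefixes) {
            for (int i = begin; i <= end; i++) {
                result.add(p + "_table" + i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(expand(List.of("online", "offline"), 1, 3));
        // [online_table1, online_table2, online_table3, offline_table1, offline_table2, offline_table3]
    }
}
```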

Distributed primary key

In traditional database software development, automatic primary key generation is a basic requirement, and each database provides support for it, such as MySQL's auto-increment key and Oracle's auto-increment sequence. After data is sharded, generating globally unique primary keys across different data nodes becomes a hard problem: the auto-increment keys of different actual tables of the same logical table cannot see each other and therefore generate duplicates. Although collisions can be avoided by constraining the initial value and step size of auto-increment keys, that introduces extra operation and maintenance rules, so the solution lacks completeness and scalability.

There are currently many third-party solutions that can solve this problem well, such as UUIDs and other unique keys self-generated by specific algorithms, or dedicated primary key generation services. To be convenient and to cover different usage scenarios, Apache ShardingSphere not only provides built-in distributed key generators such as UUID and SNOWFLAKE, but also exposes the distributed key generator interface so that users can implement custom generators themselves.
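A simplified sketch of the snowflake idea (1 sign bit, 41-bit timestamp, 10-bit worker id, 12-bit in-millisecond sequence). ShardingSphere's built-in SNOWFLAKE generator follows this general layout, though details such as the epoch and clock-rollback handling differ; this is illustrative, not its implementation:

```java
public class SnowflakeSketch {
    private final long workerId;      // 10 bits: which node generated the id
    private long lastTimestamp = -1L;
    private long sequence = 0L;       // 12 bits: counter within one millisecond

    SnowflakeSketch(long workerId) {
        this.workerId = workerId & 0x3FF;
    }

    synchronized long nextId() {
        long ts = System.currentTimeMillis();
        if (ts == lastTimestamp) {
            sequence = (sequence + 1) & 0xFFF;   // same millisecond: bump sequence
            if (sequence == 0) {                  // sequence exhausted: spin to next ms
                while ((ts = System.currentTimeMillis()) <= lastTimestamp) { }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = ts;
        // 41-bit timestamp | 10-bit worker | 12-bit sequence
        return (ts << 22) | (workerId << 12) | sequence;
    }

    public static void main(String[] args) {
        SnowflakeSketch gen = new SnowflakeSketch(1);
        System.out.println(gen.nextId());
        System.out.println(gen.nextId()); // strictly larger: ids are ordered per worker
    }
}
```

Because the timestamp occupies the high bits, ids generated on one worker are monotonically increasing, which keeps B-tree inserts mostly append-only.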

(2) Configure yaml

Background information

The data sharding YAML configuration is highly readable: the YAML format makes the dependencies between sharding rules easy to understand at a glance. ShardingSphere automatically creates the ShardingSphereDataSource object from the YAML configuration, reducing unnecessary coding work for users.

Parameter explanation
rules:
- !SHARDING
  tables: # Data sharding rule configuration
    <logic_table_name> ( + ): # Logical table name
      actualDataNodes (?): # Composed of data source name + table name (refer to Inline syntax rules)
      databaseStrategy (?): # Database sharding strategy; absent means the default strategy is used; choose exactly one of the following strategies
        standard: # Standard sharding scenario for single sharding key
          shardingColumn: # Sharding column name
          shardingAlgorithmName: # Sharding algorithm name
        complex: # Used for composite sharding scenarios with multiple shard keys
          shardingColumns: # Sharding column names, multiple columns separated by commas
          shardingAlgorithmName: # Sharding algorithm name
        hint: # Hint sharding strategy
          shardingAlgorithmName: # Sharding algorithm name
        none: # No sharding
      tableStrategy: # Table sharding strategy, same format as the database sharding strategy
      keyGenerateStrategy: # Distributed sequence strategy
        column: # Auto-increment column name; absent means no key generator is used
        keyGeneratorName: # Distributed sequence algorithm name
      auditStrategy: # Sharding audit strategy
        auditorNames: # Sharding audit algorithm names
          - <auditor_name>
          - <auditor_name>
        allowHintDisable: true # Whether to disable shard audit hints
  autoTables: # Automatic sharding table rule configuration
    t_order_auto: # Logical table name
      actualDataSources (?): # Data source name
      shardingStrategy: # Sharding strategy
        standard: # Standard sharding scenario for single sharding key
          shardingColumn: # Sharding column name
          shardingAlgorithmName: # Automatic sharding algorithm name
  bindingTables ( + ): # List of binding table rules
    - <logic_table_name_1, logic_table_name_2, ...>
    - <logic_table_name_1, logic_table_name_2, ...>
  defaultDatabaseStrategy: # Default database sharding strategy
  defaultTableStrategy: # Default table sharding strategy
  defaultKeyGenerateStrategy: # Default distributed sequence strategy
  defaultShardingColumn: # Default sharding column name
  
  # Sharding algorithm configuration
  shardingAlgorithms:
    <sharding_algorithm_name> ( + ): # Sharding algorithm name
      type: # Sharding algorithm type
      props: # Sharding algorithm property configuration
      #...
  
  # Distributed sequence algorithm configuration
  keyGenerators:
    <key_generate_algorithm_name> ( + ): # Distributed sequence algorithm name
      type: # Distributed sequence algorithm type
      props: # Distributed sequence algorithm property configuration
      #...
  # Sharding audit algorithm configuration
  auditors:
    <sharding_audit_algorithm_name> ( + ): # Sharding audit algorithm name
      type: # Sharding audit algorithm type
      props: # Sharding audit algorithm property configuration
      #...

- !BROADCAST
  tables: # Broadcast table rule list
    - <table_name>
    - <table_name>
Operation steps
  1. Configure data sharding rules in the YAML file, including data sources, sharding rules, global attributes and other configuration items;
  2. Call the createDataSource method of the YamlShardingSphereDataSourceFactory object to create a ShardingSphereDataSource based on the configuration information in the YAML file.
Configuration example

The data sharding YAML configuration example is as follows:

rules:
  - !SHARDING
    autoTables:
      sharding_test:
        actualDataSources: write
        keyGenerateStrategy:
          column: id
          keyGeneratorName: snowflake
        shardingStrategy:
          standard:
            shardingColumn: id
            shardingAlgorithmName: auto_mod

    shardingAlgorithms:
      auto_mod:
        type: MOD
        props:
          sharding-count: 3

    keyGenerators:
      snowflake:
        type: SNOWFLAKE

  - !READWRITE_SPLITTING
    dataSources:
      write:
        writeDataSourceName: write
        readDataSourceNames:
          - read
        loadBalancerName: round_robin
    loadBalancers:
      round_robin:
        type: ROUND_ROBIN

Through the createDataSource method of YamlShardingSphereDataSourceFactory, read the YAML configuration to complete the creation of the data source.

YamlShardingSphereDataSourceFactory.createDataSource(getFile("/META-INF/sharding-databases-tables.yaml"));

3. Other configuration

(1) Mode configuration

Parameter explanation
mode (?): # If not configured, defaults to Standalone mode
  type: # Run mode type. Options: Standalone, Cluster
  repository (?): # Persistence repository configuration
Configuration example
mode:
  type: Standalone # Standalone mode
  repository:
    type: JDBC

(2) props configuration

Background information

Apache ShardingSphere provides system-level configuration through property configuration.

Parameter explanation
  • sql-show (?): boolean, default false. Whether to print SQL in the log. Printing SQL helps developers locate problems quickly; the log content includes the logical SQL, the real SQL, and the SQL parse result. When enabled, the log uses the topic ShardingSphere-SQL at level INFO.
  • sql-simple (?): boolean, default false. Whether to print simple-style SQL in the log.
  • kernel-executor-size (?): int, default infinite. Sets the size of the task-processing thread pool. Each ShardingSphereDataSource uses an independent thread pool; different data sources in the same JVM do not share one.
  • max-connections-size-per-query (?): int, default 1. The maximum number of connections that one query request may use on each database instance.
  • check-table-metadata-enabled (?): boolean, default false. Whether to check the structural consistency of sharded-table metadata when the program starts and updates.
Operation steps

Property configuration is directly configured in the configuration file used by ShardingSphere-JDBC, in the following format:

props:
    sql-show: true
Configuration example
props:
  # print sql
  sql-show: true