Project background
Based on Spring Booot's `spring boot` ecosystem and dynamic-datasource, this project implements read/write splitting for the database and adds database and table sharding on top of it.
1. Framework version
1. Key technology stack versions
```xml
<properties>
    <java.version>8</java.version>
    <snakeyaml.version>1.33</snakeyaml.version>
</properties>

<dependencies>
    <!-- Druid connection pool -->
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>druid-spring-boot-starter</artifactId>
        <version>1.2.20</version>
    </dependency>
    <!-- dynamic-datasource -->
    <dependency>
        <groupId>com.baomidou</groupId>
        <artifactId>dynamic-datasource-spring-boot-starter</artifactId>
        <version>3.6.1</version>
    </dependency>
    <!-- shardingsphere-jdbc -->
    <dependency>
        <groupId>org.apache.shardingsphere</groupId>
        <artifactId>shardingsphere-jdbc-core</artifactId>
        <version>5.4.1</version>
    </dependency>
</dependencies>
```
2. Supplementary instructions
1. snakeyaml version
- Version 1.33 or above is required. In Spring Boot it can be pinned through the `<snakeyaml.version>` Maven property, e.g. `1.33`.
2. Choosing between shardingsphere-jdbc-core-spring-boot-starter and shardingsphere-jdbc-core
- After version 5.2.1 the configuration method changed: separate Spring Boot starters are no longer published, and shardingsphere-jdbc-core is used uniformly.
  - shardingsphere-jdbc-core-spring-boot-starter: the configuration is written in application.yml
  - shardingsphere-jdbc-core: uses a dedicated driver class, and the configuration is written in a separate YAML file
3. Latest version Maven repository links
- shardingsphere-jdbc-core: Maven Central
- dynamic-datasource-spring-boot-starter: Maven Central
- druid-spring-boot-starter: Maven Central
2. Configuration file
1. Spring Boot configuration file: application.yml

```yaml
spring:
  datasource:
    dynamic:
      primary: master
      strict: false
      datasource:
        master:
          url: jdbc:mysql://localhost:3306/dirk?serverTimezone=UTC&useSSL=false&useUnicode=true&characterEncoding=UTF-8
          username: root
          password: root
          driver-class-name: com.mysql.cj.jdbc.Driver
        slave_1:
          url: jdbc:mysql://localhost:3306/dirk?serverTimezone=UTC&useSSL=false&useUnicode=true&characterEncoding=UTF-8
          username: root
          password: root
          driver-class-name: com.mysql.cj.jdbc.Driver
        sharding:
          url: jdbc:shardingsphere:classpath:sharding-sphere-config.yaml
          driver-class-name: org.apache.shardingsphere.driver.ShardingSphereDriver
```
2. shardingsphere-jdbc configuration file: sharding-sphere-config.yaml

```yaml
mode:
  type: Standalone
  repository:
    type: JDBC

dataSources:
  write:
    dataSourceClassName: com.alibaba.druid.pool.DruidDataSource
    driverClassName: com.mysql.jdbc.Driver
    url: jdbc:mysql://localhost:3306/dirk?serverTimezone=UTC&useSSL=false&useUnicode=true&characterEncoding=UTF-8
    username: root
    password: root
  read:
    dataSourceClassName: com.alibaba.druid.pool.DruidDataSource
    driverClassName: com.mysql.jdbc.Driver
    url: jdbc:mysql://localhost:3306/dirk?serverTimezone=UTC&useSSL=false&useUnicode=true&characterEncoding=UTF-8
    username: root
    password: root

rules:
  - !SHARDING
    autoTables:
      sharding_test:
        actualDataSources: write
        keyGenerateStrategy:
          column: id
          keyGeneratorName: snowflake
        shardingStrategy:
          standard:
            shardingColumn: id
            shardingAlgorithmName: auto_mod
    shardingAlgorithms:
      auto_mod:
        type: MOD
        props:
          sharding-count: 3
    keyGenerators:
      snowflake:
        type: SNOWFLAKE
  - !READWRITE_SPLITTING
    dataSources:
      write:
        writeDataSourceName: write
        readDataSourceNames:
          - read
        loadBalancerName: round_robin
    loadBalancers:
      round_robin:
        type: ROUND_ROBIN

props:
  sql-show: true
```
3. Configuration details
1. Data source configuration
(1) Configure DataSource Driver
Method 1: Use Spring Boot's default data source configuration

```yaml
spring:
  datasource:
    # Configure the DataSource driver
    driver-class-name: org.apache.shardingsphere.driver.ShardingSphereDriver
    # Specify the YAML configuration file
    url: jdbc:shardingsphere:classpath:xxx.yaml
```
Method 2: Use dynamic-datasource to configure the data sources

```yaml
spring:
  datasource:
    dynamic:
      primary: master
      strict: false
      datasource:
        master:
          url: jdbc:mysql://localhost:3306/dirk?serverTimezone=UTC&useSSL=false&useUnicode=true&characterEncoding=UTF-8
          username: root
          password: root
          driver-class-name: com.mysql.jdbc.Driver
        slave_1:
          url: jdbc:mysql://localhost:3306/dirk?serverTimezone=UTC&useSSL=false&useUnicode=true&characterEncoding=UTF-8
          username: root
          password: root
          driver-class-name: com.mysql.jdbc.Driver
        sharding:
          url: jdbc:shardingsphere:classpath:sharding-sphere-config.yaml
          driver-class-name: org.apache.shardingsphere.driver.ShardingSphereDriver
```
(2) Configure Yaml
Background information
ShardingSphere-JDBC supports all database JDBC drivers and connection pools.
The example uses MySQL as the database driver and HikariCP as the connection pool; both can be replaced with other drivers and pools. When using ShardingSphere-JDBC, the property names of the JDBC pool depend on the definition of the respective pool and are not defined by ShardingSphere. For the related handling, refer to the class org.apache.shardingsphere.infra.datasource.pool.creator.DataSourcePoolCreator. For example, with Alibaba Druid 1.2.9, using `url` instead of `jdbcUrl` as in the example below is the expected behavior.
Parameter explanation
```yaml
dataSources: # Data source configuration; multiple <data-source-name> entries can be configured
  <data_source_name>: # Data source name
    dataSourceClassName: # Fully qualified class name of the data source
    driverClassName: # Database driver class name, subject to the connection pool's own configuration
    jdbcUrl: # Database URL, subject to the connection pool's own configuration
    username: # Database username, subject to the connection pool's own configuration
    password: # Database password, subject to the connection pool's own configuration
    # ... other properties of the database connection pool
```
Configuration example
```yaml
dataSources:
  write:
    dataSourceClassName: com.alibaba.druid.pool.DruidDataSource
    driverClassName: com.mysql.jdbc.Driver
    url: jdbc:mysql://localhost:3306/dirk?serverTimezone=UTC&useSSL=false&useUnicode=true&characterEncoding=UTF-8
    username: root
    password: root
  read:
    dataSourceClassName: com.alibaba.druid.pool.DruidDataSource
    driverClassName: com.mysql.jdbc.Driver
    url: jdbc:mysql://localhost:3306/dirk?serverTimezone=UTC&useSSL=false&useUnicode=true&characterEncoding=UTF-8
    username: root
    password: root
```
2. Rule configuration
(1) Basic concepts
Logic table
The logical name of a group of horizontally split tables with the same structure; it is the logical identifier of the table in SQL. Example: order data is split into 10 tables according to the last digit of the primary key, namely t_order_0 to t_order_9, whose logical table name is t_order.
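To make the mapping concrete, here is a minimal, hypothetical sketch (the class and method names are ours, not ShardingSphere API) of routing a logical table name to an actual table by the last digit of the primary key:

```java
public class ActualTableRouting {
    // Map a logical table plus a key to an actual table t_order_0 .. t_order_9,
    // using the last digit of the primary key as the table suffix.
    static String route(String logicTable, long primaryKey) {
        return logicTable + "_" + (primaryKey % 10);
    }

    public static void main(String[] args) {
        System.out.println(route("t_order", 123)); // t_order_3
        System.out.println(route("t_order", 40));  // t_order_0
    }
}
```

Application code only ever writes `t_order`; the routing layer rewrites it to the actual table at execution time.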
Actual table
A physical table that actually exists in a horizontally split database, i.e. t_order_0 to t_order_9 in the previous example.
Binding table
Refers to a group of sharded tables with consistent sharding rules. When performing multi-table join queries across binding tables, the join must use the sharding key; otherwise Cartesian-product or cross-database joins will occur and hurt query performance. For example: if the t_order table and the t_order_item table are both sharded by order_id and joined on order_id, the two tables are bound to each other. Join queries between binding tables produce no Cartesian product, so join efficiency improves greatly. For example, given the SQL:
```sql
SELECT i.* FROM t_order o JOIN t_order_item i ON o.order_id=i.order_id WHERE o.order_id in (10, 11);
```
If the binding-table relationship is not configured, and assuming the sharding key order_id routes the value 10 to shard 0 and the value 11 to shard 1, the routed SQL consists of 4 statements presented as a Cartesian product:
```sql
SELECT i.* FROM t_order_0 o JOIN t_order_item_0 i ON o.order_id=i.order_id WHERE o.order_id in (10, 11);
SELECT i.* FROM t_order_0 o JOIN t_order_item_1 i ON o.order_id=i.order_id WHERE o.order_id in (10, 11);
SELECT i.* FROM t_order_1 o JOIN t_order_item_0 i ON o.order_id=i.order_id WHERE o.order_id in (10, 11);
SELECT i.* FROM t_order_1 o JOIN t_order_item_1 i ON o.order_id=i.order_id WHERE o.order_id in (10, 11);
```
After the binding-table relationship is configured and order_id is used for the join, the routed SQL becomes 2 statements:
```sql
SELECT i.* FROM t_order_0 o JOIN t_order_item_0 i ON o.order_id=i.order_id WHERE o.order_id in (10, 11);
SELECT i.* FROM t_order_1 o JOIN t_order_item_1 i ON o.order_id=i.order_id WHERE o.order_id in (10, 11);
```
Here the t_order table carries the sharding condition, so ShardingSphere takes it as the primary table of the whole binding group. All routing calculations use only the primary table's strategy, which means the sharding calculation for the t_order_item table uses the conditions of t_order.
Note: Multiple sharding rules in the binding table need to be configured according to the logical table prefix combined with the sharding suffix, for example:
```yaml
rules:
  - !SHARDING
    tables:
      t_order:
        actualDataNodes: ds_${0..1}.t_order_${0..1}
      t_order_item:
        actualDataNodes: ds_${0..1}.t_order_item_${0..1}
```
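The routing difference between unbound and bound tables can be sketched in plain Java (a hypothetical illustration, not ShardingSphere's actual routing engine): with two shards and order_id values 10 and 11, an unbound join expands to the Cartesian product of the candidate shards, while a bound join keeps the table suffixes aligned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class BindingTableRouting {
    // Two shards: order_id 10 -> shard 0, order_id 11 -> shard 1 (id % 2).
    static int shard(long orderId) {
        return (int) (orderId % 2);
    }

    // Enumerate the routed join statements for "t_order JOIN t_order_item".
    // Without a binding relationship, every combination of candidate shards
    // is produced (Cartesian product); with it, only aligned suffixes remain.
    static List<String> route(List<Long> orderIds, boolean bound) {
        Set<Integer> shards = new TreeSet<>();
        for (long id : orderIds) {
            shards.add(shard(id));
        }
        List<String> statements = new ArrayList<>();
        for (int o : shards) {
            for (int i : shards) {
                if (bound && o != i) {
                    continue; // binding keeps both suffixes aligned
                }
                statements.add("t_order_" + o + " JOIN t_order_item_" + i);
            }
        }
        return statements;
    }

    public static void main(String[] args) {
        List<Long> ids = Arrays.asList(10L, 11L);
        System.out.println(route(ids, false).size()); // 4 (Cartesian product)
        System.out.println(route(ids, true).size());  // 2 (aligned routes)
    }
}
```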
Broadcast table
Refers to tables that exist in all data sources, with identical structure and data in every database. Suitable for scenarios where the data volume is small and joins with massive tables are required, such as dictionary tables.
Single table
Refers to a table that exists uniquely across all sharded data sources, i.e. in only one of them. Suitable for tables with small data volumes that do not need sharding.
Note: Single tables that meet the following conditions will be automatically loaded:
- Single tables explicitly configured in rules such as data encryption or data masking
- A single table created by users executing DDL statements through ShardingSphere
ShardingSphere will not automatically load the remaining single tables that do not meet the above conditions. Users can configure single table rules for management as needed.
Sharding key
The database column used to split a database (table) horizontally. Example: if the orders table is sharded by taking the last digit of the order primary key modulo the shard count, the order primary key is the sharding column. If the SQL contains no sharding column, full routing is performed, and performance is poor. Besides single sharding columns, Apache ShardingSphere also supports sharding by multiple columns.
Sharding algorithm
The algorithm used to shard data, supporting sharding by `=`, `>=`, `<=`, `>`, `<`, `BETWEEN` and `IN`. Sharding algorithms can be implemented by developers themselves, or Apache ShardingSphere's built-in sharding algorithm syntactic sugar can be used, which is very flexible.
Automatic sharding algorithm
Sharding algorithm syntactic sugar that conveniently hosts all data nodes, so users do not need to care about the physical distribution of the actual tables. Implementations of common sharding algorithms such as modulo, hash, range and time are included.
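As a sketch of one such built-in algorithm, the following hypothetical MOD-style routine (mirroring the `type: MOD`, `sharding-count: 3` setting used in the configuration above, but not the real ShardingSphere SPI) picks a single actual table for equality conditions and a set of tables for range conditions:

```java
import java.util.Set;
import java.util.TreeSet;

public class ModShardingSketch {
    private final int shardingCount;

    ModShardingSketch(int shardingCount) {
        this.shardingCount = shardingCount;
    }

    // Equality (= / IN) sharding: exactly one actual table.
    String shardEqual(String logicTable, long value) {
        return logicTable + "_" + Math.floorMod(value, shardingCount);
    }

    // Range (BETWEEN) sharding: every actual table whose suffix occurs in the range.
    Set<String> shardRange(String logicTable, long lower, long upper) {
        Set<String> tables = new TreeSet<>();
        for (long v = lower; v <= upper && tables.size() < shardingCount; v++) {
            tables.add(shardEqual(logicTable, v));
        }
        return tables;
    }

    public static void main(String[] args) {
        ModShardingSketch mod = new ModShardingSketch(3);
        System.out.println(mod.shardEqual("sharding_test", 7));     // sharding_test_1
        System.out.println(mod.shardRange("sharding_test", 0, 10)); // all three tables
    }
}
```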
Custom sharding algorithm
Provides interfaces for application developers to implement sharding algorithms closely tied to their business, and lets users manage the physical distribution of actual tables themselves. Custom sharding algorithms are divided into:
- Standard sharding algorithm
For sharding scenarios that use a single column as the sharding key with `=`, `IN`, `BETWEEN AND`, `>`, `<`, `>=`, `<=`.
- Composite sharding algorithm
For sharding scenarios that use multiple columns as sharding keys. The logic across multiple sharding keys is complicated, and application developers need to handle that complexity themselves.
- Hint sharding algorithm
For scenarios where sharding is performed via Hint.
Sharding strategy
Consists of the sharding key and the sharding algorithm. The sharding algorithm is extracted separately because of its independence. What actually performs the sharding operation is the combination of sharding key + sharding algorithm, i.e. the sharding strategy.
Forced sharding routing
For scenarios where the sharding column is determined not by the SQL itself but by external conditions, the sharding value can be injected with a SQL Hint. Example: databases are sharded by an employee login key, but this column does not exist in the database tables. SQL Hint can be used through the Java API or SQL comments. For details, see Forced Sharding Routing.
Row expression
Row expressions address two main problems: configuration simplification and unification. In the cumbersome configuration of data sharding rules, the amount of repeated configuration grows with the number of data nodes and becomes hard to maintain; row expressions effectively reduce the data node configuration workload.
For common sharding algorithms, implementing them in Java code does not lend itself to unified configuration management. Writing sharding algorithms as row expressions keeps the rule configuration in one place, making it easier to browse and store.
As a string, a row expression consists of two parts: the Type Name of the corresponding SPI implementation at the beginning of the string, and the expression part. Taking `<GROOVY>t_order_${1..3}` as an example, the substring `GROOVY`, wrapped in `<>`, is the Type Name of the SPI implementation this row expression uses, and `t_order_${1..3}` is the expression part. When a row expression does not specify a Type Name, e.g. `t_order_${1..3}`, it is parsed by the default `GROOVY` implementation of the `InlineExpressionParser` SPI.
The following sections describe the syntax rules of the `GROOVY` implementation.
Using row expressions is very intuitive: simply use `${ expression }` or `$->{ expression }` in the configuration to identify a row expression. Two kinds of configuration are currently supported: data nodes and sharding algorithms. Row expressions use Groovy syntax and support all operations Groovy supports. For example:
- `${begin..end}` denotes a range of values
- `${[unit1, unit2, unit_x]}` denotes enumerated values
If a configuration value contains multiple `${ expression }` or `$->{ expression }` expressions in a row, the final result of the whole expression is the Cartesian combination of the results of each sub-expression.
For example, the following line expression:
${['online', 'offline']}_table${1..3}
It will eventually be parsed into:
online_table1, online_table2, online_table3, offline_table1, offline_table2, offline_table3
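The Cartesian combination can be reproduced with two nested loops (plain Java standing in for the Groovy evaluation; the class name is ours):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowExpressionExpansion {
    // Expand ${['online', 'offline']}_table${1..3}: the results of each
    // sub-expression are combined as a Cartesian product.
    static List<String> expand() {
        List<String> names = new ArrayList<>();
        for (String prefix : Arrays.asList("online", "offline")) {
            for (int i = 1; i <= 3; i++) {
                names.add(prefix + "_table" + i);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // online_table1, online_table2, online_table3,
        // offline_table1, offline_table2, offline_table3
        System.out.println(String.join(", ", expand()));
    }
}
```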
Distributed primary key
In traditional database software development, automatic primary key generation technology is a basic requirement. Each database also provides corresponding support for this requirement, such as MySQL's auto-increment key, Oracle's auto-increment sequence, etc. After data is sharded, generating globally unique primary keys from different data nodes is a very difficult problem. The auto-increment keys between different actual tables in the same logical table cannot sense each other, resulting in duplicate primary keys. Although collisions can be avoided by constraining the initial value and step size of the auto-increasing primary key, additional operation and maintenance rules need to be introduced, making the solution lack completeness and scalability.
There are currently many third-party solutions to this problem, such as UUIDs and other schemes that generate unique keys from specific algorithms, or dedicated primary key generation services. To make things convenient for users across different scenarios, Apache ShardingSphere not only provides built-in distributed primary key generators such as UUID and SNOWFLAKE, but also extracts a distributed primary key generator interface so that users can implement their own custom generators.
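To illustrate the idea behind SNOWFLAKE, here is a deliberately simplified sketch (not ShardingSphere's actual implementation, which additionally uses a custom epoch and handles clock rollback): each id combines a millisecond timestamp, a worker id and a per-millisecond sequence, so ids are unique per worker and monotonically increasing.

```java
public class SnowflakeSketch {
    // Simplified layout: 41-bit timestamp | 10-bit worker id | 12-bit sequence.
    private final long workerId;
    private long lastMillis = -1L;
    private long sequence = 0L;

    SnowflakeSketch(long workerId) {
        this.workerId = workerId & 0x3FF; // keep 10 bits
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now == lastMillis) {
            sequence = (sequence + 1) & 0xFFF; // up to 4096 ids per millisecond
            if (sequence == 0) {
                // sequence exhausted within this millisecond: wait for the next one
                while (now <= lastMillis) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0;
        }
        lastMillis = now;
        return (now << 22) | (workerId << 12) | sequence;
    }

    public static void main(String[] args) {
        SnowflakeSketch generator = new SnowflakeSketch(1);
        long a = generator.nextId();
        long b = generator.nextId();
        System.out.println(a < b); // ids are monotonically increasing
    }
}
```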
(2) Configure yaml
Background information
The data sharding YAML configuration method is highly readable: the YAML format makes the dependencies between sharding rules easy to understand at a glance. ShardingSphere automatically creates the ShardingSphereDataSource object from the YAML configuration, sparing users unnecessary coding work.
Parameter explanation
```yaml
rules:
- !SHARDING
  tables: # Data sharding rule configuration
    <logic_table_name> (+): # Logical table name
      actualDataNodes (?): # Composed of data source name + table name (refer to the Inline syntax rules)
      databaseStrategy (?): # Database sharding strategy; defaults to the default database sharding strategy; only one of the following strategies can be chosen
        standard: # Standard sharding scenario with a single sharding key
          shardingColumn: # Sharding column name
          shardingAlgorithmName: # Sharding algorithm name
        complex: # Composite sharding scenario with multiple sharding keys
          shardingColumns: # Sharding column names, separated by commas
          shardingAlgorithmName: # Sharding algorithm name
        hint: # Hint sharding strategy
          shardingAlgorithmName: # Sharding algorithm name
        none: # No sharding
      tableStrategy: # Table sharding strategy, same structure as the database sharding strategy
      keyGenerateStrategy: # Distributed sequence strategy
        column: # Auto-increment column name; by default no key generator is used
        keyGeneratorName: # Distributed sequence algorithm name
      auditStrategy: # Sharding audit strategy
        auditorNames: # Sharding audit algorithm names
          - <auditor_name>
          - <auditor_name>
        allowHintDisable: true # Whether hints are allowed to disable sharding audit
  autoTables: # Automatic sharding table rule configuration
    t_order_auto: # Logical table name
      actualDataSources (?): # Data source names
      shardingStrategy: # Sharding strategy
        standard: # Standard sharding scenario with a single sharding key
          shardingColumn: # Sharding column name
          shardingAlgorithmName: # Automatic sharding algorithm name
  bindingTables (+): # Binding table rule list
    - <logic_table_name_1, logic_table_name_2, ...>
    - <logic_table_name_1, logic_table_name_2, ...>
  defaultDatabaseStrategy: # Default database sharding strategy
  defaultTableStrategy: # Default table sharding strategy
  defaultKeyGenerateStrategy: # Default distributed sequence strategy
  defaultShardingColumn: # Default sharding column name

  # Sharding algorithm configuration
  shardingAlgorithms:
    <sharding_algorithm_name> (+): # Sharding algorithm name
      type: # Sharding algorithm type
      props: # Sharding algorithm properties
      # ...

  # Distributed sequence algorithm configuration
  keyGenerators:
    <key_generate_algorithm_name> (+): # Distributed sequence algorithm name
      type: # Distributed sequence algorithm type
      props: # Distributed sequence algorithm properties
      # ...

  # Sharding audit algorithm configuration
  auditors:
    <sharding_audit_algorithm_name> (+): # Sharding audit algorithm name
      type: # Sharding audit algorithm type
      props: # Sharding audit algorithm properties
      # ...

- !BROADCAST
  tables: # Broadcast table rule list
    - <table_name>
    - <table_name>
```
Operation steps
- Configure data sharding rules in the YAML file, including data sources, sharding rules, global attributes and other configuration items;
- Call the createDataSource method of the YamlShardingSphereDataSourceFactory object to create a ShardingSphereDataSource based on the configuration information in the YAML file.
Configuration example
The data sharding YAML configuration example is as follows:
```yaml
rules:
  - !SHARDING
    autoTables:
      sharding_test:
        actualDataSources: write
        keyGenerateStrategy:
          column: id
          keyGeneratorName: snowflake
        shardingStrategy:
          standard:
            shardingColumn: id
            shardingAlgorithmName: auto_mod
    shardingAlgorithms:
      auto_mod:
        type: MOD
        props:
          sharding-count: 3
    keyGenerators:
      snowflake:
        type: SNOWFLAKE
  - !READWRITE_SPLITTING
    dataSources:
      write:
        writeDataSourceName: write
        readDataSourceNames:
          - read
        loadBalancerName: round_robin
    loadBalancers:
      round_robin:
        type: ROUND_ROBIN
```
Through the createDataSource method of YamlShardingSphereDataSourceFactory, read the YAML configuration to complete the creation of the data source.
```java
YamlShardingSphereDataSourceFactory.createDataSource(getFile("/META-INF/sharding-databases-tables.yaml"));
```
3. Other configuration
(1) Mode configuration
Parameter explanation
```yaml
mode (?): # Defaults to Standalone mode if not configured
  type: # Run mode type. Options: Standalone, Cluster
  repository (?): # Persistence repository configuration
```
Configuration example
```yaml
mode:
  type: Standalone # Standalone mode
  repository:
    type: JDBC
```
(2) props configuration
Background information
Apache ShardingSphere provides system-level configuration through property configuration.
Parameter explanation
| Name | Data type | Description | Default value |
|---|---|---|---|
| sql-show (?) | boolean | Whether to print SQL in the log. Printing SQL helps developers locate problems quickly; the log content includes the logical SQL, the real SQL and the SQL parse result. When enabled, the log uses the topic ShardingSphere-SQL at level INFO | false |
| sql-simple (?) | boolean | Whether to print simple-style SQL in the log | false |
| kernel-executor-size (?) | int | Size of the task processing thread pool. Each ShardingSphereDataSource uses an independent thread pool; different data sources in the same JVM do not share thread pools | infinite |
| max-connections-size-per-query (?) | int | Maximum number of connections a single query request can use on each database instance | 1 |
| check-table-metadata-enabled (?) | boolean | Whether to check the structural consistency of sharded table metadata at program startup and on updates | false |
Operation steps
Property configuration is directly configured in the configuration file used by ShardingSphere-JDBC, in the following format:
```yaml
props:
  sql-show: true
```
Configuration example
```yaml
props:
  sql-show: true # print SQL
```