This article is the third article in the series “Principles and Practice of ShardingSphere5. In code> environment, sub-database and table can be quickly realized through JAVA encoding
and Yml configuration
.
1. What is ShardingSphere?
shardingsphere
is an open source distributed relational database middleware and is the top-level project of Apache
. Its predecessor was two independent projects of sharding-jdbc
and sharding-proxy
. They were later merged into one project in 2018 and officially renamed ShardingSphere.
Among them, sharding-jdbc is the most classic and mature framework in the entire ecosystem. The first people who came into contact with sharding should know it. It is the best introductory tool for learning sharding.
Today’s ShardingSphere no longer simply refers to a certain framework, but a complete technology ecosystem, consisting of three open source distributed database middlewares: sharding-jdbc, sharding-proxy and sharding-sidecar. The first two came out earlier and have more mature functions. They are two widely used distributed database middlewares. Therefore, in subsequent articles, we will focus on their characteristics and usage methods.
2. Why choose ShardingSphere?
In order to answer this question, I have sorted out the common sharding tools on the market, including ShardingSphere
, Cobar
, Mycat
, and TDDL
, MySQL Fabric
, etc., and briefly compared them from multiple perspectives.
Cobar
Cobar is Alibaba’s open source distributed database middleware based on MySQL, which provides functions such as database and table sharding, read-write separation, and transaction management. It uses polling algorithm and hash algorithm for data sharding, and supports distributed table sharding, but does not support single database sharding into multiple tables.
It provides services in the form of Proxy
. It is widely used within Alibaba and is open source. It is relatively easy to configure and does not need to rely on other things. It only requires a Java environment. It is compatible with almost all ORM frameworks on the market, but only supports MySQL database, and transaction support is more troublesome.
MyCAT
Mycat
is a secondary development by community enthusiasts based on Alibaba Cobar. It is also a relatively classic tool for sharding databases and tables. It provides services in Proxy mode and supports functions such as database sharding and table sharding, read-write separation, SQL routing, and data sharding.
It is compatible with almost all ORM frameworks on the market, including Hibernate, MyBatis and JPA. However, the fly in the ointment is that it only supports MySQL database, and the current community activity is relatively low.
TDDL
TDDL is an open source sharding solution provided by Alibaba Group, which can automatically route SQL to the corresponding database tables. It uses two methods of vertical sharding and horizontal sharding to divide tables and databases, and supports multiple data sources and read-write separation functions.
TDDL is developed based on Java, supports MySQL, Oracle and SQL Server databases, and can be integrated with ORM frameworks such as Hibernate and MyBatis on the market.
However, TDDL only supports the integration of some Alibaba internal tools and frameworks, which may be relatively limiting for external companies. At the same time, its documentation and community activity are slightly insufficient compared to ShardingSphere.
Mysql Fabric
MySQL Fabric
is a sub-database and table solution officially provided by MySQL. It also supports other MySQL functions, such as high availability, load balancing, etc. It adopts the architecture of management nodes and agent nodes. The management node is responsible for real-time management of sharding information, and the agent node is responsible for receiving and processing client read and write requests.
It only supports MySQL database and can be integrated with ORM frameworks such as Hibernate and MyBatis on the market. The documentation of MySQL Fabric is relatively simple, and because it is an official solution, its community activity is relatively low.
ShardingSphere
Sharding-jdbc among the ShardingSphere members provides functions such as sharding of databases and tables, separation of reading and writing, and distributed transactions in the form of JAR
packages. However, it only supports Java applications and has limitations in application expansion.
Therefore, ShardingSphere launched the independent middleware sharding-proxy, which implements transparent sharding and multi-data source functions based on the MySQL protocol, and supports the use of applications in various languages and frameworks. The docked applications require almost no code changes, and the partitioning Database sharding configuration can be managed in the proxy service.
In addition to supporting MySQL, ShardingSphere can also support multiple mainstream databases such as PostgreSQL, SQLServer, and Oracle, and can be well integrated with ORM frameworks such as Hibernate, MyBatis, and JPA. Importantly, ShardingSphere’s open source community is very active.
If a problem occurs during use, users can submit a PR on GitHub and receive quick response and resolution, which provides users with a sufficient sense of security.
Product comparison
By comparing the above five sharding tools, we can easily find that ShardingSphere has outstanding advantages among many products in terms of overall performance, feature richness, and community support. Below is a table compiled with the main indicators of each product to make it more intuitive.
3. ShardingSphere members
The main components of ShardingSphere are sharding-jdbc
and sharding-proxy
, which are two different modes for implementing sharding and sharding:
sharding-jdbc
It is a lightweight Java framework that provides sharding of databases and tables based on JDBC, and is a client direct connection mode. Using sharding-jdbc, developers can achieve data sharding through simple configuration without modifying the original SQL statements. It supports a variety of sharding strategies and algorithms, and can be seamlessly integrated with various mainstream ORM frameworks.
?
sharding-proxy
It is a proxy service based on the MySQL protocol and provides transparent database and table sharding functions. Using sharding-proxy, developers can decouple sharding logic from applications and implement sharding functions without modifying application code. It also supports advanced features such as multiple data sources and read-write separation, and can run as an independent service. .
?
4. Quick implementation
We first use sharding-jdbc
to quickly implement database and table sharding. Compared with sharding-proxy, sharding-jdbc is suitable for simple application scenarios and does not require additional environment setup. The following two methods are mainly based on SpringBoot to implement sub-database and sub-table. One is through the YML configuration
method, and the other is through the pure Java encoding
method (cannot coexist). In subsequent chapters, we will introduce in detail how to use sharding-proxy
and other advanced features.
ShardingSphere official website address: https://shardingsphere.apache.org/
Preparation
Before starting the implementation, the splitting rules of the database and tables need to be clarified. Take the t_order
table split into databases and tables as an example. Specifically, we split the t_order table into two databases, namely db1
and db2
, each database splits the table into three tables, namely t_order_1
, t_order_2
and t_order_3
.
JAR package introduction
Introduce the necessary JAR packages, the most important of which are shardingsphere-jdbc-core-spring-boot-starter
and mysql-connector-java
. In order to ensure the comprehensiveness and compatibility of functions, and to avoid unnecessary errors and debugging work caused by lower version packages, I choose higher versions of packages.
shardingsphere-jdbc-core-spring-boot-starter is the core component of the ShardingSphere framework, which provides sub-database and sub-table support for JDBC; while mysql-connector-java is the implementation of the MySQL JDBC driver and is used to connect to the MySQL database. In addition, I used JPA
as a persistence tool and introduced corresponding dependency packages.
<!-- jpa persistence tool --> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-data-jpa</artifactId> <version>2.7.6</version> </dependency> <!-- The package mysql that must be imported --> <dependency> <groupId>mysql</groupId> <artifactId>mysql-connector-java</artifactId> <version>8.0.31</version> </dependency> <!-- Package ShardingSphere that must be imported --> <dependency> <groupId>org.apache.shardingsphere</groupId> <artifactId>shardingsphere-jdbc-core-spring-boot-starter</artifactId> <version>5.2.0</version> </dependency>
YML configuration
I personally recommend using YML configuration method to implement sharding-jdbc sub-database and table. Using YML configuration method not only makes the implementation of sub-database and sub-table more simple, efficient and maintainable, but also more in line with SpringBoot development specifications.
Add the following complete configuration under the src/main/resources/application.yml path file to realize the sub-database and sub-table of the t_order
table. Next, disassemble and see what each configuration module does. Something happened.
spring: shardingsphere: #Data source configuration datasource: # Data source name, multiple data sources separated by commas names: db0,db1 db0: type: com.zaxxer.hikari.HikariDataSource driver-class-name: com.mysql.cj.jdbc.Driver jdbc-url: jdbc:mysql://127.0.0.1:3306/shardingsphere-db1?useUnicode=true & amp;characterEncoding=utf-8 & amp;useSSL=false & amp;serverTimezone=Asia/Shanghai & amp;allowPublicKeyRetrieval= true username: root password: 123456 db1: type: com.zaxxer.hikari.HikariDataSource driver-class-name: com.mysql.cj.jdbc.Driver jdbc-url: jdbc:mysql://127.0.0.1:3306/shardingsphere-db0?useUnicode=true & amp;characterEncoding=utf-8 & amp;useSSL=false & amp;serverTimezone=Asia/Shanghai & amp;allowPublicKeyRetrieval= true username: root password: 123456 # Sharding rule configuration rules: sharding: # Sharding algorithm configuration sharding-algorithms: database-inline: # Sharding algorithm type type: INLINE props: # Row expression of the sharding algorithm (the algorithm is self-defined, here is for convenience of demonstration effect) algorithm-expression: db$->{order_id > 4?1:0} table-inline: # Sharding algorithm type type: INLINE props: # Row expression for sharding algorithm algorithm-expression: t_order_$->{order_id % 4} tables: # Logical table name t_order: # Line expression identifiers can use ${...} or $->{...}, but the former conflicts with Spring's own property file placeholders, so it is recommended to use line expression identifiers in the Spring environment $->{...} actual-data-nodes: db${0..1}.t_order_${0..3} #Splitting strategy database-strategy: standard: # Sharding column name sharding-column: order_id # Sharding algorithm name sharding-algorithm-name: database-inline # Table splitting strategy table-strategy: standard: # Sharding column name sharding-column: order_id # Sharding algorithm name sharding-algorithm-name: table-inline #Property configuration props: # Display the modified sql statement sql-show: true
The following is the configuration of shardingsphere
multi-data source information. names
represents the list of database aliases that need to be connected. Each time a database name is added, a corresponding database connection needs to be added. Configuration
spring: shardingsphere: #Data source configuration datasource: # Data source name, multiple data sources separated by commas names: db0,db1 db0: type: com.zaxxer.hikari.HikariDataSource driver-class-name: com.mysql.cj.jdbc.Driver jdbc-url: jdbc:mysql://127.0.0.1:3306/shardingsphere-db1?useUnicode=true & amp;characterEncoding=utf-8 & amp;useSSL=false & amp;serverTimezone=Asia/Shanghai & amp;allowPublicKeyRetrieval= true username: root password: 123456 db1: type: com.zaxxer.hikari.HikariDataSource driver-class-name: com.mysql.cj.jdbc.Driver jdbc-url: jdbc:mysql://127.0.0.1:3306/shardingsphere-db0?useUnicode=true & amp;characterEncoding=utf-8 & amp;useSSL=false & amp;serverTimezone=Asia/Shanghai & amp;allowPublicKeyRetrieval= true username: root password: 123456
The rules
node is the configuration of sharding rules. The sharding-algorithms
node is a custom sharding algorithm module. The sharding algorithm can be configured later when configuring the sharding rules of the table. Quoted among them:
-
database-inline
: Customized sharding algorithm name; -
type
: The type of the sharding algorithm. Here we take inline as an example. Detailed chapters will be introduced later; -
props
: Specify the specific content of the sharding algorithm, wherealgorithm-expression
is the expression of the sharding algorithm, that is, the real value to be accessed is calculated based on the sharding key value. Database name or table name,.
db$->{order_id % 2}
This is a Groovy language expression, which means to take the modulo of the shard keyorder_id
, and calculate db0 and db1 based on the modulo result. , the expression of the sub-table is the same.
spring: shardingsphere: # Rule configuration rules: sharding: # Sharding algorithm configuration sharding-algorithms: database-inline: # Sharding algorithm type type: INLINE props: # Row expression of the sharding algorithm (the algorithm is self-defined, here is for convenience of demonstration effect) algorithm-expression: db$->{order_id % 2} table-inline: # Sharding algorithm type type: INLINE props: # Row expression for sharding algorithm algorithm-expression: t_order_$->{order_id % 3}
The tables
node defines the sharding rules for the logical table name t_order
. actual-data-nodes
is used to set the number of physical data nodes.
db${0..1}.t_order_${0..3}
The expression means the distribution of this logical table in different database instances. If you only want to simply divide the database or table, Expressions can be adjusted, sub-database db${0..1}
, sub-table t_order_${0..3}
.
db0
├── t_order_0
├── t_order_1
└── t_order_2
db1
├── t_order_0
├── t_order_1
└── t_order_2
spring: shardingsphere: # Rule configuration rules: sharding: tables: # Logical table name t_order: # Line expression identifiers can use ${...} or $->{...}, but the former conflicts with Spring's own property file placeholders, so it is recommended to use line expression identifiers in the Spring environment $->{...} actual-data-nodes: db${0..1}.t_order_${0..3} #Splitting strategy database-strategy: standard: # Sharding column name sharding-column: order_id # Sharding algorithm name sharding-algorithm-name: database-inline # Table splitting strategy table-strategy: standard: # Sharding column name sharding-column: order_id # Sharding algorithm name sharding-algorithm-name: table-inline
database-strategy
and table-strategy
set the database and table sharding strategies respectively;
sharding-column
indicates which column (sharding key) of the table is used to calculate and route sharding to which library and table;
sharding-algorithm-name
indicates which sharding algorithm is used to operate the sharding key. Here you can refer to the sharding algorithm name you just customized.
The props
node is used to set other property configurations, such as: sql-show
indicates whether to output the actual executed SQL statement after parsing and transformation on the console for debugging.
spring: shardingsphere: #Property configuration props: # Display the modified sql statement sql-show: true
When I ran a single test and inserted 16 pieces of data into the database, I found that the data had been inserted into each shard relatively evenly.
JAVA encoding
If you don’t want to implement automatic assembly through yml configuration files, you can also use ShardingSphere’s API to achieve the same functionality. The advantage of using API to complete the configuration of sharding rules and data sources is that it is more flexible and customizable, making it convenient for secondary development and expansion.
The following is the complete code for implementing sub-database and sub-table using pure JAVA coding.
@Configuration public class ShardingConfiguration { /** * Configure sharded data sources * Public account: Programmer Xiaofu */ @Bean public DataSource getShardingDataSource() throws SQLException { Map<String, DataSource> dataSourceMap = new HashMap<>(); dataSourceMap.put("db0", dataSource1()); dataSourceMap.put("db1", dataSource2()); // Sharding rules rule configuration ShardingRuleConfiguration shardingRuleConfig = new ShardingRuleConfiguration(); shardingRuleConfig.setShardingAlgorithms(getShardingAlgorithms()); //Configure t_order table sharding rules ShardingTableRuleConfiguration orderTableRuleConfig = new ShardingTableRuleConfiguration("t_order", "db${0..1}.t_order_${0..2}"); orderTableRuleConfig.setTableShardingStrategy(new StandardShardingStrategyConfiguration("order_id", "table-inline")); orderTableRuleConfig.setDatabaseShardingStrategy(new StandardShardingStrategyConfiguration("order_id", "database-inline")); shardingRuleConfig.getTables().add(orderTableRuleConfig); // Whether to output the actual SQL executed after parsing and transformation on the console Properties properties = new Properties(); properties.setProperty("sql-show", "true"); //Create ShardingSphere data source return ShardingSphereDataSourceFactory.createDataSource(dataSourceMap, Collections.singleton(shardingRuleConfig), properties); } /** * Configure data source 1 * Public account: Programmer Xiaofu */ public DataSource dataSource1() { HikariDataSource dataSource = new HikariDataSource(); dataSource.setDriverClassName("com.mysql.cj.jdbc.Driver"); dataSource.setJdbcUrl("jdbc:mysql://127.0.0.1:3306/shardingsphere-db1?useUnicode=true & amp;characterEncoding=utf-8 & amp;useSSL=false & amp;serverTimezone=Asia/Shanghai & amp;allowPublicKeyRetrieval =true"); dataSource.setUsername("root"); dataSource.setPassword("123456"); return dataSource; } /** * Configure data source 2 * Public account: Programmer Xiaofu */ public DataSource dataSource2() { HikariDataSource dataSource = new HikariDataSource(); dataSource.setDriverClassName("com.mysql.cj.jdbc.Driver"); dataSource.setJdbcUrl("jdbc:mysql://127.0.0.1:3306/shardingsphere-db0?useUnicode=true & amp;characterEncoding=utf-8 & amp;useSSL=false & amp;serverTimezone=Asia/Shanghai & amp;allowPublicKeyRetrieval =true"); dataSource.setUsername("root"); dataSource.setPassword("123456"); return dataSource; } /** * Configure sharding algorithm * Public account: Programmer Xiaofu */ private Map<String, AlgorithmConfiguration> getShardingAlgorithms() { Map<String, AlgorithmConfiguration> shardingAlgorithms = new LinkedHashMap<>(); // Custom library sub-algorithm Properties databaseAlgorithms = new Properties(); databaseAlgorithms.setProperty("algorithm-expression", "db$->{order_id % 2}"); shardingAlgorithms.put("database-inline", new AlgorithmConfiguration("INLINE", databaseAlgorithms)); // Customized table sub-algorithm Properties tableAlgorithms = new Properties(); tableAlgorithms.setProperty("algorithm-expression", "t_order_$->{order_id % 3}"); shardingAlgorithms.put("table-inline", new AlgorithmConfiguration("INLINE", tableAlgorithms)); return shardingAlgorithms; } }
The sharding core configuration class ShardingRuleConfiguration
of ShardingSphere
is mainly used to load core configurations such as sharding rules, sharding algorithms, primary key generation rules, binding tables, and broadcast tables. We set the relevant configuration information to the configuration class, create and overwrite DataSource
through createDataSource
, and finally inject the Bean.
Using Java coding method is just to manually implement ShardingSphere’s predictable loading configuration logic. After comparing the two implementation methods, it is recommended to use YML configuration method to realize the sub-database and sub-table function of ShardingSphere
. Compared with Compared with Java coding, YML configuration is more intuitive and easy to understand. Developers can focus more on the implementation of business logic without paying too much attention to the underlying technical details.
@Getter @Setter public final class ShardingRuleConfiguration implements DatabaseRuleConfiguration, DistributedRuleConfiguration { //Sub-table configuration configuration private Collection<ShardingTableRuleConfiguration> tables = new LinkedList<>(); // Automatic sharding rule configuration private Collection<ShardingAutoTableRuleConfiguration> autoTables = new LinkedList<>(); // Binding table configuration private Collection<String> bindingTableGroups = new LinkedList<>(); // Broadcast table configuration private Collection<String> broadcastTables = new LinkedList<>(); //Default sharding strategy configuration private ShardingStrategyConfiguration defaultDatabaseShardingStrategy; //Default table sharding strategy configuration private ShardingStrategyConfiguration defaultTableShardingStrategy; // Primary key generation strategy configuration private KeyGenerateStrategyConfiguration defaultKeyGenerateStrategy; private ShardingAuditStrategyConfiguration defaultAuditStrategy; //Default shard key private String defaultShardingColumn; //Customized sharding algorithm private Map<String, AlgorithmConfiguration> shardingAlgorithms = new LinkedHashMap<>(); // Primary key generation algorithm private Map<String, AlgorithmConfiguration> keyGenerators = new LinkedHashMap<>(); private Map<String, AlgorithmConfiguration> auditors = new LinkedHashMap<>(); }
After checking the real SQL log printed by the console, we found that when using ShardingSphere
for data insertion, its internal implementation will first query whether the record exists based on the sharding key order_id
. If the record does not exist, an insert operation is performed; if the record already exists, an update operation is performed. It seems that only 10 insert SQL statements will be executed, but in fact 20 SQL statements need to be executed, which will have a certain impact on the performance of the database.
The function is quite simple, but due to the API changes in different versions of ShardingSphere
Default data source
Some friends may have questions. The t_order
table that has set up sharding rules can operate data normally. If our t_user
table does not have sharding rules configured, So what happens when performing an insert operation?
After carefully reading the official technical documentation, I have actually answered my friend’s question. If only some databases are divided into databases and tables, do I need to configure the tables that are not divided into databases and tables in the sharding rules? Official answer: No.
We create a t_user
table and do not configure any sharding rules on it. In my impression, if default-data-source-name
is not set as the default data source, an error should be reported when operating an unsharded table!
We tried to insert a piece of data into t_user
, and it succeeded? After looking through the database table, I found that the data was only inserted into the db1 database, indicating that the broadcast route was not used.
shardingsphere-jdbc Version 5.x
removed the original default data source configuration and automatically used the rules of the default data source. To verify that I added more data sources, I tentatively adjusted Insert data again in the order of db2
, db0
, and db1
. This time the record is inserted into the db2
library. After repeated trials, the data is inserted again. get conclusion.
Unsharded tables will use the first data source as the default data source by default, which is the first datasource.names
one.
spring: shardingsphere: #Data source configuration datasource: # Data source name, multiple data sources separated by commas names: db2, db1, db0
Shardingsphere gave a brief introduction and quickly implemented the sub-database and sub-table functions using yml and Java coding. Next, we will implement them one by one according to the functions of the mind map at the beginning of the article.