SpringBoot 2 ways to quickly implement sub-database and sub-table

This article is the third article in the series “Principles and Practice of ShardingSphere5. In code> environment, sub-database and table can be quickly realized through JAVA encoding and Yml configuration.

1. What is ShardingSphere?

shardingsphere is an open source distributed relational database middleware and is the top-level project of Apache. Its predecessor was two independent projects of sharding-jdbc and sharding-proxy. They were later merged into one project in 2018 and officially renamed ShardingSphere.

Among them, sharding-jdbc is the most classic and mature framework in the entire ecosystem. The first people who came into contact with sharding should know it. It is the best introductory tool for learning sharding.

Today’s ShardingSphere no longer simply refers to a certain framework, but a complete technology ecosystem, consisting of three open source distributed database middlewares: sharding-jdbc, sharding-proxy and sharding-sidecar. The first two came out earlier and have more mature functions. They are two widely used distributed database middlewares. Therefore, in subsequent articles, we will focus on their characteristics and usage methods.

2. Why choose ShardingSphere?

In order to answer this question, I have sorted out the common sharding tools on the market, including ShardingSphere, Cobar, Mycat, and TDDL , MySQL Fabric, etc., and briefly compared them from multiple perspectives.

Cobar

Cobar is Alibaba’s open source distributed database middleware based on MySQL, which provides functions such as database and table sharding, read-write separation, and transaction management. It uses polling algorithm and hash algorithm for data sharding, and supports distributed table sharding, but does not support single database sharding into multiple tables.

It provides services in the form of Proxy. It is widely used within Alibaba and is open source. It is relatively easy to configure and does not need to rely on other things. It only requires a Java environment. It is compatible with almost all ORM frameworks on the market, but only supports MySQL database, and transaction support is more troublesome.

MyCAT

Mycat is a secondary development by community enthusiasts based on Alibaba Cobar. It is also a relatively classic tool for sharding databases and tables. It provides services in Proxy mode and supports functions such as database sharding and table sharding, read-write separation, SQL routing, and data sharding.

It is compatible with almost all ORM frameworks on the market, including Hibernate, MyBatis and JPA. However, the fly in the ointment is that it only supports MySQL database, and the current community activity is relatively low.

TDDL

TDDL is an open source sharding solution provided by Alibaba Group, which can automatically route SQL to the corresponding database tables. It uses two methods of vertical sharding and horizontal sharding to divide tables and databases, and supports multiple data sources and read-write separation functions.

TDDL is developed based on Java, supports MySQL, Oracle and SQL Server databases, and can be integrated with ORM frameworks such as Hibernate and MyBatis on the market.

However, TDDL only supports the integration of some Alibaba internal tools and frameworks, which may be relatively limiting for external companies. At the same time, its documentation and community activity are slightly insufficient compared to ShardingSphere.

Mysql Fabric

MySQL Fabric is a sub-database and table solution officially provided by MySQL. It also supports other MySQL functions, such as high availability, load balancing, etc. It adopts the architecture of management nodes and agent nodes. The management node is responsible for real-time management of sharding information, and the agent node is responsible for receiving and processing client read and write requests.

It only supports MySQL database and can be integrated with ORM frameworks such as Hibernate and MyBatis on the market. The documentation of MySQL Fabric is relatively simple, and because it is an official solution, its community activity is relatively low.

ShardingSphere

Sharding-jdbc among the ShardingSphere members provides functions such as sharding of databases and tables, separation of reading and writing, and distributed transactions in the form of JAR packages. However, it only supports Java applications and has limitations in application expansion.

Therefore, ShardingSphere launched the independent middleware sharding-proxy, which implements transparent sharding and multi-data source functions based on the MySQL protocol, and supports the use of applications in various languages and frameworks. The docked applications require almost no code changes, and the partitioning Database sharding configuration can be managed in the proxy service.

In addition to supporting MySQL, ShardingSphere can also support multiple mainstream databases such as PostgreSQL, SQLServer, and Oracle, and can be well integrated with ORM frameworks such as Hibernate, MyBatis, and JPA. Importantly, ShardingSphere’s open source community is very active.

If a problem occurs during use, users can submit a PR on GitHub and receive quick response and resolution, which provides users with a sufficient sense of security.

Product comparison

By comparing the above five sharding tools, we can easily find that ShardingSphere has outstanding advantages among many products in terms of overall performance, feature richness, and community support. Below is a table compiled with the main indicators of each product to make it more intuitive.

3. ShardingSphere members

The main components of ShardingSphere are sharding-jdbc and sharding-proxy, which are two different modes for implementing sharding and sharding:

sharding-jdbc

It is a lightweight Java framework that provides sharding of databases and tables based on JDBC, and is a client direct connection mode. Using sharding-jdbc, developers can achieve data sharding through simple configuration without modifying the original SQL statements. It supports a variety of sharding strategies and algorithms, and can be seamlessly integrated with various mainstream ORM frameworks.

sharding-proxy

It is a proxy service based on the MySQL protocol and provides transparent database and table sharding functions. Using sharding-proxy, developers can decouple sharding logic from applications and implement sharding functions without modifying application code. It also supports advanced features such as multiple data sources and read-write separation, and can run as an independent service. .

4. Quick implementation

We first use sharding-jdbc to quickly implement database and table sharding. Compared with sharding-proxy, sharding-jdbc is suitable for simple application scenarios and does not require additional environment setup. The following two methods are mainly based on SpringBoot to implement sub-database and sub-table. One is through the YML configuration method, and the other is through the pure Java encoding method (cannot coexist). In subsequent chapters, we will introduce in detail how to use sharding-proxy and other advanced features.

ShardingSphere official website address: https://shardingsphere.apache.org/

Preparation

Before starting the implementation, the splitting rules of the database and tables need to be clarified. Take the t_order table split into databases and tables as an example. Specifically, we split the t_order table into two databases, namely db1 and db2, each database splits the table into three tables, namely t_order_1, t_order_2 and t_order_3.

JAR package introduction

Introduce the necessary JAR packages, the most important of which are shardingsphere-jdbc-core-spring-boot-starter and mysql-connector-java. In order to ensure the comprehensiveness and compatibility of functions, and to avoid unnecessary errors and debugging work caused by lower version packages, I choose higher versions of packages.

shardingsphere-jdbc-core-spring-boot-starter is the core component of the ShardingSphere framework, which provides sub-database and sub-table support for JDBC; while mysql-connector-java is the implementation of the MySQL JDBC driver and is used to connect to the MySQL database. In addition, I used JPA as a persistence tool and introduced corresponding dependency packages.

<!-- jpa persistence tool -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-jpa</artifactId>
    <version>2.7.6</version>
</dependency>
<!-- The package mysql that must be imported -->
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>8.0.31</version>
</dependency>
<!-- Package ShardingSphere that must be imported -->
<dependency>
    <groupId>org.apache.shardingsphere</groupId>
    <artifactId>shardingsphere-jdbc-core-spring-boot-starter</artifactId>
    <version>5.2.0</version>
</dependency>

YML configuration

I personally recommend using YML configuration method to implement sharding-jdbc sub-database and table. Using YML configuration method not only makes the implementation of sub-database and sub-table more simple, efficient and maintainable, but also more in line with SpringBoot development specifications.

Add the following complete configuration under the src/main/resources/application.yml path file to realize the sub-database and sub-table of the t_order table. Next, disassemble and see what each configuration module does. Something happened.

spring:
  shardingsphere:
    #Data source configuration
    datasource:
      # Data source name, multiple data sources separated by commas
      names: db0,db1
      db0:
        type: com.zaxxer.hikari.HikariDataSource
        driver-class-name: com.mysql.cj.jdbc.Driver
        jdbc-url: jdbc:mysql://127.0.0.1:3306/shardingsphere-db1?useUnicode=true & amp;characterEncoding=utf-8 & amp;useSSL=false & amp;serverTimezone=Asia/Shanghai & amp;allowPublicKeyRetrieval= true
        username: root
        password: 123456
      db1:
        type: com.zaxxer.hikari.HikariDataSource
        driver-class-name: com.mysql.cj.jdbc.Driver
        jdbc-url: jdbc:mysql://127.0.0.1:3306/shardingsphere-db0?useUnicode=true & amp;characterEncoding=utf-8 & amp;useSSL=false & amp;serverTimezone=Asia/Shanghai & amp;allowPublicKeyRetrieval= true
        username: root
        password: 123456
    # Sharding rule configuration
    rules:
      sharding:
        # Sharding algorithm configuration
        sharding-algorithms:
          database-inline:
            # Sharding algorithm type
            type: INLINE
            props:
              # Row expression of the sharding algorithm (the algorithm is self-defined, here is for convenience of demonstration effect)
              algorithm-expression: db$->{order_id > 4?1:0}
          table-inline:
            # Sharding algorithm type
            type: INLINE
            props:
              # Row expression for sharding algorithm
              algorithm-expression: t_order_$->{order_id % 4}
        tables:
          # Logical table name
          t_order:
            # Line expression identifiers can use ${...} or $->{...}, but the former conflicts with Spring's own property file placeholders, so it is recommended to use line expression identifiers in the Spring environment $->{...}
            actual-data-nodes: db${0..1}.t_order_${0..3}
            #Splitting strategy
            database-strategy:
              standard:
                # Sharding column name
                sharding-column: order_id
                # Sharding algorithm name
                sharding-algorithm-name: database-inline
            # Table splitting strategy
            table-strategy:
              standard:
                # Sharding column name
                sharding-column: order_id
                # Sharding algorithm name
                sharding-algorithm-name: table-inline
    #Property configuration
    props:
      # Display the modified sql statement
      sql-show: true

The following is the configuration of shardingsphere multi-data source information. names represents the list of database aliases that need to be connected. Each time a database name is added, a corresponding database connection needs to be added. Configuration

spring:
  shardingsphere:
    #Data source configuration
    datasource:
      # Data source name, multiple data sources separated by commas
      names: db0,db1
      db0:
        type: com.zaxxer.hikari.HikariDataSource
        driver-class-name: com.mysql.cj.jdbc.Driver
        jdbc-url: jdbc:mysql://127.0.0.1:3306/shardingsphere-db1?useUnicode=true & amp;characterEncoding=utf-8 & amp;useSSL=false & amp;serverTimezone=Asia/Shanghai & amp;allowPublicKeyRetrieval= true
        username: root
        password: 123456
      db1:
        type: com.zaxxer.hikari.HikariDataSource
        driver-class-name: com.mysql.cj.jdbc.Driver
        jdbc-url: jdbc:mysql://127.0.0.1:3306/shardingsphere-db0?useUnicode=true & amp;characterEncoding=utf-8 & amp;useSSL=false & amp;serverTimezone=Asia/Shanghai & amp;allowPublicKeyRetrieval= true
        username: root
        password: 123456

The rules node is the configuration of sharding rules. The sharding-algorithms node is a custom sharding algorithm module. The sharding algorithm can be configured later when configuring the sharding rules of the table. Quoted among them:

database-inline: Customized sharding algorithm name;
type: The type of the sharding algorithm. Here we take inline as an example. Detailed chapters will be introduced later;
props: Specify the specific content of the sharding algorithm, where algorithm-expression is the expression of the sharding algorithm, that is, the real value to be accessed is calculated based on the sharding key value. Database name or table name,.

db$->{order_id % 2} This is a Groovy language expression, which means to take the modulo of the shard key order_id, and calculate db0 and db1 based on the modulo result. , the expression of the sub-table is the same.

spring:
  shardingsphere:
    # Rule configuration
    rules:
      sharding:
        # Sharding algorithm configuration
        sharding-algorithms:
          database-inline:
            # Sharding algorithm type
            type: INLINE
            props:
              # Row expression of the sharding algorithm (the algorithm is self-defined, here is for convenience of demonstration effect)
              algorithm-expression: db$->{order_id % 2}
          table-inline:
            # Sharding algorithm type
            type: INLINE
            props:
              # Row expression for sharding algorithm
              algorithm-expression: t_order_$->{order_id % 3}

The tables node defines the sharding rules for the logical table name t_order. actual-data-nodes is used to set the number of physical data nodes.

db${0..1}.t_order_${0..3} The expression means the distribution of this logical table in different database instances. If you only want to simply divide the database or table, Expressions can be adjusted, sub-database db${0..1}, sub-table t_order_${0..3}.

db0
├── t_order_0
├── t_order_1
└── t_order_2
db1
├── t_order_0
├── t_order_1
└── t_order_2

spring:
  shardingsphere:
    # Rule configuration
    rules:
      sharding:
        tables:
          # Logical table name
          t_order:
            # Line expression identifiers can use ${...} or $->{...}, but the former conflicts with Spring's own property file placeholders, so it is recommended to use line expression identifiers in the Spring environment $->{...}
            actual-data-nodes: db${0..1}.t_order_${0..3}
            #Splitting strategy
            database-strategy:
              standard:
                # Sharding column name
                sharding-column: order_id
                # Sharding algorithm name
                sharding-algorithm-name: database-inline
            # Table splitting strategy
            table-strategy:
              standard:
                # Sharding column name
                sharding-column: order_id
                # Sharding algorithm name
                sharding-algorithm-name: table-inline

database-strategy and table-strategy set the database and table sharding strategies respectively;

sharding-column indicates which column (sharding key) of the table is used to calculate and route sharding to which library and table;

sharding-algorithm-name indicates which sharding algorithm is used to operate the sharding key. Here you can refer to the sharding algorithm name you just customized.

The props node is used to set other property configurations, such as: sql-show indicates whether to output the actual executed SQL statement after parsing and transformation on the console for debugging.

spring:
  shardingsphere:
    #Property configuration
    props:
      # Display the modified sql statement
      sql-show: true

When I ran a single test and inserted 16 pieces of data into the database, I found that the data had been inserted into each shard relatively evenly.

JAVA encoding

If you don’t want to implement automatic assembly through yml configuration files, you can also use ShardingSphere’s API to achieve the same functionality. The advantage of using API to complete the configuration of sharding rules and data sources is that it is more flexible and customizable, making it convenient for secondary development and expansion.

The following is the complete code for implementing sub-database and sub-table using pure JAVA coding.

@Configuration
public class ShardingConfiguration {

    /**
     * Configure sharded data sources
     * Public account: Programmer Xiaofu
     */
    @Bean
    public DataSource getShardingDataSource() throws SQLException {
        Map<String, DataSource> dataSourceMap = new HashMap<>();
        dataSourceMap.put("db0", dataSource1());
        dataSourceMap.put("db1", dataSource2());

        // Sharding rules rule configuration
        ShardingRuleConfiguration shardingRuleConfig = new ShardingRuleConfiguration();
        shardingRuleConfig.setShardingAlgorithms(getShardingAlgorithms());

        //Configure t_order table sharding rules
        ShardingTableRuleConfiguration orderTableRuleConfig = new ShardingTableRuleConfiguration("t_order", "db${0..1}.t_order_${0..2}");
        orderTableRuleConfig.setTableShardingStrategy(new StandardShardingStrategyConfiguration("order_id", "table-inline"));
        orderTableRuleConfig.setDatabaseShardingStrategy(new StandardShardingStrategyConfiguration("order_id", "database-inline"));
        shardingRuleConfig.getTables().add(orderTableRuleConfig);

        // Whether to output the actual SQL executed after parsing and transformation on the console
        Properties properties = new Properties();
        properties.setProperty("sql-show", "true");
        //Create ShardingSphere data source
        return ShardingSphereDataSourceFactory.createDataSource(dataSourceMap, Collections.singleton(shardingRuleConfig), properties);
    }

    /**
     * Configure data source 1
     * Public account: Programmer Xiaofu
     */
    public DataSource dataSource1() {
        HikariDataSource dataSource = new HikariDataSource();
        dataSource.setDriverClassName("com.mysql.cj.jdbc.Driver");
        dataSource.setJdbcUrl("jdbc:mysql://127.0.0.1:3306/shardingsphere-db1?useUnicode=true & amp;characterEncoding=utf-8 & amp;useSSL=false & amp;serverTimezone=Asia/Shanghai & amp;allowPublicKeyRetrieval =true");
        dataSource.setUsername("root");
        dataSource.setPassword("123456");
        return dataSource;
    }

    /**
     * Configure data source 2
     * Public account: Programmer Xiaofu
     */
    public DataSource dataSource2() {
        HikariDataSource dataSource = new HikariDataSource();
        dataSource.setDriverClassName("com.mysql.cj.jdbc.Driver");
        dataSource.setJdbcUrl("jdbc:mysql://127.0.0.1:3306/shardingsphere-db0?useUnicode=true & amp;characterEncoding=utf-8 & amp;useSSL=false & amp;serverTimezone=Asia/Shanghai & amp;allowPublicKeyRetrieval =true");
        dataSource.setUsername("root");
        dataSource.setPassword("123456");
        return dataSource;
    }

    /**
     * Configure sharding algorithm
     * Public account: Programmer Xiaofu
     */
    private Map<String, AlgorithmConfiguration> getShardingAlgorithms() {
        Map<String, AlgorithmConfiguration> shardingAlgorithms = new LinkedHashMap<>();

        // Custom library sub-algorithm
        Properties databaseAlgorithms = new Properties();
        databaseAlgorithms.setProperty("algorithm-expression", "db$->{order_id % 2}");
        shardingAlgorithms.put("database-inline", new AlgorithmConfiguration("INLINE", databaseAlgorithms));

        // Customized table sub-algorithm
        Properties tableAlgorithms = new Properties();
        tableAlgorithms.setProperty("algorithm-expression", "t_order_$->{order_id % 3}");
        shardingAlgorithms.put("table-inline", new AlgorithmConfiguration("INLINE", tableAlgorithms));

        return shardingAlgorithms;
    }
}

The sharding core configuration class ShardingRuleConfiguration of ShardingSphere is mainly used to load core configurations such as sharding rules, sharding algorithms, primary key generation rules, binding tables, and broadcast tables. We set the relevant configuration information to the configuration class, create and overwrite DataSource through createDataSource, and finally inject the Bean.

Using Java coding method is just to manually implement ShardingSphere’s predictable loading configuration logic. After comparing the two implementation methods, it is recommended to use YML configuration method to realize the sub-database and sub-table function of ShardingSphere. Compared with Compared with Java coding, YML configuration is more intuitive and easy to understand. Developers can focus more on the implementation of business logic without paying too much attention to the underlying technical details.

@Getter
@Setter
public final class ShardingRuleConfiguration implements DatabaseRuleConfiguration, DistributedRuleConfiguration {
    
    //Sub-table configuration configuration
    private Collection<ShardingTableRuleConfiguration> tables = new LinkedList<>();
    // Automatic sharding rule configuration
    private Collection<ShardingAutoTableRuleConfiguration> autoTables = new LinkedList<>();
    // Binding table configuration
    private Collection<String> bindingTableGroups = new LinkedList<>();
    // Broadcast table configuration
    private Collection<String> broadcastTables = new LinkedList<>();
    //Default sharding strategy configuration
    private ShardingStrategyConfiguration defaultDatabaseShardingStrategy;
    //Default table sharding strategy configuration
    private ShardingStrategyConfiguration defaultTableShardingStrategy;
    // Primary key generation strategy configuration
    private KeyGenerateStrategyConfiguration defaultKeyGenerateStrategy;
    
    private ShardingAuditStrategyConfiguration defaultAuditStrategy;
    //Default shard key
    private String defaultShardingColumn;
    //Customized sharding algorithm
    private Map<String, AlgorithmConfiguration> shardingAlgorithms = new LinkedHashMap<>();
    // Primary key generation algorithm
    private Map<String, AlgorithmConfiguration> keyGenerators = new LinkedHashMap<>();
    
    private Map<String, AlgorithmConfiguration> auditors = new LinkedHashMap<>();
}

After checking the real SQL log printed by the console, we found that when using ShardingSphere for data insertion, its internal implementation will first query whether the record exists based on the sharding key order_id. If the record does not exist, an insert operation is performed; if the record already exists, an update operation is performed. It seems that only 10 insert SQL statements will be executed, but in fact 20 SQL statements need to be executed, which will have a certain impact on the performance of the database.

The function is quite simple, but due to the API changes in different versions of ShardingSphere

Default data source

Some friends may have questions. The t_order table that has set up sharding rules can operate data normally. If our t_user table does not have sharding rules configured, So what happens when performing an insert operation?

After carefully reading the official technical documentation, I have actually answered my friend’s question. If only some databases are divided into databases and tables, do I need to configure the tables that are not divided into databases and tables in the sharding rules? Official answer: No.

We create a t_user table and do not configure any sharding rules on it. In my impression, if default-data-source-name is not set as the default data source, an error should be reported when operating an unsharded table!

We tried to insert a piece of data into t_user, and it succeeded? After looking through the database table, I found that the data was only inserted into the db1 database, indicating that the broadcast route was not used.

shardingsphere-jdbc Version 5.x removed the original default data source configuration and automatically used the rules of the default data source. To verify that I added more data sources, I tentatively adjusted Insert data again in the order of db2, db0, and db1. This time the record is inserted into the db2 library. After repeated trials, the data is inserted again. get conclusion.

Unsharded tables will use the first data source as the default data source by default, which is the first datasource.names one.

spring:
  shardingsphere:
    #Data source configuration
    datasource:
      # Data source name, multiple data sources separated by commas
      names: db2, db1, db0

Shardingsphere gave a brief introduction and quickly implemented the sub-database and sub-table functions using yml and Java coding. Next, we will implement them one by one according to the functions of the mind map at the beginning of the article.