Sharding-JDBC Sub-database and Sub-table Sharding

Article directory

  • 1. Sub-database and sub-table strategy
  • 2. Sharding-JDBC
    • 2.1 Add dependencies
    • 2.2 Sub-database strategy
      • 2.2.1 Database environment
      • 2.2.2 yml configuration data source
      • 2.2.3 yml configuration sharding algorithm
      • 2.2.4 View sub-database effect
    • 2.3 Sub-table strategy
      • 2.3.1 Database environment
      • 2.3.2 yml configuration sub-table strategy
      • 2.3.3 View sub-table effect
    • 2.4 Distributed sequence algorithm
      • 2.4.1 UUID
      • 2.4.2 Snowflake algorithm
      • 2.4.3 ShardingJDBC implementation
      • 2.4.4 Check the effect
    • 2.5 Read-write separation
      • 2.5.1 Database master-slave structure
      • 2.5.2 yml configuration data source
      • 2.5.3 yml configuration read-write separation
      • 2.5.4 View the read-write separation effect

Reminder: The following is the main text of this article, and the Java series will continue to be updated

1. Sub-database and sub-table strategy

In a large-scale Internet system, a single MySQL instance may not be able to meet the storage needs of the business, so the capacity has to be expanded.

As before, the hardware resources of a single host are a bottleneck, and we cannot scale vertically without limit. Instead, we scale horizontally with multiple instances, storing the data in a distributed manner so that multiple hosts work together to hold it.

The question then is: how do we distribute the data?

  • Vertical split: both tables and databases can be split vertically. A vertical split distributes the tables of one database across several databases according to business function (much like the structure we studied in the previous two chapters); for example, user-related tables go into one database and order-related tables into another. A single table can also be split vertically into several tables linked by, say, a foreign key.
  • Horizontal split: a horizontal split targets the data rather than the table structure. Several databases hold the same table, and each stores only part of the rows, so the data is scattered across the nodes.

Implementing such a split by hand would mean a lot of extra code, so ready-made solutions exist. For example, MyCat is database middleware that sits in front of the databases as a proxy layer: we only connect to MyCat, and it carries out the sub-database and sub-table operations for us. There are also ShardingSphere-Proxy and Sharding-JDBC, where the application itself parses the SQL statements and converts them into operations on the sharded databases and tables, which requires us to write some configuration or logic code.


2. Sharding-JDBC

Official document: https://shardingsphere.apache.org/document/5.1.0/en/overview/#shardingsphere-jdbc

Sharding-JDBC is an open-source distributed database middleware solution positioned as a lightweight Java framework. It provides additional services at Java's JDBC layer: the client connects directly to the database, and the service is delivered as a jar package with no extra deployment or dependencies. It can be understood as an enhanced JDBC driver, fully compatible with JDBC and all kinds of ORM frameworks.

  • Applicable to any JDBC-based ORM framework, such as JPA, Hibernate, MyBatis, Spring JDBC Template, or plain JDBC;
  • Supports any third-party database connection pool, such as DBCP, C3P0, BoneCP, HikariCP, etc.;
  • Supports any database that implements the JDBC specification; currently MySQL, PostgreSQL, Oracle, SQL Server, and any other database accessible through JDBC.

2.1 Add dependencies

Here we mainly demonstrate horizontal sharding. Create a new SpringBoot project with the following dependencies:

<dependencies>
    <dependency>
        <groupId>org.apache.shardingsphere</groupId>
        <artifactId>shardingsphere-jdbc-core-spring-boot-starter</artifactId>
        <version>5.1.0</version>
    </dependency>
    <dependency>
        <groupId>org.mybatis.spring.boot</groupId>
        <artifactId>mybatis-spring-boot-starter</artifactId>
        <version>2.2.2</version>
    </dependency>
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>5.1.47</version>
    </dependency>
</dependencies>


2.2 Sub-database strategy

2.2.1 Database environment

① We use Docker to start two MySQL containers (mapped to ports 3307 and 3308), which represent our two databases.

② Create a user with remote access privileges in each of the two databases:

CREATE USER 'wsy'@'%' IDENTIFIED BY '123456';
grant all on *.* to 'wsy'@'%';
flush privileges;

③ Create the same database and table structure in both databases:

create database yyds;
CREATE TABLE test (
id int not null primary key,
name varchar(30),
passwd varchar(30)
);
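
Optionally, before wiring anything into Spring, we can verify that both databases are reachable with the new user. Below is a minimal JDBC check; it is only a sketch and assumes the host and ports used in the data source configuration of the next section:

import java.sql.Connection;
import java.sql.DriverManager;

public class ConnectionCheck {
    public static void main(String[] args) throws Exception {
        // Ports 3307 and 3308 match the JDBC URLs configured in the next section
        String[] urls = {
                "jdbc:mysql://1.15.76.95:3307/yyds",
                "jdbc:mysql://1.15.76.95:3308/yyds"
        };
        for (String url : urls) {
            try (Connection conn = DriverManager.getConnection(url, "wsy", "123456")) {
                System.out.println("Connected: " + url);
            }
        }
    }
}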


2.2.2 yml configuration data source

Since we are splitting data across databases, we need to configure two data sources:

spring:
  shardingsphere:
    datasource:
      # List every data source name here, separated by commas; we use name + number (db0, db1)
      names: db0,db1
      # Then configure each data source individually
      db0:
        # Data source implementation class; here we use the default HikariDataSource
        type: com.zaxxer.hikari.HikariDataSource
        # Database driver
        driver-class-name: com.mysql.jdbc.Driver
        # Connection URL, username and password
        jdbc-url: jdbc:mysql://1.15.76.95:3307/yyds
        username: wsy
        password: 123456
      db1:
        type: com.zaxxer.hikari.HikariDataSource
        driver-class-name: com.mysql.jdbc.Driver
        jdbc-url: jdbc:mysql://1.15.76.95:3308/yyds
        username: wsy
        password: 123456

Start the main application class; the log shows that both data sources connect successfully.
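
If the project does not yet have a main class, a minimal one like the following is enough; the class name is inferred from the test class shown later and is otherwise an assumption:

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class ShardingJdbcStudyApplication {
    public static void main(String[] args) {
        SpringApplication.run(ShardingJdbcStudyApplication.class, args);
    }
}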


2.2.3 yml configuration sharding algorithm

ShardingSphere has many built-in sharding algorithms; see the official documentation: ShardingSphere-JDBC sharding algorithms.

Now let's write the configuration file. We need to tell ShardingJDBC how to shard. To be clear, both databases have a test table for user data, and our goal is to distribute the user records between the test tables of these two databases.

spring:
  shardingsphere:
    rules:
      sharding:
        tables:
          # Fill in the table name here; every operation on this table in the program
          # will use the routing scheme below (our MyBatis mapper operates on test)
          test:
            # The actual routing nodes: we split across two databases,
            # so we list both databases and the corresponding table
            # This can also be abbreviated with an expression: db$->{0..1}.test
            actual-data-nodes: db0.test, db1.test
            # Sub-database strategy configuration
            database-strategy:
              # A standard strategy; a complex strategy could shard on multiple keys
              standard:
                # The column participating in sharding; the algorithm below operates on it
                sharding-column: id
                # The name of our custom algorithm defined below
                sharding-algorithm-name: my-alg
        sharding-algorithms:
          # Define a new algorithm; the name is arbitrary
          my-alg:
            # Algorithm type; there are many built-in types, this is the simplest
            type: MOD
            props:
              sharding-count: 2
    props:
      # Enable SQL logging so we can observe the routing in a moment
      sql-show: true

Here we use MOD, the modulo sharding algorithm.
It takes the sharding column (here the primary key) modulo the shard count and matches the result against the data source names. For example, when we insert id=2, the id is taken modulo 2: db0 receives rows whose result is 0 and db1 receives rows whose result is 1. In other words, even ids end up in db0 and odd ids in db1. Other algorithms are also supported, but we won't cover them here.
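
Conceptually, the routing decision that MOD makes boils down to the following; this is only an illustrative sketch, not ShardingSphere's actual implementation:

public class ModRoutingDemo {
    // Simplified illustration of MOD routing with sharding-count = 2
    // and data source names db0/db1, matching the configuration above
    static String routeDatabase(long id) {
        return "db" + (id % 2);
    }

    public static void main(String[] args) {
        System.out.println(routeDatabase(2)); // db0: even ids go to db0
        System.out.println(routeDatabase(3)); // db1: odd ids go to db1
    }
}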


2.2.4 View sub-database effect

We need to write a little code in the project, adding our user entity class and Mapper:

@Data
@AllArgsConstructor
public class User {
    int id;
    String name;
    String passwd;
}
@Mapper
public interface UserMapper {

    @Select("select * from test where id = #{id}")
    User getUserById(int id);

    @Insert("insert into test(id, name, passwd) values(#{id}, #{name}, #{passwd})")
    int addUser(User user);
}

① Now let's write a test case to see whether routing follows the rules above:

@SpringBootTest
class ShardingJdbcStudyApplicationTests {
    @Resource
    private UserMapper userMapper;

    @Test
    void contextLoads() {
        for (int i = 1; i <= 10; i++) {
            // Insert users with ids 1~10 into the database
            userMapper.addUser(new User(i, "xx-" + i, "123456"));
        }
    }
}

Observing the console log, we can see that the inserts alternate between the two databases.

Checking the two databases confirms it: even ids ended up in db0 and odd ids in db1.

② Next we test querying users:

@Test
void contextLoads2() {
    for (int i = 1; i <= 10; i++) {
        User user = userMapper.getUserById(i);
    }
}

The log shows that the queries are likewise routed across the two databases, since consecutive ids map to different databases.

In this way, we can easily realize the sub-database strategy.


2.3 Sub-table strategy

Now let's look at sub-tables. Suppose our database contains two tables, test_0 and test_1, with the same structure, and we want rows to be placed into one of the two tables according to the result of id modulo 2. The implementation idea is much the same as before.

Here we first need to introduce two table concepts:

  • Logical table: the common logical name for a group of horizontally split tables with the same structure; it is the identifier used for the table in SQL. Example: an order table split into 10 tables by the last digit of the primary key, t_order_0 to t_order_9, has the logical table name t_order.
  • Real table: a physical table that actually exists in the sharded database, i.e. the ten tables t_order_0 to t_order_9 in the example above.

2.3.1 Database environment

Now let's take the first database as an example and create the two tables described above in it; it does not matter whether the previous test table is kept or deleted:

create table test_0 (
  `id` int primary key,
  `name` varchar(255) NULL,
  `passwd` varchar(255) NULL
);

create table test_1 (
  `id` int primary key,
  `name` varchar(255) NULL,
  `passwd` varchar(255) NULL
);


2.3.2 yml configuration sub-table strategy

We don't need to modify any business code; the MyBatis mappers stay exactly the same even though the real table names have changed. All we need is to have the original SQL rewritten through routing, with the following configuration:

spring:
  shardingsphere:
    rules:
      sharding:
        tables:
          # Fill in the table name here; every operation on this table in the program
          # will use the routing scheme below (our MyBatis mapper still operates on test)
          test:
            # Actual routing nodes: this time we route to the two tables of the db0 library
            # This could also be written out as db0.test_0, db0.test_1
            actual-data-nodes: db0.test_$->{0..1}
            # The sub-table strategy; note that this is table-strategy,
            # while database-strategy is used for sub-database routing
            table-strategy:
              # Basically the same as before
              standard:
                sharding-column: id
                sharding-algorithm-name: my-alg
        sharding-algorithms:
          my-alg:
            # Algorithm type; INLINE lets us write the expression ourselves
            type: INLINE
            props:
              # We still take id modulo 2 to decide which table the data goes to:
              # the expression appends the result to the prefix test_,
              # which behaves exactly like the MOD algorithm above
              algorithm-expression: test_$->{id % 2}
              # Queries also follow the sharding strategy; a range query has to scan
              # all tables, which the INLINE algorithm forbids by default,
              # so we explicitly allow it here
              allow-range-query-with-inline-sharding: true
    props:
      # Enable SQL logging so we can observe the routing in a moment
      sql-show: true
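
The INLINE expression is simply evaluated against the sharding column; conceptually it resolves the real table name roughly as follows (an illustrative sketch, not the actual ShardingSphere code):

public class InlineRoutingDemo {
    // Rough Java equivalent of the inline expression test_$->{id % 2}
    static String routeTable(long id) {
        return "test_" + (id % 2);
    }

    public static void main(String[] args) {
        System.out.println(routeTable(3)); // test_1
        System.out.println(routeTable(8)); // test_0
    }
}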

There are many more built-in sharding algorithms; see the official documentation: ShardingSphere-JDBC sharding algorithms.


2.3.3 View sub-table effect

① Running the insert test again, we can see that the rows are routed into test_0 and test_1 according to our strategy.

② Querying again, each lookup selects the corresponding table according to the configured strategy.

③ What happens when we perform a range query?

Note: the INLINE algorithm does not support range (full-table) queries by default; we have to allow it manually:

allow-range-query-with-inline-sharding: true

Otherwise, an error is reported because the INLINE algorithm rejects range queries by default.

We now add a range-query method to the Mapper:

@Select("select * from test where id between #{start} and #{end}")
List<User> getUsersByIdRange(int start, int end);

Let’s test it out:

@Test
void contextLoads3() {
    List<User> list = userMapper.getUsersByIdRange(3, 8);
}

From the log we can see that the SQL actually executed queries the two tables separately and merges the results.

In this way, we can easily implement the table splitting strategy.


2.4 Distributed sequence algorithm

In complex distributed systems, especially microservice architectures, we often need to uniquely identify large amounts of data and messages. As systems grow more complex and data volumes increase, sub-databases and sub-tables become a common solution, and after splitting we still need a unique ID to identify each piece of data or message (an order number, transaction record, event number, etc.). A system that can generate globally unique IDs therefore becomes necessary.

For example, we previously created student information tables, book borrowing tables, and library management tables. Every record has an ID as its primary key, and this ID has the following requirements:

  1. In order to distinguish it from other data, this ID must be globally unique.
  2. The primary key should be kept in order as much as possible, which will greatly improve the query efficiency of the index.

So how do we ensure that the generation of ID meets the above requirements in a distributed system?

2.4.1 UUID

A UUID consists of 32 random hexadecimal digits. We can create one directly with the UUID class provided by the JDK:

public static void main(String[] args) {
    String uuid = UUID.randomUUID().toString();
    System.out.println(uuid);
}

The result is something like 73d5219b-dc0f-4282-ac6e-8df17bcd5860. Generation is very fast, and uniqueness is effectively guaranteed: the value is different every time, and the probability of a collision is vanishingly small.

But it does not meet our second requirement above: the IDs we get are completely unordered.

2.4.2 Snowflake algorithm

Now let's look at the snowflake algorithm, which generates a 64-bit integer ID, too large to fit in an int.

The ID is composed of three parts: timestamp + worker machine ID + sequence number, with the timestamp measured in milliseconds.

  • 41 bits can represent about 70 years. The time era starts at 0:00 on November 1, 2016 and can be used until 2086.
  • The working machine ID is actually the node ID. The ID of each node is different, so it can be distinguished. 10 bits can represent up to 1024 nodes.
  • The last 12 bits are the serial number under each node, so each machine can have 4096 serial numbers per millisecond.

This gives us both the uniqueness and the ordering mentioned above, making it one of the best solutions available, but it still has drawbacks: first, the clock problem, since duplicate IDs can be generated if the machine clock moves backwards; second, the node capacity is only 1024, which can be a limitation for a very large cluster.
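
To make the bit layout concrete, here is a minimal, illustrative generator following the 41/10/12 split described above. The epoch constant is an assumed value for the 2016-11-01 start date, and real implementations (including ShardingSphere's) differ in the details:

public class SimpleSnowflake {
    // Assumed epoch: 2016-11-01 00:00:00 UTC, matching the start date mentioned above
    private static final long EPOCH = 1477958400000L;
    private static final long WORKER_ID_BITS = 10L;
    private static final long SEQUENCE_BITS = 12L;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;     // node id, 0..1023
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    public SimpleSnowflake(long workerId) {
        this.workerId = workerId;
    }

    public synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            // The clock-rollback problem mentioned above: refuse rather than risk duplicates
            throw new IllegalStateException("Clock moved backwards, refusing to generate id");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE; // up to 4096 ids per millisecond
            if (sequence == 0) {
                // Sequence exhausted for this millisecond, wait for the next one
                while ((now = System.currentTimeMillis()) <= lastTimestamp) { /* spin */ }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        // 41 bits of time | 10 bits of worker id | 12 bits of sequence
        return ((now - EPOCH) << (WORKER_ID_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }
}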


2.4.3 ShardingJDBC implementation

Sharding-JDBC can generate IDs for us automatically using both of the algorithms above. Documentation: ShardingSphere-JDBC > Built-in Algorithms > Distributed Sequence Algorithms.

Let’s take ShardingJDBC’s snowflake algorithm as an example to generate a unique ID for us.

① First, adjust the databases. The id columns of the test tables in our two databases are of type int, which cannot hold 64 bits, so change them to bigint:

ALTER TABLE `yyds`.`test` MODIFY COLUMN `id` bigint NOT NULL FIRST;

② Then we modify the MyBatis insert statement: the id is now generated automatically by ShardingJDBC and no longer needs to be inserted manually:

@Insert("insert into test(name, passwd) values(#{name}, #{passwd})")
int addUser(User user);
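
Since the id is now a 64-bit generated value, the User entity presumably also needs its id field widened to long plus a constructor that omits the id. The original article does not show this change, so the following is an assumed sketch:

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@NoArgsConstructor
@AllArgsConstructor
public class User {
    long id;        // widened to long so it can hold the 64-bit snowflake id
    String name;
    String passwd;

    // Used when the id is generated for us by ShardingJDBC
    public User(String name, String passwd) {
        this.name = name;
        this.passwd = passwd;
    }
}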

③ Then we configure the yml: we keep the sub-database strategy and add a custom primary key generation strategy:

spring:
  shardingsphere:
    rules:
      sharding:
        tables:
          test:
            actual-data-nodes: db0.test, db1.test
            # This is still the sub-database strategy
            database-strategy:
              standard:
                sharding-column: id
                sharding-algorithm-name: my-alg
            # Use a custom primary key generation strategy here
            key-generate-strategy:
              column: id
              key-generator-name: my-gen
        key-generators:
          # Define our custom primary key generator here
          my-gen:
            # Use the snowflake algorithm
            type: SNOWFLAKE
            props:
              # Worker machine ID; make sure it is unique across nodes
              worker-id: 666
        sharding-algorithms:
          my-alg:
            type: MOD
            props:
              sharding-count: 2


2.4.4 Check the effect

Let’s write a test case:

@Test
void contextLoads4() {
    for (int i = 1; i <= 20; i++) {
        userMapper.addUser(new User("aaa" + i, "123456"));
    }
}

Looking at the log, ShardingJDBC added an id column to our insert statements, filled with values generated by the snowflake algorithm.

Checking the databases, the unique IDs were inserted correctly, and the rows were still distributed according to the sub-database strategy.


2.5 Read-write separation

Finally, let's look at read-write separation. We previously set up MySQL master-slave replication (see the earlier article: MySQL and distributed: master-slave replication).

Based on that master-slave structure, we now implement read-write separation: the master library handles writes, and the slave library handles reads.

2.5.1 Database master-slave structure

① We previously started two MySQL containers in Docker, forming a master-slave structure.

② We need to enable read-only mode on the slave library by adding the following to its MySQL configuration file:

# Only valid for ordinary users, the super user root can still insert data
read-only = 1

③ After restarting the slave library, we rebuild the master-slave replication, and it comes up successfully.

④ On the master library, create the database and table we need to work with; through replication the slave library gets them as well:

create database yyds;
CREATE TABLE test (
id int not null primary key,
name varchar(30),
passwd varchar(30)
);

To summarize, the environment looks like this:

                           master library        slave library
docker container name      mysql-master          mysql-slave
mysql mapped port          3301                  3302
mysql remote access user   wsy / 123456          wsy / 123456
database name              yyds                  yyds
table in yyds              test                  test
columns of test            (id, name, passwd)    (id, name, passwd)


2.5.2 yml configuration data source

spring:
  shardingsphere:
    datasource:
      names: master-db, slave-db
      master-db:
        type: com.zaxxer.hikari.HikariDataSource
        driver-class-name: com.mysql.jdbc.Driver
        jdbc-url: jdbc:mysql://1.15.76.95:3301/yyds
        username: wsy
        password: 123456
      slave-db:
        type: com.zaxxer.hikari.HikariDataSource
        driver-class-name: com.mysql.jdbc.Driver
        jdbc-url: jdbc:mysql://1.15.76.95:3302/yyds
        username: wsy
        password: 123456

2.5.3 yml configuration read-write separation

When there are multiple slave libraries, load balancing is required. The official documentation provides many load balancing algorithms: ShardingSphere-JDBC > Built-in Algorithms > Load Balancing Algorithms

spring:
  shardingsphere:
    rules:
      # Configure read-write separation
      readwrite-splitting:
        data-sources:
          # The data source name here is arbitrary
          user-db:
            # The Static type is used here; the Dynamic type can discover data sources
            # automatically via auto-aware-data-source-name, which is not demonstrated here
            type: Static
            props:
              # The write library (only one)
              write-data-source-name: master-db
              # The read libraries (there can be several, separated by commas)
              read-data-source-names: slave-db
              # Load balancing strategy, which can be customized
              load-balancer-name: my-load
        load-balancers:
          # Our custom load balancing strategy
          my-load:
            type: ROUND_ROBIN
    props:
      sql-show: true
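
ROUND_ROBIN simply rotates read requests across the configured read data sources; with a single slave it always picks that one. A simplified sketch of the idea, not ShardingSphere's actual implementation:

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinDemo {
    // The read data sources configured above (only one slave in this article)
    static final List<String> READ_SOURCES = Arrays.asList("slave-db");
    static final AtomicInteger COUNTER = new AtomicInteger();

    // Each read request takes the next source in turn
    static String nextReadSource() {
        int index = Math.floorMod(COUNTER.getAndIncrement(), READ_SOURCES.size());
        return READ_SOURCES.get(index);
    }

    public static void main(String[] args) {
        System.out.println(nextReadSource()); // slave-db
        System.out.println(nextReadSource()); // slave-db (only one source configured)
    }
}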


2.5.4 View the read-write separation effect

① Write a test case for the write operation:

@Test
void contextLoads5() {
    userMapper.addUser(new User(1, "zhangsan", "123456"));
    userMapper.addUser(new User(2, "lisi", "123456"));
    userMapper.addUser(new User(3, "wangwu", "123456"));
}

Checking the console log, we find that data is inserted only into the master library.

Checking the databases, even though nothing is written directly to the slave library, the data still appears there through master-slave replication.

② Write another test case for the read operation:

@Test
void contextLoads6() {
    User user1 = userMapper.getUserById(1);
    User user2 = userMapper.getUserById(2);
    User user3 = userMapper.getUserById(3);
}

Checking the console log, we find that the queries go only to the slave library.


Summary:
This article studied Sharding-JDBC: how to split databases and tables and how its built-in sharding algorithms work; we then used the distributed sequence algorithm it provides, the snowflake algorithm; finally, we combined it with a master-slave structure to implement database read-write separation. More learning content will continue to be updated!