Canal implements Mysql and ES data synchronization

1. ES overview

The full name of ES is “ElasticSewrch”, which is an open source distributed search engine based on RESTful web interface and built on top of Apache Lucene.

ES can store, search and analyze a large amount of data in a very short time. Usually used as the core engine in complex search scenarios.

To put it simply, ES can help us quickly complete the search of massive data.

2. Scene introduction

In actual development, in order to ensure data security and persistence, we generally store data in a database, such as a mysql database.

However, if the amount of data is relatively large and there are many search services, we need to use ES to realize our search function.

And the search implemented by ES needs to put the data into ES before it can be realized. Therefore, we need to synchronize the data in mysql to ES.

3. Implementation plan

To achieve data synchronization between Mysql and ES, it can be roughly divided into three schemes:

Solution 1: Synchronous call

the

Solution 2: Asynchronous notification

the

Solution 3: Monitor binlog

the

4. Monitor binlog

Among the above three implementation schemes, the third scheme “monitoring binlog” is relatively more reliable,

| Solution | Advantages | Disadvantages |

| ———- | —————————— | ——- ———————————- |

| Synchronous call | Simple and rude implementation | High degree of business coupling |

| Asynchronous notification | Low coupling, generally difficult to implement | Rely on the reliability of mq |

| Monitor binlog | Completely decoupling between services, with strong reliability | Enabling binlog increases database burden and high implementation complexity |

4.1 Implementation principle

Binlog is actually Binary Log, which is MySQL binary log, also called update log (Update Log), which is a very important log in MySQL.

It is mainly used to record changes in the database, that is, DDL and DML statements of SQL statements, excluding data record query operations.

BinaryLog is usually used for master-slave replication of databases. Its working principle is as follows:

the

– 1) MySQL master writes data changes into the binary log (binary log), and the recorded data is called binary log events

– 2) MySQL slave copies the binary log events of the master to its relay log (relay log)

– 3)MySQL slave replays the events in the relay log and reflects the data changes to its own data

4.2 Implementation scheme – Canal

Canal is an open source project under Alibaba, developed based on Java. Canal is to monitor the binary log of the database to provide incremental data subscription & consumption.

GitHub address: https://github.com/alibaba/canal

Canal is to disguise itself as a slave node of MySQL to monitor changes in the binary log of the master.

Canal will notify the Canal client of the obtained change information, and then complete the synchronization with other databases.

the

5. Build the environment

5.1 Open MySQL master-slave

① Modify the configuration file

Find the mysql configuration file my.cnf, and add configuration information on it

log-bin=/var/lib/mysql/mysql-bin
binlog-do-db=heima

Interpretation of configuration:

– `log-bin=/var/lib/mysql/mysql-bin`: Set the storage address and file name of the binary log file, called mysql-bin

– `binlog-do-db=heima`: Specify which database to record binary log events, here record the library heima

Take Mysql in the Docker container as an example, the final configuration:

[mysqld]
skip-name-resolve
character_set_server=utf8
datadir=/var/lib/mysql
server-id=1000
log-bin=/var/lib/mysql/mysql-bin
binlog-do-db=heima

5.2 Set user permissions

Add an account that is only used for data synchronization. For security reasons, only the operation authority for the heima library is provided here.

 createuser canal@'%' IDENTIFIED by'canal';
GRANTSELECT,REPLICATION SLAVE,REPLICATION CLIENT,SUPER ON*.*TO'canal'@'%' identified by'canal';
FLUSH PRIVILEGES;

5.3 Install Canal

Take the docker container as an example

docker run -p 11111:11111 --name canal \
-e canal.destinations=heima \
-e canal.instance.master.address=mysql:3306 \
-e canal.instance.dbUsername=canal\
-e canal.instance.dbPassword=canal \
-e canal.instance.connectionCharset=UTF-8 \
-e canal.instance.tsdb.enable=true \
-e canal.instance.gtidon=false \
-e canal.instance.filter.regex=heima\..* \
-d canal/canal-server:v1.1.5

illustrate:

– `-p 11111:11111`: This is the default listening port of canal

– `-e canal.destinations=heima`: the name of the canal cluster, which must be consistent with the cluster name set in the canal service

– `-e canal.instance.master.address=192.168.136.133:3306`: database address and port

– `-e canal.instance.dbUsername=canal`: database user name

– `-e canal.instance.dbPassword=canal`: database password

– `-e canal.instance.tsdb.enable=true`: indicates whether to enable the tsdb switch

– `-e canal.instance.gtidon=false`: indicates whether it is GTID mode

– `-e canal.instance.filter.regex=`: the name of the table to monitor

Syntax supported by table name monitoring:

“`

Tables concerned by mysql data parsing, Perl regular expressions.

Multiple regular expressions are separated by commas (,), and escape characters need double slashes (\)

Common examples:

1. All tables: .* or .*\..*

2. All tables under the canal schema: canal\..*

3. The table starting with canal under canal: canal\.canal.*

4. A table under the canal schema: canal.test1

5. Combine multiple rules and separate them with commas: canal\..*, mysql.test1, mysql.test2

“`

5.4 Canal client service

① Client overview

Canal provides clients in various languages. When Canal detects binlog changes, it will notify the Canal client.

We can use the Java client provided by Canal to monitor Canal notification messages. When a change message is received, the data update is completed.

Of course, there are many Canal clients on the market. Let’s take the third-party open source canal-starter client on GitHub and combine it with SpringBoot as an example to demonstrate.

② Introduce dependencies

<dependency><groupId>top.javatool</groupId><artifactId>canal-spring-boot-starter</artifactId><version>1.2.1-RELEASE</version></dependency>

③ Write configuration

canal:
  destination: heima # The cluster name of canal, which should be consistent with the name set when installing canal
  server: 192.168.136.135:11111 # canal service address

④ Entity class

packagecom.itheima.domain;importlombok.Data;@DatapublicclassHotel{privateLong id;privateString name;privateString address;privateInteger price;privateInteger score;privateString brand;privateString city;privateString starName;privateString business;privateString longitude;privateString latitude pic;}

⑤ Write a listener

> Write a listener by implementing the `EntryHandler` interface to listen to Canal messages.

>

> Pay attention to two points:

>

> – The implementation class specifies the monitored table information through `@CanalTable(“tb_hotel”)`

> – The generic type of EntryHandler is the entity class corresponding to the table

>

packagecom.itheima.handle;importcom.itheima.domain.Hotel;importorg.springframework.stereotype.Component;importtop.javatool.canal.client.annotation.CanalTable;importtop.javatool.canal.client.handler.EntryHandler;@ CanalTable("tb_hotel")@ComponentpublicclassHotelHandler implementsEntryHandler<Hotel>{/**
     * Automatically execute when new data is added in mysql
     * @paramhotel added data
     */@Overridepublicvoidinsert(Hotel hotel){//Add the newly added data hotel to ES}/**
     * Automatically execute when the data in mysql is modified
     * @parambefore data before modification
     * @paramafter Modified data
     */@Overridepublicvoidupdate(Hotel before,Hotel after){//Update the modified data to ES}/**
     * Automatically execute when data in ysql is deleted
     * @paramhotel The data to delete
     */@Overridepublicvoiddelete(Hotel hotel){//Delete the data hotel to be deleted from ES}}

6. Supplement

This mode can not only realize data synchronization between mysql and es. It is applicable to any data synchronization accepted from mysql to other.