[ElasticSearch Series-06] Construction of Es cluster architecture and core concepts of clusters

ElasticSearch series overall column

Content Link address
[1] ElasticSearch download and installation https://zhenghuisheng.blog.csdn.net/article/details/129260827
[2] ElasticSearch concepts and basic operations https://blog.csdn.net/zhenghuishengq/article/details/134121631
[3] ElasticSearch’s advanced query Query DSL https://blog.csdn.net/zhenghuishengq/article/details/134159587
[4] Aggregation query operation of ElasticSearch https://blog.csdn.net/zhenghuishengq/article/details/134159587
[5] SpringBoot integrates elasticSearch https://blog.csdn.net/zhenghuishengq/article/details/134212200
[6] The construction of Es cluster architecture and the core concepts of clusters https://blog.csdn.net/zhenghuishengq/article/details/134258577

The construction of Es cluster architecture and the core concepts of clusters

  • 1. In-depth understanding of the underlying principles of es cluster architecture
    • 1. Core concepts of clusters
      • 1.1 Nodes and node types
      • 1.2 Request and response process
      • 1.3 Sharding
    • 2. Cluster construction
      • 2.1 Es cluster construction
      • 2.2 Kibana installation
    • 3. X-pack security authentication
    • 4. Node types
      • 4.1 Configuration of different nodes
      • 4.2 Benefits of single responsibility

1. In-depth understanding of the underlying principles of es cluster architecture

The previous articles covered the installation and basic use of es. This article mainly explains the underlying principles of the es cluster architecture, along with basics such as index sharding and replicas.

1. Core concepts of clusters

Before installing the cluster, let's first go over several cluster concepts. The figure below shows an es cluster composed of three nodes: P0, P1, and P2 represent the primary shards on the nodes, and R0, R1, and R2 represent the replicas corresponding to those shards.

[figure: a three-node es cluster with primary shards P0-P2 and replica shards R0-R2]

1.1 Nodes and node types

There can be one or more nodes in a cluster. Each node is an instance of es, which is essentially a java process. It is generally recommended to run one ElasticSearch instance per machine. An es cluster contains multiple node types, mainly the following. When building an es cluster, different parameters need to be set for different node types. By default, every node starts as a master-eligible node that can participate in elections (a node's actual roles can be verified with the sketch after the list below).

  • Master Node: the master node; in the figure above, Node1 is the master. It is mainly responsible for creating and deleting indexes, deciding which node each shard is allocated to, maintaining and updating the cluster state, and so on.
  • Master eligible node: a node qualified to participate in elections. When the master node goes down, these nodes take part in electing the new master; such a node effectively acts as a standby for the Master node.
  • Data Node: a node dedicated to storing data; for example, documents inserted into an index are stored on these nodes. The master node decides how shards are distributed across the data nodes, which are used to scale data horizontally and avoid a single point of failure for data.
  • Coordinating Node: the coordinating node, which receives and responds to requests and handles the reception and distribution of data.
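
Once the cluster is running, each node's actual roles can be checked through the _cat/nodes API. A minimal sketch, run from the Kibana console like the other requests in this article:

GET _cat/nodes?v
# The node.role column lists each node's roles (m = master-eligible, d = data, i = ingest),
# and the master column marks the elected master with an asterisk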

1.2 Request and response process

A simple es cluster architecture works as follows. When a client sends a request, it is first received and dispatched by the coordinating node, which routes it to a Data node or to the Master node. A query can be dispatched directly to a Data node; an addition, deletion or modification that involves changes to the cluster, shards or replicas is dispatched to the Master node.

After the data is queried from the nodes' shards, it is aggregated on the Coordinating node, where statistics and other calculations are performed before the result is returned.

If es wants to separate reads and writes to improve performance, an Ingest Node can be added. This node is called a pre-processing node and supports pipeline settings; an Ingest node can be used to filter and transform data, as the sketch below shows.
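
As a hedged sketch of such a pipeline, the following defines a pipeline with two built-in processors and writes a document through it; the pipeline name, index name and field are hypothetical:

# Define a pipeline that trims and lowercases the message field (names are hypothetical)
PUT _ingest/pipeline/log_pipeline
{
  "description": "trim and lowercase the message field before indexing",
  "processors": [
    { "trim": { "field": "message" } },
    { "lowercase": { "field": "message" } }
  ]
}

# Index a document through the pipeline
PUT my_index/_doc/1?pipeline=log_pipeline
{
  "message": "  Hello ES Cluster  "
}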

1.3 Sharding

When creating an index, you can specify the number of shards, the number of replicas, and so on. Shards are divided into primary shards and replica shards.

Primary Shard: the primary shard mainly solves the problem of horizontal data scaling; through primary shards, data can be distributed across all nodes in the cluster. Each shard is a Lucene instance. The number of primary shards cannot be modified after the index is created, because locating a document's shard uses a hash-modulo operation, and changing the shard count would directly change the result.

Replica Shard: the replica shard solves the problem of high data availability; it is a copy of a primary shard. While the number of primary shards cannot be modified after creation, the number of replica shards can (see the sketch after the index-creation example below), and increasing the number of replicas can to some extent improve read performance. However, it is usually best to set the replica count to 0 or 1: for log data it can be set to 0, and for retrieval data such as product information it can be set to 1.

As shown below, when creating an index, you can set the number of shards and the number of replicas.

PUT /zhs_db
{
  "settings": {
    "number_of_shards": 3,      // set the number of primary shards
    "number_of_replicas": 1     // set the number of replicas
  }
}
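
Since only the replica count can be changed after creation, reads can be scaled later through the index _settings API. A minimal sketch against the zhs_db index above:

PUT /zhs_db/_settings
{
  "index": {
    "number_of_replicas": 2
  }
}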

The number of shards relates to the number of nodes. In a cluster of three nodes with three shards, the default hash-based allocation places one shard on each node, producing the layout of P0, P1, P2 and R0, R1, R2 shown in the figure above.

Except in a single-machine deployment, a replica shard is generally not placed on the same node as its primary shard; replicas sit on different nodes in the same cluster, and the specific node they land on is allocated by the Master node. The allocation can be checked as shown below.
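
A minimal sketch that shows which node each shard of zhs_db was allocated to:

GET _cat/shards/zhs_db?v
# Each row lists one shard: prirep is p for a primary and r for a replica,
# and the node column shows where the Master allocated it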

It is not advisable to place too many shards on a single node: too many shards affect the relevance score calculation and waste a lot of resources.

2. Cluster construction

2.1 Es cluster construction

The first article explained how to set up a single-node es instance. Building an es cluster requires three machines; simply repeat the single-node setup on each of them, which can all be done easily through docker.

The main task is to modify elasticsearch.yml. Most importantly, discovery.seed_hosts must list the hosts of the three servers running es nodes so they can discover each other; the earlier single-node setup listed only the current node's host.

The node.name set in this yml file must also be unique per node: the first node is set to node-1, the second to node-2, and the third to node-3. When the cluster nodes are initialized, these three values must be configured in the cluster.initial_master_nodes attribute.

# Cluster name; must be identical on all three nodes
cluster.name: docker-cluster
# Node name; must be unique for each node
node.name: node-1
# Whether the node is eligible to be master; defaults to true
node.master: true
# Whether the node is a data node; defaults to true
node.data: true
# Bind ip and enable remote access; 0.0.0.0 binds all interfaces
network.host: 0.0.0.0
# Web port
#http.port: 9200
# Tcp transport port
#transport.tcp.port: 9300
# Hosts of the three nodes, used for node discovery
discovery.seed_hosts: ["xxx.xxx.xxx.166", "xxx.xxx.xxx.167", "xxx.xxx.xxx.168"]
# Introduced in 7.0 for the initial quorum; only required the first time the whole cluster starts
# Configured with node.name values, listing the nodes eligible to be master during bootstrap
cluster.initial_master_nodes: ["node-1","node-2","node-3"]
# Solve cross-domain issues
http.cors.enabled: true
http.cors.allow-origin: "*"
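
After starting all three nodes, whether they actually formed a cluster can be verified from any machine. A minimal sketch, using one of the example addresses above:

# Check cluster health; status should be green and number_of_nodes should be 3
curl http://xxx.xxx.xxx.166:9200/_cluster/health?pretty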

2.2 Kibana installation

Kibana installation was also explained in detail in the first article, but some configurations in kibana.yml need to be modified as follows: the hosts and ports of the servers where es is located must be configured.

server.port: 5601
server.host: "xxx.xxx.xxx.166"
elasticsearch.hosts: ["http://xxx.xxx.xxx.166:9200","http://xxx.xxx.xxx.167:9200","http://xxx.xxx.xxx.168:9200"]
i18n.locale: "zh-CN"

3. X-pack security authentication

To keep data secure and prevent it from being intercepted, security authentication needs to be set up for the nodes on each machine. Here we choose to implement it with x-pack.

First enter the container on each machine, for example the 166 machine:

docker exec -it elasticsearch /bin/bash

Then execute the following command directly; a prompt will appear asking you to enter the password twice.

# Create a CA certificate for the cluster
elasticsearch-certutil ca

Continue by executing the following command:

# Generate certificates and private keys for the nodes in the cluster
elasticsearch-certutil cert --ca elastic-stack-ca.p12

After successful execution, grant permissions on the generated files in the es folder:

chmod 777 elastic-certificates.p12
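
The generated elastic-certificates.p12 must also be present on the other two machines. A hedged sketch of one way to distribute it; the container name elasticsearch comes from the earlier commands, while the paths are assumptions that depend on your image and volume mounts:

# On the 166 machine, copy the certificate out of the container (path is an assumption)
docker cp elasticsearch:/usr/share/elasticsearch/elastic-certificates.p12 .
# Ship it to another machine (repeat for the third one)
scp elastic-certificates.p12 root@xxx.xxx.xxx.167:~/
# On that machine, copy it into the container's config directory
docker cp elastic-certificates.p12 elasticsearch:/usr/share/elasticsearch/config/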

Then modify the elasticsearch.yml configuration on each machine, adding the following parameters on top of the original:

## elasticsearch.yml configuration
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.client_authentication: required
xpack.security.transport.ssl.keystore.path: elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: elastic-certificates.p12

After adding the ssl settings above, also add the configuration that enables xpack security authentication, again in each node's elasticsearch.yml.

xpack.security.enabled: true # Enable xpack authentication mechanism

Then restart the es service. After the restart completes, enter the es container again.

docker exec -it elasticsearch /bin/bash

After entering the container, run the command:

elasticsearch-setup-passwords interactive

After entering it, you can interactively set passwords for the built-in users of es, kibana, logstash and other components:

Enter password for [elastic]:
Reenter password for [elastic]:
Enter password for [apm_system]:
passwords must be at least [6] characters long
Try again.
Enter password for [apm_system]:
Reenter password for [apm_system]:
Enter password for [kibana_system]:
Reenter password for [kibana_system]:
Enter password for [logstash_system]:
Reenter password for [logstash_system]:
Enter password for [beats_system]:
Reenter password for [beats_system]:
Enter password for [remote_monitoring_user]:
Reenter password for [remote_monitoring_user]:
Changed password for user [apm_system]
Changed password for user [kibana_system]
Changed password for user [kibana]
Changed password for user [logstash_system]
Changed password for user [beats_system]
Changed password for user [remote_monitoring_user]
Changed password for user [elastic]

For example, to configure the password kibana uses, enter the kibana container, open the kibana.yml configuration, and add the following account and password:

elasticsearch.username: "kibana"
elasticsearch.password: "123456"
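
If you prefer not to keep the password in kibana.yml in plain text, kibana's keystore can hold it instead. A hedged sketch, assuming the kibana container is named kibana:

# Enter the kibana container (the container name is an assumption)
docker exec -it kibana /bin/bash
# Create the keystore and store the password in it instead of kibana.yml
bin/kibana-keystore create
bin/kibana-keystore add elasticsearch.password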

After restarting kibana, you will find that opening kibana now requires the account and password configured above.

4. Node types

4.1 Configuration of different nodes

As mentioned above, node types include the Master node, master-eligible nodes, the Coordinating node, the Data node and others. Each node is a java process, and initially every node is master-eligible and can participate in elections. When configuring, you designate the different node types through different parameters.

The parameters for setting the node type are as follows. You need to set the corresponding nodes through these attributes.

To set a node as a dedicated master node, only node.master needs to be true, and the others are set to false.

node.master: true
node.ingest: false
node.data: false

To set a node as a data node, only node.data needs to be true, and the other values are set to false.

node.master: false
node.ingest: false
node.data: true

To set a node as an ingest node, only node.ingest needs to be true, and the others are set to false.

node.master: false
node.ingest: true
node.data: false

For a coordinating node, all three values are simply set to false. If no dedicated coordinating node is configured, the node that receives a request acts as the coordinating node by default.

node.master: false
node.ingest: false
node.data: false
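
Note that node.master / node.ingest / node.data are the legacy flags; from es 7.9 onward the same roles can also be declared with a single node.roles list. A hedged sketch of the newer equivalent in elasticsearch.yml:

# es 7.9+ alternative: declare roles as a list
node.roles: [ master ]    # dedicated master-eligible node
#node.roles: [ data ]     # dedicated data node
#node.roles: [ ingest ]   # dedicated ingest node
#node.roles: [ ]          # empty list: coordinating-only node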

4.2 Benefits of single responsibility

As mentioned above, the same process can take on different functions through different parameter settings, giving each role a single responsibility and thereby improving the availability of the entire ElasticSearch cluster.

  • For example, a single-responsibility Master node mainly manages indexes and shards (creation, deletion and so on), which affects the data of the whole cluster; in actual deployments it can use a lower-spec CPU, RAM and disk.
  • The master-eligible nodes, which every node is by default, are mainly responsible for cluster state management; when the master goes down they take part in the election and become the new master. These can also use a lower-spec CPU, RAM and disk.
  • The Data node handles the data itself and solves problems such as horizontal scaling, so it should use a high-spec CPU, RAM and disk.
  • The Ingest node is mainly responsible for pre-processing data, so it can use a high-spec CPU, a medium-spec RAM and a low-spec disk.
  • The Coordinating node mainly receives and forwards requests and finally has to aggregate and compute over the queried data, so it needs a high-spec CPU, high-spec RAM and a low-spec disk.

When the system handles a large number of complex queries, query performance can be improved by adding coordinating nodes.

When disk capacity is insufficient or read/write pressure is high, data nodes can be added.