MongoDB cluster construction

1. Introduction to MongoDB cluster

1. Replica Set (preferred): a replica set. The cluster holds multiple copies of the data, so if the primary node goes down, a secondary node can take over and continue to provide service.

2. Sharding: a sharded cluster, whose advantages only show at genuinely large data volumes. Synchronizing data between nodes takes time, and for cross-shard queries the routing node must gather partial results from multiple shards, merge them, and then return the data, which makes such queries slightly less efficient.

3. Master-Slave: the traditional active/standby architecture, which is officially no longer recommended.

2. Working principle of Replica Set

1. Replica Set means a replica set: the cluster holds multiple copies of the data, ensuring that if the primary node goes down, a secondary node can continue to provide data services. The precondition is that the secondary's data stays consistent with the primary's.

2. MongoDB (M) denotes the primary node, MongoDB (S) a secondary node, and MongoDB (A) an arbiter node. The primary and secondary nodes store data; the arbiter does not. The client connects to the primary and secondary nodes at the same time, but not to the arbiter.

3. By default the primary node handles all inserts, deletes, and updates, and the secondary provides no service at all. However, the secondary can be configured to serve queries, which reduces the load on the primary: when the client issues a read, the request is automatically routed to the secondary. This setting is called Read Preference Modes, and the Java client exposes it as a simple configuration option, so there is no need to operate on the database directly.
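The read preference can also be set directly in the connection string, which works for any driver. A minimal sketch, assuming the hosts used later in this article and a replica set named rs0 (the set name is illustrative):

```shell
# Hedged sketch: replica-set name "rs0" is an assumption for illustration.
# readPreference=secondaryPreferred routes reads to a secondary when one
# is available, falling back to the primary otherwise.
mongosh "mongodb://192.168.1.30:27017,192.168.1.31:27017/?replicaSet=rs0&readPreference=secondaryPreferred"
```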

4. The arbiter is a special node that stores no data itself. Its role is to vote in the election that promotes a secondary to primary after the primary fails, so clients never need to connect to it. Even though there is only one secondary here, an arbiter is still needed so that the set has an odd number of voting members and elections can reach a majority.

3. Replica Set cluster construction

Primary node [Primary]

Receives all write requests and then replicates the changes to all Secondaries. A Replica Set can have only one Primary. When the Primary fails, the remaining Secondaries (with the Arbiter voting) elect a new primary. By default, read requests are also handled by the primary; to have them forwarded to a secondary, the client must change its connection configuration.

Secondary node [Secondary]

Maintains the same data set as the primary node and, when the primary goes down, takes part in electing a new primary.

Arbiter node [Arbiter]

Keeps no data and can never be elected primary; it only votes in elections. Using an Arbiter lowers the hardware requirements for data storage: an Arbiter needs almost no significant hardware resources to run, but the important point is that in a production environment it should not be deployed on the same machine as the data-bearing nodes.

Note that for automatic failover, the number of voting nodes in a Replica Set must be odd, so that the election of a new primary can always reach a majority.
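The three roles above map directly onto the replica-set configuration. A sketch of initializing such a set, assuming a set named rs0 and the three hosts prepared in the next step (the set name is an assumption):

```shell
# Run once against the intended primary; arbiterOnly: true marks the
# third member as a voting-only arbiter that stores no data.
mongosh --host 192.168.1.30 --port 27017 --eval '
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "192.168.1.30:27017" },                    // primary candidate
    { _id: 1, host: "192.168.1.31:27017" },                    // secondary
    { _id: 2, host: "192.168.1.32:27017", arbiterOnly: true }  // arbiter
  ]
})'
```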

1. First prepare three test machines:

192.168.1.30:27017 primary node (master)

192.168.1.31:27017 secondary node (slave)

192.168.1.32:27017 arbiter node (arbiter)
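With the three machines prepared, each one would start its mongod as a member of the same replica set. A minimal sketch, assuming the set name rs0 and a data directory of /data/mongodb (both are illustrative):

```shell
# Run on each of the three machines. The dbpath and logpath are
# assumptions; adjust to your own layout.
mongod --replSet rs0 \
       --port 27017 \
       --dbpath /data/mongodb \
       --bind_ip 0.0.0.0 \
       --fork --logpath /data/mongodb/mongod.log
```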

2. Download the installation package from the official website
MongoDB versions are named in the pattern x.y.z:
When y is odd, the release is a development version, e.g. 1.5.2, 4.1.13;
When y is even, the release is a stable version, e.g. 1.6.3, 4.0.10;
z is the revision number; a larger z simply means more accumulated bug fixes.
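The odd/even rule can be checked mechanically; a small sketch:

```shell
# Classify an x.y.z MongoDB version string:
# odd y = development release, even y = stable release.
classify_version() {
  local y
  y=$(echo "$1" | cut -d. -f2)   # extract the middle component
  if [ $((y % 2)) -eq 1 ]; then
    echo development
  else
    echo stable
  fi
}

classify_version 4.1.13   # development (y = 1 is odd)
classify_version 4.0.10   # stable (y = 0 is even)
```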

4. Building a MongoDB sharded high-availability cluster

Overview

In a replica set, every secondary node holds a full copy of the database, so under high concurrency and large data volumes the pressure on each node becomes a serious challenge. Taking into account how the cluster must scale once the data volume grows, MongoDB introduces the sharding mechanism to handle massive data sets.

What is sharding

Sharding is the process of splitting a database and spreading it across different machines. It makes it possible to store more data and handle heavier loads without needing a single powerful server. The total data set is cut into smaller chunks, and these chunks are distributed across several shards, so each shard holds only part of the data. Operations go through the routing process of the mongos component, which knows which shard holds which chunk.

Basic components

A sharded cluster uses four kinds of components: mongos, config server, shard, and replica set.

mongos

The entry point for all cluster requests: every request is coordinated through mongos, so the application layer needs no routing logic of its own. mongos is a request dispatch center, responsible for forwarding external requests to the corresponding shard servers. Because mongos is the single unified entry point, it is generally deployed with HA (Highly Available) redundancy to avoid a single mongos node becoming a point of failure.
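A sketch of starting a mongos, assuming the config servers run as a replica set named configrs on port 21001 as in the port table later in this article (the set name is an assumption):

```shell
# mongos stores no data itself; --configdb points it at the config-server
# replica set from which it loads and caches the routing metadata.
mongos --configdb configrs/192.168.1.30:21001,192.168.1.31:21001,192.168.1.32:21001 \
       --port 20001 \
       --bind_ip 0.0.0.0 \
       --fork --logpath /data/mongos/mongos.log
```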

config server

The configuration server stores all of the cluster's metadata (sharding and routing configuration). mongos does not physically persist shard-server and routing information itself; it only caches it in memory. When mongos starts for the first time, or restarts later, it loads the configuration from the config server, and whenever the configuration changes, all mongos instances are notified to update their state so that requests keep being routed accurately. A production environment usually runs multiple config servers to avoid losing the configuration through a single node failure.
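Since MongoDB 3.4, config servers must themselves form a replica set. A sketch of starting one member, assuming the set name configrs and a data directory of /data/configdb (both illustrative):

```shell
# --configsvr marks this mongod as a config server; run one such
# member on each of the three machines.
mongod --configsvr \
       --replSet configrs \
       --port 21001 \
       --dbpath /data/configdb \
       --bind_ip 0.0.0.0 \
       --fork --logpath /data/configdb/config.log
```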

shard

In the traditional setup, storing a massive amount of data, say 1 TB, on a single server puts enormous pressure on it: the hard disk, network I/O, CPU, and memory all become bottlenecks. If instead multiple servers share that 1 TB, each server stores only a measurably smaller slice of the data. As long as the sharding rules are configured in the MongoDB cluster and the database is accessed through mongos, each operation request is automatically forwarded to the appropriate back-end shard server.
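Once the shards are registered, the sharding rule is configured per collection through mongos. A sketch, where the database name, collection name, and shard key are made up for illustration:

```shell
# Run against a mongos. addShard registers a shard replica set;
# shardCollection then spreads documents across shards by a hashed key.
mongosh --host 192.168.1.30 --port 20001 --eval '
sh.addShard("shard1/192.168.1.30:27011")                    // register shard replica set
sh.enableSharding("testdb")                                 // allow sharding on the database
sh.shardCollection("testdb.orders", { orderId: "hashed" })  // hypothetical collection and key
'
```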

replica set

In the overall cluster architecture, if a shard node runs on a single machine and that machine goes down, part of the cluster's data becomes unavailable, which must not happen. Each shard therefore needs to be a replica set of its own to guarantee data reliability; in production this is usually 2 data-bearing replicas plus 1 arbiter.
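Each member of a shard replica set is started with --shardsvr. A sketch for one member of shard1, assuming the port from the table below and an illustrative data directory:

```shell
# One member of the shard1 replica set; repeat on the other machines
# with their respective roles (master / slave / arbiter).
mongod --shardsvr \
       --replSet shard1 \
       --port 27011 \
       --dbpath /data/shard1 \
       --bind_ip 0.0.0.0 \
       --fork --logpath /data/shard1/shard1.log
```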

Overall structure

[Figure: overall architecture of the sharded cluster]

Server       Virtual Machine 1        Virtual Machine 2        Virtual Machine 3
Component 1  mongos                   mongos                   mongos
Component 2  config server            config server            config server
Component 3  shard server1 (master)   shard server1 (slave)    shard server1 (arbiter)
Component 4  shard server2 (arbiter)  shard server2 (master)   shard server2 (slave)
Component 5  shard server3 (slave)    shard server3 (arbiter)  shard server3 (master)
Port allocation
                  Replica Set 1   Replica Set 2   Replica Set 3
mongos            20001           20001           20001
config            21001           21001           21001
shard1 (master)   27011           27011           27011
shard1 (slave)    27012           27012           27012
shard1 (arbiter)  27013           27013           27013

Download the installation package and unzip it. Download address: https://www.mongodb.com/try/download/community
The version used here is 6.0.6.
Unzip the archive into the /usr/local/mongodb/mongodb-6.0.6 directory.