Directory of series articles
Chapter 1: ES cluster construction
Chapter 2: Basic operation commands of an ES cluster
Chapter 3: ES encryption and authentication based on the search-guard plug-in
Chapter 4: Commonly used ES plug-ins
Article directory
- Directory of series articles
- Preface
- 1. What is elasticdump?
- 2. Install the elasticdump tool
  - 1. Offline installation
  - 2. Online installation
- 3. elasticdump related parameters
- 4. Use elasticdump for data backup
- 5. Use elasticdump for data recovery
Preface
In a real enterprise production environment, it is inevitable that an ES cluster will need to be migrated, backed up, and restored to guarantee the availability and integrity of its data. This chapter focuses on the elasticdump tool and walks through backup and recovery operations; other backup and recovery methods are not covered for now. The summary below comes from actual production experience.
Migration method | Usage scenarios
---|---
logstash | Migrating full or incremental data; scenarios without strict real-time requirements that need only simple filtering of the migrated data via an ES query; scenarios requiring complex filtering of the migrated data; migrations across a large version span, such as from 5.x to 6.x or 7.x
elasticdump | Scenarios with a small data volume
1. What is elasticdump?
Elasticdump is an open-source, free command-line tool for importing and exporting Elasticsearch data. It provides a convenient way to transfer data between different Elasticsearch instances, or to perform data backup and recovery. With Elasticdump you can export data from an Elasticsearch index to a JSON file, or import data from a JSON file into an Elasticsearch index. It supports a variety of options and filters for specifying sources and destinations, including index patterns, document types, query filters, and more.

Key features:

- Supports data transfer and backup between Elasticsearch instances or clusters; data can be copied from one cluster to another.
- Supports multiple transfer formats, including JSON, NDJSON, CSV, backup files, etc.
- Can be used from the command line or programmatically; the command line provides a convenient interface.
- Supports incremental synchronization, copying only documents that do not exist in the target cluster.
- Supports various authentication methods for connecting to Elasticsearch, such as basic auth, Amazon IAM, etc.
- Supports multi-threaded operation, which can speed up data migration.
- Open source and free; the code is hosted on GitHub.

Limitation: elasticdump is suitable for backing up a single index, or an entire ES cluster whose indexes are not too large. If the cluster holds a large amount of data, use the logstash approach for migration and recovery instead.
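The cluster-to-cluster copy described above can be sketched as a small shell wrapper. This is only an illustration: the hosts, credentials, and index name are placeholder assumptions, and with `DRY_RUN=1` (the default here) it only prints the commands it would run instead of executing them.

```shell
#!/bin/sh
# Sketch of copying one index between two clusters with elasticdump.
# SRC/DST hosts and credentials are placeholder assumptions.
SRC="http://user:pass@source-host:9200"
DST="http://user:pass@target-host:9200"

copy_index() {
  idx=$1
  # Copy analyzer, then mapping, then data, in that order.
  for t in analyzer mapping data; do
    cmd="elasticdump --input $SRC/$idx --output $DST/$idx --type=$t --limit=1000"
    if [ "${DRY_RUN:-1}" = 1 ]; then
      echo "$cmd"        # dry run: just show the command
    else
      $cmd || return 1   # stop on the first failure
    fi
  done
}

copy_index test1
```

Set `DRY_RUN=0` only after verifying the printed commands point at the right clusters.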
2. Install the elasticdump tool
1. Offline installation

Background: the elasticdump tool depends on the Node.js environment.

(1) Install node offline (the server is on a pure intranet):

```shell
# Download the node package locally:
# https://nodejs.org/dist/v16.14.0/node-v16.14.0-linux-x64.tar.xz
# Upload it to the server
rz node-v16.14.0-linux-x64.tar.xz
# Unpack and install
tar xf node-v16.14.0-linux-x64.tar.xz -C /usr/local/
mv /usr/local/node-v16.14.0-linux-x64 /usr/local/node
# Configure environment variables (add the two export lines to /etc/profile) and reload
vim /etc/profile
export NODE_HOME=/usr/local/node
export PATH=$NODE_HOME/bin:$PATH
source /etc/profile
# Verify the installation: on success these commands print the installed versions
node -v
npm -v
```

(2) Install elasticdump offline:

```shell
# Download the toolkit locally:
# https://github.com/elasticsearch-dump/elasticsearch-dump/archive/v6.19.0.tar.gz
# Upload it to the server
rz v6.19.0.tar.gz
# Unpack and install
tar xf v6.19.0.tar.gz -C /export/server/
# Check whether the installation succeeded
elasticdump --version
```
2. Online installation

Background: the elasticdump tool depends on the Node.js environment.

(1) Install node online (the server can reach the external network):

```shell
# Download the node package
wget https://nodejs.org/dist/v16.14.0/node-v16.14.0-linux-x64.tar.xz
# Unpack and install
tar xf node-v16.14.0-linux-x64.tar.xz -C /usr/local/
mv /usr/local/node-v16.14.0-linux-x64 /usr/local/node
# Configure environment variables (add the two export lines to /etc/profile) and reload
vim /etc/profile
export NODE_HOME=/usr/local/node
export PATH=$NODE_HOME/bin:$PATH
source /etc/profile
# Verify the installation: on success these commands print the installed versions
node -v
npm -v
# Alternatively, install the node environment directly with yum
yum -y install nodejs npm
```

(2) Install elasticdump online:

```shell
# Set the npm registry, otherwise installation will be very slow
npm config set registry https://registry.npm.taobao.org/
# Global installation
npm install elasticdump -g
# Check whether the installation succeeded
elasticdump --version
```
3. elasticdump related parameters
The full parameter list can be viewed with `elasticdump --help`. Commonly used parameters:

- `--input`: the source; an Elasticsearch instance URL or a JSON file.
- `--output`: the destination; an Elasticsearch instance URL or a JSON file.
- `--type`: the data type to operate on, including index, alias, template, data, analyzer, etc.
- `--searchBody`: when the input is an Elasticsearch instance, a JSON object used as the query.
- `--limit`: the number of documents moved per request. The default is 100 at a time, which is very slow for large data volumes; adjust it as needed.
- `--inputIndex`: the index name within the source Elasticsearch instance.
- `--ignore-errors`: ignore documents that hit errors and continue exporting the rest.
- `--scrollTime`: how long the scroll context stays alive; the default is 10 minutes.
- `--timeout`: the request timeout in milliseconds; the default is 30 seconds.
- `--support-big-int`: support big integer types.
- `--big-int-fields`: the fields to apply big-int support to (default '').
- `--bulk-limit`: the batch size for bulk operations; the default is 1000 documents.
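As an illustration of combining these parameters, the hypothetical command below exports only documents matching a query via `--searchBody`. The host, index, credentials, and the `@timestamp` field are assumptions, not values from this article; the command string is echoed for inspection and can be executed with `eval` once the placeholders are filled in.

```shell
#!/bin/sh
# Hypothetical filtered export. Host, index, credentials, and the
# @timestamp field are placeholder assumptions.
ES="http://username:password@127.0.0.1:9200"
INDEX="test1"
QUERY='{"query":{"range":{"@timestamp":{"gte":"now-1d"}}}}'

CMD="elasticdump --input $ES/$INDEX --output /export/${INDEX}_recent.json --type=data --limit=1000 --searchBody '$QUERY'"
echo "$CMD"   # inspect first; execute with: eval "$CMD"
```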
4. Use elasticdump for data backup
1. Data backup: back up a single index

If the ES cluster is configured with an analyzer (word segmenter), export the analyzer first. Pay special attention here: the analyzer can only be exported per index, not for all indexes at once; exporting all at once fails with an "index does not exist" error.

```shell
# Export the analyzer (per index only)
/export/server/elasticdump/elasticsearch-dump/bin/elasticdump \
  --input http://username:password@ip:9200/index_name \
  --output /export/analyzer_file_name.json \
  --type=analyzer --limit=10000

# Export the mapping structure
/export/server/elasticdump/elasticsearch-dump/bin/elasticdump \
  --input http://username:password@ip:9200/index_name \
  --output /export/mapping_file_name.json \
  --type=mapping --limit=10000

# Export the data
/export/server/elasticdump/elasticsearch-dump/bin/elasticdump \
  --input http://username:password@ip:9200/index_name \
  --output /export/data_file_name.json \
  --type=data --limit=10000
```
2. Data backup: back up multiple indexes (provided the total index data does not exceed 1 GB)

View all index sizes:

```shell
curl -X GET http://ip:9200/_cat/indices?v -uusername:password
```

Example output:

```
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green  open   test1 xxx  1   1   100104     0            508mb      508mb
```

- index: index name
- docs.count: total number of documents in the index
- store.size: disk space occupied by the index
- pri.store.size: disk space occupied by the primary shards

Use the awk command to extract the index names and save them to unidom.txt:

```shell
curl -X GET http://ip:9200/_cat/indices?v -uusername:password | grep -v "status" | awk '{print $3}' > unidom.txt
```

Write a script file to perform the multi-index backup:

```shell
#!/bin/bash
for item in `cat /root/unidom.txt`
do
  /export/server/elasticdump/elasticsearch-dump/bin/elasticdump --input http://icos:[email protected]:9200/$item --output /export/unidom/analyzer/${item}_analyzer.json --type=analyzer
  /export/server/elasticdump/elasticsearch-dump/bin/elasticdump --input http://icos:[email protected]:9200/$item --output /export/unidom/mapping/${item}_mapping.json --type=mapping
  /export/server/elasticdump/elasticsearch-dump/bin/elasticdump --input http://icos:[email protected]:9200/$item --output /export/unidom/data/${item}_data.json --type=data
done
```

Execute the script and capture its log:

```shell
nohup ./script.sh >> backup.log 2>&1 &
# Both stderr (2) and stdout (1) are appended to backup.log.
```

When the script finishes, check whether the word "success" appears at the end of the log; if so, the backup succeeded.
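The grep/awk pipeline that extracts index names can be checked offline against a captured sample of the `_cat/indices?v` output before pointing it at a live cluster. The sample rows below are fabricated for illustration.

```shell
#!/bin/sh
# Extract index names (column 3) from `_cat/indices?v` output,
# skipping the header line. The sample data below is fabricated.
list_indices() {
  grep -v '^health' | awk '{print $3}'
}

sample='health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open test1 xxx 1 1 100104 0 508mb 508mb
green open test2 yyy 1 1 200 0 1mb 1mb'

# Prints test1 and test2, one per line
printf '%s\n' "$sample" | list_indices
```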
5. Use elasticdump for data recovery
1. Check whether the index to be imported already exists in the new ES cluster. If it does not exist, create it first:

```shell
curl -X PUT http://IP:9200/index_name -uuser:password
```
2. Import the analyzers first:

```shell
#!/bin/bash
# ls /export/backup_es/analyzer | awk -F'_' '{print $1}' extracts the index
# name from each backup file name ending in .json
for item in `ls /export/backup_es/analyzer | awk -F'_' '{print $1}'`
do
  /export/server/elasticdump/elasticsearch-dump/bin/elasticdump --input /export/backup_es/analyzer/${item}_analyzer.json --output http://user:password@IP:9200/$item --type=analyzer
done
```

Execute the script and capture its log:

```shell
nohup ./analyzer_in.sh >> /analyzer.log 2>&1 &
# Both stderr (2) and stdout (1) are appended to /analyzer.log.
```
3. Import the mapping structures:

```shell
#!/bin/bash
for item in `ls /export/backup_es/mapping |awk -F'_' '{print $1}'`
do
  /export/server/elasticdump/elasticsearch-dump/bin/elasticdump --input /export/backup_es/mapping/${item}_mapping.json --output http://user:password@IP:9200/$item --type=mapping
done
```

Execute the script and capture its log:

```shell
nohup ./mapping_in.sh >> mapping.log 2>&1 &
# Both stderr (2) and stdout (1) are appended to mapping.log.
```
4. Import the data:

```shell
#!/bin/bash
for item in `ls /export/backup_es/data |awk -F'_' '{print $1}'`
do
  /export/server/elasticdump/elasticsearch-dump/bin/elasticdump --input /export/backup_es/data/${item}_data.json --output http://user:password@IP:9200/$item --type=data
done
```

Execute the script and capture its log:

```shell
nohup ./data_in.sh >> data.log 2>&1 &
# Both stderr (2) and stdout (1) are appended to data.log.
```
5. After the import completes, verify the sizes:

```shell
curl -X GET http://ip:9200/_cat/indices?v -uusername:password
```

Check that the index names, total document counts, disk space occupied by each index, and primary-shard disk usage are consistent with the values found in the source ES cluster.
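One way to automate this consistency check is to capture the `_cat/indices` output from both clusters and compare index name (column 3) and docs.count (column 7). The sketch below runs against two fabricated sample captures; in practice, `src` and `dst` would come from curl against the source and target clusters.

```shell
#!/bin/sh
# Compare index name (col 3) and docs.count (col 7) between two
# `_cat/indices` captures. The sample captures below are fabricated.
doc_counts() {
  grep -v '^health' | awk '{print $3, $7}' | sort
}

src='green open test1 xxx 1 1 100104 0 508mb 508mb'
dst='green open test1 yyy 1 1 100104 0 508mb 508mb'

# Prints "document counts match" for these samples
if [ "$(printf '%s\n' "$src" | doc_counts)" = "$(printf '%s\n' "$dst" | doc_counts)" ]; then
  echo "document counts match"
else
  echo "document counts differ"
fi
```

Note that store.size may legitimately differ between clusters (different shard/replica layouts and merge states), so docs.count is the more reliable field to compare exactly.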