Elasticsearch data migration with elasticdump

Directory of series articles

Chapter 1 ES cluster construction
Chapter 2 Basic operation commands for an ES cluster
Chapter 3 ES encryption and authentication based on the search-guard plug-in
Chapter 4 Commonly used ES plug-ins

Article directory

  • Table of Contents of Series Articles
  • Preface
  • 1. What is elasticdump?
  • 2. Install the elasticdump tool
    • 1. Offline installation
    • 2. Online installation
  • 3. Elasticdump related parameters
  • 4. Use elasticdump for data backup
  • 5. Use elasticdump for data recovery

Preface

In an actual enterprise production environment, migrating an ES cluster and backing up and restoring its data are unavoidable tasks for ensuring data availability and integrity. This chapter therefore focuses on backup and recovery operations with the elasticdump tool; other backup and recovery methods are not covered for now. The content is summarized from actual production use.

Migration methods and usage scenarios:
  • logstash: migrating full or incremental data; scenarios without strict real-time requirements that only need simple filtering of the migrated data via an ES query; scenarios that require complex filtering of the migrated data; or migrations across a large version span, such as from 5.x to 6.x or 7.x.
  • elasticdump: scenarios with a small data volume.

1. What is elasticdump?

Elasticdump is an open source and free command-line tool for importing and exporting Elasticsearch data. It provides a convenient way to transfer data between different Elasticsearch instances, or perform data backup and recovery.

Using Elasticdump, you can export data from an Elasticsearch index to a JSON file, or import data from a JSON file into an Elasticsearch index. It supports a variety of options and filters for specifying sources and destinations, including index patterns, document types, query filters, and more.
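For example, a minimal export/import pair might look like the following sketch (the host names, credentials, index name, and file path are placeholders, not values from this article):

# Export the documents of an index from a source cluster into a local JSON file
elasticdump --input=http://user:password@source_ip:9200/index_name --output=/tmp/index_name_data.json --type=data
# Import that file into an index on a target cluster
elasticdump --input=/tmp/index_name_data.json --output=http://user:password@target_ip:9200/index_name --type=data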

Key features include:
  • Data transfer and backup between Elasticsearch instances or clusters; data can be copied from one cluster to another.
  • Support for multiple data formats, including JSON, NDJSON, CSV, backup files, etc.
  • Usable from the command line or programmatically; the command line provides a convenient operation interface.
  • Incremental synchronization: only documents that do not exist in the target cluster are copied.
  • Various authentication methods for connecting to Elasticsearch, such as basic auth, Amazon IAM, etc.
  • Multi-threaded operation, which can speed up data migration.
  • Open source and free; the code is hosted on GitHub.

Limitations:
The elasticdump tool is suitable for backing up a single index, or an entire ES cluster whose indexes are not too large. If the cluster holds a large amount of data, use the logstash approach for migration and recovery instead.

2. Install the elasticdump tool

1. Offline installation

Background:
The elasticdump tool depends on the Node.js environment.

1. Install Node.js offline (the server is on a pure intranet)
# Download the Node.js installation package (on a machine with internet access)
https://nodejs.org/dist/v16.14.0/node-v16.14.0-linux-x64.tar.xz
# Upload it to the server
rz node-v16.14.0-linux-x64.tar.xz
# Extract and install
tar xf node-v16.14.0-linux-x64.tar.xz -C /usr/local/
mv /usr/local/node-v16.14.0-linux-x64 /usr/local/node
# Configure environment variables and reload
vim /etc/profile
export NODE_HOME=/usr/local/node
export PATH=$NODE_HOME/bin:$PATH
source /etc/profile
# Verify the installation
node -v   # if the installation succeeded, this returns the Node.js version
npm -v

2. Install elasticdump offline
# Download the corresponding release archive locally
https://github.com/elasticsearch-dump/elasticsearch-dump/archive/v6.19.0.tar.gz
# Upload it to the server
rz v6.19.0.tar.gz
# Extract and install
tar xf v6.19.0.tar.gz -C /export/server/
# Check that the installation succeeded
elasticdump --version
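Note that installing from the source archive does not put elasticdump on the PATH by itself. A minimal sketch, assuming the archive extracted to /export/server/elasticsearch-dump-6.19.0 and that its Node.js dependencies (the node_modules directory) were also copied over from a machine with internet access:

# Hypothetical path; adjust to wherever the archive was actually extracted
export PATH=/export/server/elasticsearch-dump-6.19.0/bin:$PATH
elasticdump --version

Alternatively, call the tool by its full path, as the backup and restore scripts later in this article do.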

2. Online installation

Background:
The elasticdump tool depends on the Node.js environment.

1. Install Node.js online (the server can reach the internet)
# Download the Node.js installation package
wget https://nodejs.org/dist/v16.14.0/node-v16.14.0-linux-x64.tar.xz
# Extract and install
tar xf node-v16.14.0-linux-x64.tar.xz -C /usr/local/
mv /usr/local/node-v16.14.0-linux-x64 /usr/local/node
# Configure environment variables and reload
vim /etc/profile
export NODE_HOME=/usr/local/node
export PATH=$NODE_HOME/bin:$PATH
source /etc/profile
# Verify the installation
node -v   # if the installation succeeded, this returns the Node.js version
npm -v
# Or install the Node.js environment directly with yum
yum -y install nodejs npm

2. Install elasticdump online
# Set the npm registry, otherwise the installation will be very slow
npm config set registry https://registry.npm.taobao.org/
# Global installation
npm install elasticdump -g
# Check that the installation succeeded
elasticdump --version

3. Elasticdump related parameters

# All related parameters can be viewed with elasticdump --help
--input: the source, either an Elasticsearch instance URL or a JSON file;
--output: the destination, either an Elasticsearch instance URL or a JSON file;
--type: the type of data to operate on, such as data, mapping, analyzer, settings, alias, template, etc.;
--searchBody: when the input is an Elasticsearch instance, a JSON query body used to filter the exported documents;
--limit: the number of documents transferred per batch. The default is 100 per request, which is very slow for large data volumes, so raise it as needed;
--input-index: the source index (and optionally type) within the input Elasticsearch instance;
--ignore-errors: ignore documents that produce errors and continue with the remaining documents;
--scrollTime: how long the scroll context is kept alive; the default is 10 minutes;
--timeout: the request timeout in milliseconds; the default is 30 seconds;
--support-big-int: support large integer types;
--big-int-fields: the fields to which big-int support applies (default '');
--bulk-limit: the maximum number of documents per bulk operation; the default is 1000.
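As an illustration, the following sketch combines several of these parameters to export only recent documents in larger batches while skipping documents that fail (the host, credentials, index name, and the @timestamp field are placeholders, not values from this article):

elasticdump --input=http://username:password@ip:9200/index_name --output=/export/index_name_filtered.json --type=data --searchBody='{"query":{"range":{"@timestamp":{"gte":"now-7d"}}}}' --limit=5000 --ignore-errors=true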

4. Use elasticdump for data backup

 1. Data backup: back up a single index
If the ES cluster uses a custom analyzer (word segmenter), export the analyzer first.
# Note: analyzers can only be exported index by index, not for all indexes at once; exporting them all at once raises an "index does not exist" error.
# Export the analyzer (--type=analyzer selects the analyzer; --limit sets the batch size)
/export/server/elasticdump/elasticsearch-dump/bin/elasticdump --input http://username:password@ip:9200/index_name --output /export/analyzer_file_name.json --type=analyzer --limit=10000

# Export the mapping (index structure)
/export/server/elasticdump/elasticsearch-dump/bin/elasticdump --input http://username:password@ip:9200/index_name --output /export/mapping_file_name.json --type=mapping --limit=10000

# Export the data
/export/server/elasticdump/elasticsearch-dump/bin/elasticdump --input http://username:password@ip:9200/index_name --output /export/data_file_name.json --type=data --limit=10000
 2. Data backup: back up multiple indexes (provided that the total size of the indexes does not exceed 1 GB)
View all index sizes, as shown below
curl -X GET "http://ip:9200/_cat/indices?v" -u username:password
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open test1 xxx 1 1 100104 0 508mb 508mb
index: index name
docs.count: Total number of documents in the index
store.size: The disk space occupied by the index
pri.store.size: The amount of disk space occupied by the primary shard

Use the awk command to extract the index names and save them to the unidom.txt file
curl -X GET "http://ip:9200/_cat/indices?v" -u username:password | grep -v "status" | awk '{print $3}' > unidom.txt
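An equivalent sketch that avoids filtering out the header line, using the _cat API's column selection (same placeholder host and credentials as above):

curl -s "http://ip:9200/_cat/indices?h=index" -u username:password > unidom.txt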

Write a script to back up multiple indexes; the detailed script is as follows:
#!/bin/bash
# Back up the analyzer, mapping, and data of every index listed in /root/unidom.txt
for item in `cat /root/unidom.txt`
do
/export/server/elasticdump/elasticsearch-dump/bin/elasticdump --input http://icos:[email protected]:9200/$item --output /export/unidom/analyzer/${item}_analyzer.json --type=analyzer
/export/server/elasticdump/elasticsearch-dump/bin/elasticdump --input http://icos:[email protected]:9200/$item --output /export/unidom/mapping/${item}_mapping.json --type=mapping
/export/server/elasticdump/elasticsearch-dump/bin/elasticdump --input http://icos:[email protected]:9200/$item --output /export/unidom/data/${item}_data.json --type=data
done

Run the script above and write its output to a log file
nohup ./script.sh >> backup.log 2>&1 &   # 2>&1 sends stderr to stdout, so both are appended to backup.log; the trailing & runs the script in the background
When the script finishes, check the end of the log: if it indicates success, the backup completed.
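As a rough sanity check, the number of exported lines can be compared with each index's document count, since the data files written by elasticdump contain one JSON document per line. A sketch, assuming /root/unidom.txt still lists the backed-up indexes and reusing the cluster address and credentials from the backup script:

#!/bin/bash
# Compare the number of exported lines with docs.count reported by the cluster
for item in `cat /root/unidom.txt`
do
exported=$(wc -l < /export/unidom/data/${item}_data.json)
count=$(curl -s -u icos:icos2019 "http://10.253.239.68:9200/_cat/count/${item}?h=count")
echo "$item exported=$exported docs.count=$count"
done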

5. Use elasticdump for data recovery

 1. Check whether the index to be imported already exists in the new ES cluster; if not, create it first.
curl -X PUT "http://ip:9200/index_name" -u user:password
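If many indexes need to be created, a simple sketch is to loop over the index list produced during backup (placeholder host and credentials; indexes that already exist will simply return an error that can be ignored):

#!/bin/bash
# Create every index listed in unidom.txt on the target cluster
for item in `cat /root/unidom.txt`
do
curl -s -X PUT "http://ip:9200/${item}" -u user:password
echo ""
done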
 2. Import the analyzers first
#!/bin/bash
# ls /export/backup_es/analyzer | awk -F'_' '{print $1}' extracts the index name (the part before the first underscore) from each backup file name
for item in `ls /export/backup_es/analyzer |awk -F'_' '{print $1}'`
do
/export/server/elasticdump/elasticsearch-dump/bin/elasticdump --input /export/backup_es/analyzer/${item}_analyzer.json --output http://user:password@ip:9200/$item --type=analyzer
done
Run the script above and write its output to a log file
nohup ./analyzer_in.sh >> analyzer.log 2>&1 &   # both stderr and stdout are appended to analyzer.log; the script runs in the background
 3. Import the mapping structure
#!/bin/bash
for item in `ls /export/backup_es/mapping |awk -F'_' '{print $1}'`
do
/export/server/elasticdump/elasticsearch-dump/bin/elasticdump --input /export/backup_es/mapping/${item}_mapping.json --output http://user:password@ip:9200/$item --type=mapping
done
Run the script above and write its output to a log file
nohup ./mapping_in.sh >> mapping.log 2>&1 &   # both stderr and stdout are appended to mapping.log; the script runs in the background
 4. Import the data
#!/bin/bash
for item in `ls /export/backup_es/data |awk -F'_' '{print $1}'`
do
/export/server/elasticdump/elasticsearch-dump/bin/elasticdump --input /export/backup_es/data/${item}_data.json --output http://user:password@ip:9200/$item --type=data
done
Run the script above and write its output to a log file
nohup ./data_in.sh >> data.log 2>&1 &   # both stderr and stdout are appended to data.log; the script runs in the background
 5. Verify the sizes after the import completes.
curl -X GET "http://ip:9200/_cat/indices?v" -u username:password
Check that the index name, total document count, disk space used by the index, and disk space used by the primary shards match what was found in the source ES cluster.
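To compare both sides quickly, the same _cat query can be run against the source and target clusters and the results diffed. A sketch, assuming both clusters are reachable from the same host (hosts and credentials are placeholders):

# Dump index name, document count, and size from each cluster, sorted by index name, then compare
curl -s "http://source_ip:9200/_cat/indices?h=index,docs.count,store.size&s=index" -u username:password > source_indices.txt
curl -s "http://target_ip:9200/_cat/indices?h=index,docs.count,store.size&s=index" -u username:password > target_indices.txt
diff source_indices.txt target_indices.txt

docs.count should match exactly; store.size can differ somewhat between clusters because of replica counts and segment merges.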