Elasticsearch data migration with Logstash

Directory of series articles

Chapter 1: Building an ES cluster
Chapter 2: Basic ES cluster operation commands
Chapter 3: ES encryption and authentication based on the search-guard plugin
Chapter 4: Commonly used ES plugins
Chapter 5: ES data migration with elasticdump

Article directory

  • Directory of series articles
  • Foreword
  • 1. What is logstash?
  • 2. Full data migration steps
    • 1. Install logstash
    • 2. Modify logstash configuration
    • 3. Create a logstash file for full migration
    • 4. Execute the migration command and check the results
  • 3. Incremental data migration steps
    • 1. Create the incremental migration file
    • 2. Start the incremental migration and check whether it succeeded
  • Summary

Foreword

From Chapter 5 we learned that the elasticdump tool is only suitable when the amount of ES data is small and there are few indexes; in most cases it is used to back up a single index. In a real production environment, however, it is common to migrate the data of an entire ES cluster, and elasticdump is not a good fit for that: forcing it drives the server's disk I/O and CPU too high and easily triggers alarms. This article therefore introduces Logstash, a migration tool better suited to production environments.

1. What is logstash?

Logstash is an open-source data collection engine with real-time pipeline capabilities. It can dynamically unify data from disparate sources and normalize it to the destinations of your choice, and it provides a large number of plugins that help parse, enrich, transform, and buffer any type of data.
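Every Logstash pipeline, including the migration pipelines below, follows the same input -> filter -> output structure. As a rough illustration only (not part of the migration), the following sketch reads lines from stdin and prints them as structured events:

input  { stdin { } }
filter {
    # Enrichment/transformation happens here; this just adds a marker field.
    mutate { add_field => { "illustration" => "true" } }
}
output { stdout { codec => rubydebug } }

Such a file is run with bin/logstash -f <file>, which is exactly how the migration configurations are started in the steps below.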

2. Full data migration steps

Note: the source ES cluster and the target ES cluster must be able to reach each other over the network.

1. Install logstash

Download a suitable Logstash package and extract it:
wget https://artifacts.elastic.co/downloads/logstash/logstash-7.10.0-linux-x86_64.tar.gz
tar -zvxf logstash-7.10.0-linux-x86_64.tar.gz
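As a quick sanity check (assuming the tarball was extracted into the current directory and the extracted directory is named logstash-7.10.0), you can print the version:

cd logstash-7.10.0
bin/logstash --version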

2. Modify logstash configuration

Increase the Logstash heap memory: edit the configuration file config/jvm.options (vi config/jvm.options) and set -Xms2g and -Xmx2g.

Increase the number of records written per batch to speed up the cluster data migration: edit config/pipelines.yml (vi config/pipelines.yml) and change pipeline.batch.size from the default 125 to 5000.
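For reference, the relevant settings could look like the following; the path.config value is only an example and should point at your own pipeline file:

# config/jvm.options -- heap size
-Xms2g
-Xmx2g

# config/pipelines.yml -- larger batches speed up bulk writes
- pipeline.id: main
  path.config: "/export/server/logstash/config/es2es_all.conf"
  pipeline.batch.size: 5000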

3. Create a logstash file for full migration

Configure Logstash. Go to the Logstash configuration directory:
cd /export/server/logstash/config/
Create the full-migration pipeline file:
vi es2es_all.conf
input {
    elasticsearch {
        hosts => "http://ip:9200"                    # Source ES cluster.
        user => "username"                           # Authentication information.
        password => "password"
        index => "index name"                        # Wildcards are supported, * means all indexes; an index with a large amount of data can be configured separately.
        query => '{ "sort": [ "_doc" ] }'
        slices => 4                                  # Use sliced scroll to speed up the migration; the value should not exceed the number of shards of a single index.
        scroll => "5m"                               # Scroll session keep-alive time.
        size => 1000
        docinfo => true
        ssl => false                                 # Whether to use SSL.
    }
}
filter {
    # Remove fields added by Logstash itself.
    mutate {
        remove_field => ["@timestamp", "@version"]
    }
}
output {
    elasticsearch {
        hosts => "http://ip:9200"                    # Target ES cluster.
        user => "username"
        password => "password"
        index => "index name"                        # Keep it consistent with the source index.
        #index => "%{[@metadata][_index]}"           # Alternatively, take the index name from the source metadata.
        document_type => "%{[@metadata][_type]}"     # Target index type; this keeps it consistent with the source.
        document_id => "%{[@metadata][_id]}"         # ID of the target document. If you do not need to keep the original ID, delete this line; performance will be better.
        ssl => false                                 # Disable SSL.
        ssl_certificate_verification => false
        ilm_enabled => false
        manage_template => false
    }
}

4. Execute the migration command and check the results

Start the full Logstash migration task:
nohup bin/logstash -f config/es2es_all.conf > es_all.log 2>&1 &

Check es_all.log for migration errors. If there are none, run the following command against both clusters to check whether the source index and the migrated target index are the same size:
curl -X GET http://ip:9200/_cat/indices?v
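For an exact per-index comparison, the _cat/count API can also be run against both clusters (host, credentials, and index name below are placeholders):

curl -s -u username:password "http://source-ip:9200/_cat/count/index_name?v"
curl -s -u username:password "http://target-ip:9200/_cat/count/index_name?v"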

3. Incremental data migration steps

Note: the source ES cluster and the target ES cluster must be able to reach each other over the network.

1. Create the incremental migration file

Installing Logstash and adjusting its configuration are the same as in Section 2 above.
Note: the configuration parameters changed in Logstash 8.5, and document_type => "%{[@metadata][_type]}" must be removed there.
Modify the Logstash configuration file as shown below; the schedule option turns the pipeline into a scheduled task that triggers the incremental migration.
vim logstash/config/es_add.conf
input {
    elasticsearch {
        # Source ES address.
        hosts => ["http://localhost:9200"]
        # Configure the login user name and password for a secured cluster.
        user => "xxxxxx"
        password => "xxxxxx"
        # List of indexes that need to be migrated. Use commas (,) to separate multiple indexes.
        index => "kibana_sample_data_logs"
        # Query incremental data by time range. The following configuration queries the data of the last 5 minutes.
        query => '{"query":{"range":{"@timestamp":{"gte":"now-5m","lte":"now/m"}}}}'
        # Scheduled task; the following configuration runs once every minute.
        schedule => "* * * * *"
        scroll => "5m"
        docinfo => true
        size => 5000
    }
}
filter {
    # Remove fields added by Logstash itself.
    mutate {
        remove_field => ["@timestamp", "@version"]
    }
}
output {
    elasticsearch {
        # Target ES address (for an Alibaba Cloud Elasticsearch instance it can be obtained from the instance's basic information page).
        hosts => ["http://ip:9200"]
        # Configure the login user name and password for a secured cluster.
        user => "elastic"
        password => "xxxxxx"
        # Target index name. The following configuration keeps it consistent with the source index.
        index => "%{[@metadata][_index]}"
        # Target index type. The following configuration keeps it consistent with the source index type.
        document_type => "%{[@metadata][_type]}"
        # ID of the target document. If you do not need to keep the original ID, you can delete the following line; performance will be better.
        document_id => "%{[@metadata][_id]}"
        ilm_enabled => false
        manage_template => false
    }
}

2. Start the incremental migration and check whether it succeeded

Execute the following command to start the incremental migration task:
nohup bin/logstash -f config/es_add.conf > es_add.log 2>&1 &

Check whether the incremental migration succeeded by querying the data written in the last 5 minutes:
curl -X GET "http://localhost:9200/kibana_sample_data_logs/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "range": {
            "@timestamp": {
                "gte": "now-5m",
                "lte": "now/m"
            }
        }
    },
    "sort": [
        {
            "@timestamp": {
                "order": "desc"
            }
        }
    ]
}'
If the response returns the documents written during that window, the incremental migration succeeded.
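To double-check, you can also compare document counts for the same time window on the source and target clusters with the _count API (host and credentials below are placeholders):

curl -s -u elastic:xxxxxx -H 'Content-Type: application/json' \
  "http://ip:9200/kibana_sample_data_logs/_count" \
  -d '{"query":{"range":{"@timestamp":{"gte":"now-5m","lte":"now/m"}}}}'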

Summary

The key point of the Logstash migration method shared here is that the source and target ES clusters must be able to reach each other over the network; if they cannot, this method will fail.