Elasticsearch 7.15: creating snapshots and backups

1. Description

Snapshots are backups taken from a running Elasticsearch cluster. You can take a snapshot of an entire cluster, including all its data streams and indexes. You can also snapshot only specific data streams or indexes in your cluster.
Before you can create a snapshot, you must first register the snapshot repository.
Snapshots can be stored in a local repository or in a remote repository. Remote repositories can reside on Amazon S3, HDFS, Microsoft Azure, Google Cloud Storage, and other platforms supported by repository plugins.
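For example, a remote repository backed by Amazon S3 can be registered much like a local one; this is only a sketch, assuming the repository-s3 plugin is installed and S3 credentials are configured in the Elasticsearch keystore, with the repository and bucket names here purely illustrative:

    PUT /_snapshot/my_s3_backup
    {
      "type": "s3",
      "settings": {
        "bucket": "my-backup-bucket"
      }
    }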

Elasticsearch takes snapshots incrementally: the snapshot process only copies data into the repository that has not already been copied by earlier snapshots, avoiding unnecessary work and duplicated storage. This means you can safely take frequent snapshots with minimal overhead. This incremental behavior applies only within a single repository, because no data is shared between repositories. Snapshots are also logically independent of each other, even within a single repository: deleting a snapshot does not affect the integrity of any other snapshot.
Snapshots can be restored to a running cluster, including all data streams and indexes in the snapshot by default. However, you can choose to restore only the cluster state or specific data streams or indexes from the snapshot.
Snapshot lifecycle management enables automatic generation and management of snapshots.
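As a rough sketch of such an automated policy using the SLM API (the policy name, schedule, and retention values below are illustrative; the my_backup repository registered in the next section must already exist):

    PUT /_slm/policy/nightly-snapshots
    {
      "schedule": "0 30 1 * * ?",
      "name": "<nightly-snap-{now/d}>",
      "repository": "my_backup",
      "config": {
        "indices": ["*"],
        "include_global_state": true
      },
      "retention": {
        "expire_after": "30d",
        "min_count": 5,
        "max_count": 50
      }
    }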

2. Register repository

If you register the same snapshot repository with multiple clusters, only one cluster should have write access to the repository. All other clusters connected to the repository should use it in read-only mode.

The snapshot format may change between major versions. If clusters on different versions write to the same repository, snapshots written by one version may not be visible to the other, and the repository may be corrupted. Although making the repository read-only for all but one cluster should work when the clusters differ by only one major version, this configuration is not supported.
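On the clusters that should only read from the repository, this can be done with the readonly repository setting at registration time; a minimal sketch for the shared file system repository used in the examples below:

    PUT /_snapshot/my_backup
    {
      "type": "fs",
      "settings": {
        "location": "/data/backups",
        "readonly": true
      }
    }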

  • First you need to set up a shared file system (NFS)

    Reference (CentOS 7): configuring an NFS shared folder on CentOS 7
    Give the elasticsearch user ownership of the folder:

    chown -R elasticsearch:elasticsearch /data/backups/
    
  • Add the local repository path to elasticsearch.yml:
    After the configuration is changed, the node must be restarted.

    path:
      repo:
        - /data/backups
    
  • Create repository via api:

    PUT /_snapshot/my_backup
    {
      "type": "fs",
      "settings": {
        "location": "/data/backups"
      }
    }
    
  • Query repository information:

    GET /_snapshot/my_backup
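
  • Verify the repository (optional): checks that all nodes in the cluster can access the registered location, using the verify snapshot repository API:

    POST /_snapshot/my_backup/_verify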
    

3. Create a snapshot

PUT /_snapshot/my_backup/snapshot_2?wait_for_completion=true
{
  "indices": "data_stream_1,index_1,index_2",
  "ignore_unavailable": true,
  "include_global_state": false,
  "metadata": {
    "taken_by": "kimchy",
    "taken_because": "backup before upgrading"
  }
}
  • The wait_for_completion parameter specifies whether the request should return immediately after the snapshot is initialized (the default) or wait for the snapshot to complete. During snapshot initialization, information about all previous snapshots is loaded into memory, so on large repositories the request may take several seconds (or even minutes) to return even when wait_for_completion is set to false. You can check the progress of a running snapshot separately, as shown after this list.

  • ignore_unavailable:
    Setting this to true causes data streams and indexes that do not exist to be ignored when creating the snapshot. By default, if ignore_unavailable is not set and a data stream or index is missing, the snapshot request fails.

  • Set include_global_state to false to prevent the cluster's global state from being stored as part of the snapshot.
    The global cluster state includes the cluster's index templates, such as those that match data streams. If your snapshot contains data streams, we recommend storing the global state as part of the snapshot so that you can later restore any templates the data streams require.

  • By default, if one or more indexes participating in a snapshot do not have all primary shards available, the entire snapshot fails. You can change this behavior by setting partial to true.

  • The expand_wildcards option can be used to control whether hidden and closed indexes are included in the snapshot. The default is open and hidden indexes.

  • Use the metadata field to attach arbitrary metadata to the snapshot, such as who took it, why it was taken, or any other data that might be useful.

  • Snapshot names can be derived automatically using date math expressions, similar to when creating a new index. Special characters must be URI encoded.

    # PUT /_snapshot/my_backup/<snapshot-{now/d}>
    PUT /_snapshot/my_backup/%3Csnapshot-%7Bnow%2Fd%7D%3E
    

    Date expression: https://www.elastic.co/guide/en/elasticsearch/reference/7.15/date-math-index-names.html
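
  • To check the progress of a snapshot that is still running (as referenced in the wait_for_completion note above), you can use the get snapshot and snapshot status APIs:

    GET /_snapshot/my_backup/snapshot_2
    GET /_snapshot/my_backup/snapshot_2/_status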

4. Restore snapshot

POST /_snapshot/my_backup/snapshot_2/_restore
{
  "indices": "data_stream_1,index_1,index_2",
  "ignore_unavailable": true,
  "include_global_state": false,
  "rename_pattern": "index_(.+)",
  "rename_replacement": "restored_index_$1",
  "include_aliases": false
}
  • If an index being restored already exists in the cluster, use rename_pattern and rename_replacement to restore it under a new name; if it does not exist, it is restored under its original name from indices. (Alternatively, the existing index can be closed before restoring over it, as sketched after this list.)

  • By default, include_global_state is false, which means that the cluster state and feature states from the snapshot are not restored.
    If set to true, the restore operation merges the legacy index templates in your cluster with the templates contained in the snapshot, replacing any existing ones whose names match templates in the snapshot. It completely removes all persistent settings, non-legacy index templates, ingest pipelines, and ILM lifecycle policies that exist in the cluster and replaces them with the corresponding items from the snapshot.

  • Following the appendReplacement logic (as in Java's Matcher.appendReplacement), the rename_pattern and rename_replacement options can also be used to rename data streams and indexes on restore, using regular expressions that support referencing the original text.
    If you rename a restored data stream, its backing indexes are also renamed. For example, if you rename the logs data stream to restored-logs, the backing index .ds-logs-2099.03.09-000005 is renamed to .ds-restored-logs-2099.03.09-000005.
    If you rename a restored data stream, make sure an index template matches the new stream name; without a matching index template, the stream cannot roll over or create new backing indexes.

  • To prevent aliases from being restored along with their associated data streams and indexes, set include_aliases to false.
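
  • As an alternative to renaming (mentioned above), an existing index can be closed and then restored in place; a minimal sketch, assuming index_1 already exists in the cluster with the same number of shards as in the snapshot:

    POST /index_1/_close

    POST /_snapshot/my_backup/snapshot_2/_restore
    {
      "indices": "index_1"
    }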

5. Backup and restore using Kibana

  • From the Kibana home page, open Snapshot and Restore (under Stack Management)

  • Register repository



  • Create policy


    Specify retention days
  • Restore snapshot
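
  • The same policy can also be triggered immediately through the SLM API; the policy name here follows the illustrative nightly-snapshots sketch shown earlier:

    POST /_slm/policy/nightly-snapshots/_execute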