Batch operations on Elasticsearch indexes

Batch operations for indexes

Batch queries and batch create, update, and delete

    • Batch query:
GET /_mget
    • Batch write:
POST /_bulk
POST /<index>/_bulk
{"action": {"metadata"}}
{"data"}

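Note:
The bulk API is strict about its JSON format: the body is newline-delimited JSON (NDJSON). Every operation except delete takes two JSON objects, the action/metadata line and the data line (the document, or the partial doc for update); delete takes only the action/metadata line. Each JSON object must fit on a single line with no line breaks inside it, consecutive objects must be separated by a newline, and the final line must also end with a newline. Otherwise the request is rejected with an error.
A bulk request is not atomic: if one operation fails, the others still run. Failures are reported per item in the response, where the top-level errors flag is true and each failed item carries an error object.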
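To make the line rules concrete, here is a minimal Python sketch that serializes a list of operations into a _bulk body. The function name build_bulk_body and the (action, metadata, source) tuple shape are assumptions for illustration, not part of any Elasticsearch client; the sketch only produces the request text and does not send it.

```python
import json


def build_bulk_body(actions):
    """Serialize (action, metadata, source) tuples into an NDJSON _bulk body.

    Hypothetical helper: source must be None for "delete", which has no data line.
    """
    lines = []
    for action, metadata, source in actions:
        # Action/metadata line: one compact JSON object, never spanning lines.
        lines.append(json.dumps({action: metadata}, separators=(",", ":")))
        if action == "delete":
            continue  # delete takes only the action/metadata line
        if source is None:
            raise ValueError(f"{action} requires a data line")
        lines.append(json.dumps(source, separators=(",", ":")))
    # Every line, including the last one, ends with a newline.
    return "".join(line + "\n" for line in lines)


body = build_bulk_body([
    ("delete", {"_index": "product2", "_id": "1"}, None),
    ("create", {"_index": "product2", "_id": "2"}, {"name": "_bulk create 2"}),
    ("update", {"_index": "product2", "_id": "4"}, {"doc": {"test_field2": "bulk test1"}}),
])
print(body)
```

json.dumps without indent never emits a raw line break (newlines inside strings are escaped), which is what keeps each object on one line.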

Index operation types

    • create: a plain PUT overwrites a document that already exists. If the create operation type is specified instead, the request fails when the document already exists, because create forbids ES from falling back to an update. For example, PUT /product/_create/1 (older syntax: PUT /product/_doc/1/_create) forcibly creates a document with id 1 in the product index; if a document with id 1 already exists, ES returns a failure (HTTP 409, version conflict).
    • delete: deletes a document. ES deletes lazily: the document is only marked as deleted, and its space is reclaimed later when segments are merged.
    • index: in ES, writing data is called “indexing”. Here index is a verb, so indexing data simply means writing a document into ES. An index operation either creates the document or fully replaces an existing one.
    • update: updates a document, either by full replacement (re-indexing the whole document) or by partial replacement (changing only the supplied fields).
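The semantics of the four operation types can be summarized with a toy in-memory model. ToyStore and its result strings are hypothetical; they mimic the outcomes ES reports (for example, create on an existing id is a 409 conflict) but do not reproduce ES internals such as versioning, lazy deletion, or recursive merging of partial updates.

```python
class ToyStore:
    """Toy in-memory model of the four write operation types (not ES internals)."""

    def __init__(self):
        self.docs = {}

    def create(self, doc_id, source):
        # create never overwrites: an existing id is a conflict (HTTP 409 in ES)
        if doc_id in self.docs:
            return "conflict"
        self.docs[doc_id] = dict(source)
        return "created"

    def index(self, doc_id, source):
        # index creates a missing document or fully replaces an existing one
        result = "updated" if doc_id in self.docs else "created"
        self.docs[doc_id] = dict(source)
        return result

    def update(self, doc_id, fields):
        # update changes only the supplied fields (top-level only in this toy);
        # updating a missing document fails (HTTP 404 in ES)
        if doc_id not in self.docs:
            return "not_found"
        self.docs[doc_id].update(fields)
        return "updated"

    def delete(self, doc_id):
        # ES only marks the document as deleted; this toy simply drops it
        if doc_id not in self.docs:
            return "not_found"
        del self.docs[doc_id]
        return "deleted"


store = ToyStore()
print(store.index("1", {"test_field": "test", "test_title": "title"}))  # created
print(store.create("1", {"test_field": "other"}))                        # conflict
print(store.index("1", {"test_title": "title 2"}))                       # updated
print(store.docs["1"])                                                   # {'test_title': 'title 2'}
print(store.update("1", {"test_field": "test 3"}))                       # updated
print(store.delete("1"))                                                 # deleted
```

Running the calls in order shows the key difference: create refuses to touch an existing id, whereas index silently replaces the whole document.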

All four operation types are write operations. In ES, a write is first applied on the primary shard and, once it succeeds there, replicated to the corresponding replica shards. ES supports two ways of writing data: single-document writes and batch writes, and it provides a dedicated API for the latter: _bulk. The underlying mechanics are covered in detail in my article “Underlying Principles of Elasticsearch”.

Advantages and disadvantages

    • Advantages: compared with a request body that is one ordinary JSON document, the newline-delimited format lets ES split the request line by line without first parsing the whole body into one large object, so it avoids extra memory consumption and performs better. It is the usual choice for writing large amounts of data.
    • Disadvantages: poorer readability, and editors may offer no autocompletion (smart prompts) for it.

Usage scenarios

    • Batch processing of large amounts of data, such as loading data from MySQL into ES in one pass. Batch writing reduces the number of requests sent to ES, as well as memory overhead and thread usage.
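The request savings are easy to quantify. The Python sketch below splits rows (a hypothetical stand-in for a MySQL result set) into batches, each of which would become one _bulk request. The batch size of 1000 is an arbitrary assumption; the right value depends on document size and cluster capacity and is best found by testing.

```python
def chunk(rows, batch_size):
    """Split rows into consecutive batches of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


# Hypothetical rows standing in for data read from MySQL.
rows = [{"id": i, "name": f"product {i}"} for i in range(2500)]
batches = chunk(rows, 1000)
print(len(rows), "single-document requests vs", len(batches), "bulk requests")
# 2500 single-document requests vs 3 bulk requests
```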

#Batch query
GET product/_search
GET /_mget
{
  "docs": [
    {
      "_index": "product",
      "_id": 2
    },
    {
      "_index": "product",
      "_id": 3
    }
  ]
}

GET product/_mget
{
  "docs": [
    {
      "_id": 2
    },
    {
      "_id": 3
    }
  ]
}
#Equivalent to SQL: SELECT * FROM product WHERE id IN (2, 3, 4)
GET product/_mget
{
  "ids": [
    2,
    3,
    4
  ]
}
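One difference from SQL's IN is worth noting: _mget returns one entry per requested id, in request order, and missing documents come back with found set to false instead of being silently omitted. The toy function below models that response shape in Python; toy_mget and the sample store are hypothetical, and real responses also carry fields such as _version.

```python
def toy_mget(store, index, ids):
    """Toy model of GET <index>/_mget {"ids": [...]}: one entry per requested id."""
    docs = []
    for doc_id in ids:
        key = str(doc_id)  # ES document ids are strings, even when sent as numbers
        entry = {"_index": index, "_id": key, "found": key in store}
        if entry["found"]:
            entry["_source"] = store[key]
        docs.append(entry)
    return {"docs": docs}


# Hypothetical stored documents; id 4 is deliberately missing.
store = {"2": {"name": "phone 2"}, "3": {"name": "phone 3"}}
response = toy_mget(store, "product", [2, 3, 4])
print([(d["_id"], d["found"]) for d in response["docs"]])
# [('2', True), ('3', True), ('4', False)]
```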

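#Per-document source filtering: doc 2 returns only name and price;
#for doc 3, exclude is applied after include, so only name is returned
GET product/_mget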
{
  "docs": [
    {
      "_id": 2,
      "_source": [
        "name",
        "price"
      ]
    },
    {
      "_id": 3,
      "_source": {
        "include": [
          "name",
          "price"
        ],
        "exclude": [
          "price",
          "type"
        ]
      }
    }
  ]
}
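Source filtering can be modeled as a simple field filter. The Python sketch below is a simplification under stated assumptions: it handles only exact top-level field names, whereas ES also accepts wildcards and dotted paths, and filter_source is a hypothetical name. It reproduces the two requests above, including the rule that a field listed in both include and exclude is dropped.

```python
def filter_source(source, include=None, exclude=None):
    """Keep included top-level fields, then drop excluded ones (exclude wins)."""
    result = {}
    for field, value in source.items():
        if include is not None and field not in include:
            continue  # not requested
        if exclude is not None and field in exclude:
            continue  # excluded, even if it was also included
        result[field] = value
    return result


doc = {"name": "Xiaomi phone", "price": 3999, "lv": "Flagship phone", "type": "mobile phone"}
print(filter_source(doc, include=["name", "price"]))
# {'name': 'Xiaomi phone', 'price': 3999}
print(filter_source(doc, include=["name", "price"], exclude=["price", "type"]))
# {'name': 'Xiaomi phone'}
```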

#=======================================================
#Operation types for documents: op_type
# enum OpType {
#     INDEX(0),
#     CREATE(1),
#     UPDATE(2),
#     DELETE(3)
# }

#create:
GET test_index/_doc/1
#Without op_type, PUT creates the document, or overwrites it if it already exists
PUT test_index/_doc/1
{
  "test_field":"test",
  "test_title":"title"
}
#Legacy typed syntax for create, used by older ES versions
PUT test_index/_doc/2/_create
{
  "test_field":"test",
  "test_title":"title"
}

#Current create syntax: returns 409 if document 4 already exists
PUT test_index/_create/4
{
  "test_field":"test",
  "test_title":"title"
}

#POST without an id: ES generates the document id automatically
POST test_index/_doc
{
  "test_field":"test",
  "test_title":"title"
}
#delete: Lazy deletion
DELETE test_index/_doc/3
#update:
GET test_index/_search
GET test_index/_doc/0APggnkBPdz4eXq223h8
#Full replacement: the whole document is overwritten
PUT /test_index/_doc/0APggnkBPdz4eXq223h8
{
  "test_field": "test 2",
  "test_title": "title 2"
}
#Partial update: only test_title is changed
POST /test_index/_update/0APggnkBPdz4eXq223h8
{
  "doc": {
    "test_title": "test 3"
  }
}
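The difference between the two update styles above comes down to how the new body is combined with the stored _source. In this minimal Python sketch (function names are hypothetical), full replacement discards the old source, while a partial update merges the doc into it, recursing into nested objects and overwriting other values, arrays included, which is how the _update API treats a partial doc.

```python
import copy


def full_replace(old_source, new_source):
    """PUT <index>/_doc/<id>: the new body becomes the whole _source."""
    # old_source is ignored on purpose: nothing of it survives.
    return copy.deepcopy(new_source)


def partial_update(old_source, doc):
    """POST <index>/_update/<id> {"doc": ...}: merge doc into the stored _source."""
    merged = copy.deepcopy(old_source)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = partial_update(merged[key], value)  # merge nested objects
        else:
            merged[key] = copy.deepcopy(value)  # scalars and arrays are overwritten
    return merged


stored = {"test_field": "test 2", "test_title": "title 2"}
print(full_replace(stored, {"test_title": "test 3"}))
# {'test_title': 'test 3'}
print(partial_update(stored, {"test_title": "test 3"}))
# {'test_field': 'test 2', 'test_title': 'test 3'}
```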
#index: creates the document or fully replaces it
#Create only: PUT test_index/_create/0APggnkBPdz4eXq223h8
#Create or fully replace: PUT test_index/_doc/0APggnkBPdz4eXq223h8
GET test_index/_doc/0APggnkBPdz4eXq223h8
#op_type=index is already the default when an id is given: create or fully replace
PUT /test_index/_doc/5?op_type=index
{
  "test_field": "test 2",
  "test_title": "title 2",
  "test_name": "title 2"
}

#?filter_path=items.*.error only applies to _bulk responses, which contain an items array

######################################################
#Batch additions, deletions and modifications
#POST /_bulk
#POST /<index>/_bulk
#{"action": {"metadata"}}
#{"data"}
PUT /product/_doc/1
{
    "name" : "Xiaomi phone",
    "desc" : "Fighter in mobile phone",
    "price" : 3999,
    "lv":"Flagship phone",
    "type":"mobile phone",
    "createtime":"2020-10-01T08:00:00Z",
    "tags": [ "Value for money", "Fever", "No lag" ]
}



GET product/_search

#Copy all documents from product into product2
POST _reindex
{
  "source": {
    "index": "product"
  },
  "dest": {
    "index": "product2"
  }
}
GET product2/_search
GET product2/_doc/4
GET product/_doc/4
POST /_bulk
{ "create": { "_index": "product2", "_id": "2" }}
{ "name": "_bulk create 2" }
{ "create": { "_index": "product2", "_id": "12" }}
{ "name": "_bulk create 12" }
{ "index": { "_index": "product2", "_id": "3" }}
{ "name": "index product2 "}
{ "index": { "_index": "product2", "_id": "13" }}
{ "name": "index product2" }
{ "update": { "_index": "product2", "_id": "4","retry_on_conflict" : "3"} }
{ "doc" : {"test_field2" : "bulk test1"} }

#Add ?filter_path=items.*.error to return only the errors of failed operations
POST /_bulk?filter_path=items.*.error
{ "delete": { "_index": "product2", "_id": "1" }}
{ "create": { "_index": "product2", "_id": "2" }}
{ "name": "_bulk create 2" }
{ "create": { "_index": "product2", "_id": "12" }}
{ "name": "_bulk create 12" }
{ "index": { "_index": "product2", "_id": "3" }}
{ "name": "index product2 " }
{ "index": { "_index": "product2", "_id": "13" }}
{ "name": "index product2" }
{ "update": { "_index": "product2", "_id": "4","retry_on_conflict" : "3"} }
{ "doc" : {"test_field2" : "bulk test1"} }
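The same filtering can be done on the client side when the full response is still needed elsewhere. The Python sketch below walks a _bulk response and collects the failed items, mirroring what ?filter_path=items.*.error keeps. The sample response is hand-written and abbreviated (not captured from a cluster), and failed_items is a hypothetical helper name.

```python
def failed_items(bulk_response):
    """Return (action, _id, error type) for each failed item in a _bulk response."""
    failures = []
    if not bulk_response.get("errors"):
        return failures  # the top-level flag is false when every item succeeded
    for item in bulk_response["items"]:
        # Each item has a single key: the action name (index/create/update/delete).
        for action, result in item.items():
            if "error" in result:
                failures.append((action, result["_id"], result["error"]["type"]))
    return failures


# Hand-written, abbreviated sample response.
response = {
    "took": 3,
    "errors": True,
    "items": [
        {"delete": {"_index": "product2", "_id": "1", "status": 200, "result": "deleted"}},
        {"create": {"_index": "product2", "_id": "2", "status": 409,
                    "error": {"type": "version_conflict_engine_exception"}}},
        {"update": {"_index": "product2", "_id": "4", "status": 404,
                    "error": {"type": "document_missing_exception"}}},
    ],
}
print(failed_items(response))
```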