6. ELK: the Elasticsearch nested type

0. Foreword

In practice, Elasticsearch documents often contain arrays of objects, and a common requirement is that the objects in an array be indexed and queried independently of each other. Elasticsearch offers at least two ways to meet this requirement:

1) Parent-child documents. In ES 5.x this is implemented with the parent-child (_parent) mechanism, in which one index holds multiple types.

From 6.x on, an index no longer supports multiple types, so parent-child relationships are implemented with the join field type instead.

2) The nested type.
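
As a rough illustration of the join-based approach in item 1, the Python sketch below builds an index body that declares one parent-child relation. The helper name `join_mapping`, the field name `relation`, and the relation names `blog` and `comment` are hypothetical, chosen only for this example. The body is written in the typeless form used by 7.x and later; on 6.x the `properties` block would sit under a type name.

```python
import json


def join_mapping(parent, child, field="relation"):
    """Build an index body whose mapping declares one parent-child relation."""
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "join",
                    # Maps the parent relation name to its child relation name.
                    "relations": {parent: child},
                }
            }
        }
    }


# Hypothetical relation names: a blog post is the parent of its comments.
body = join_mapping("blog", "comment")
print(json.dumps(body, indent=2))
```

The printed JSON could be sent as the body of an index-creation request; the rest of this article focuses on the nested type instead.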

See the official website: Nested Objects | Elasticsearch: The Definitive Guide | Elastic

1. Overview of ES data types

1. Common types
binary: accepts a binary value as a Base64-encoded string. By default the field is not stored and is not searchable; the encoded string must not contain embedded newlines (\n)
boolean: Boolean type; accepts true and false, and also the strings "true", "false", and "" (the empty string is treated as false)
keyword: keyword type; the value is not analyzed and is indexed as-is. Supports exact and fuzzy matching, aggregations, and sorting, and is typically used for filtering. A single term can be at most 32766 bytes in UTF-8 encoding.
number: numeric types, including:
long
integer
short
byte
double
float
half_float
scaled_float
unsigned_long

dates: date types
date: a formatted date string or a timestamp, such as 2015-01-01, 2015-01-01T12:10:30Z, or 1420070400001 (milliseconds since the epoch)
date_nanos: a date with nanosecond resolution, stored internally as a long
alias: alias type, which defines an alternate name for an existing field

2. Object and relationship types
object: object type, a JSON object
flattened: stores an entire object as a single field value
nested: nested type, a special kind of object type that allows the objects in an array to be queried independently of each other
join: defines a parent-child relationship between documents in the same index, similar to a tree

3. Structured data types
range: range types, which represent an interval of values
integer_range
float_range
long_range
double_range
date_range
ip_range

2. An example to illustrate the role of nested type

(1) Nested: a nested object is a specialized version of the object data type that allows the objects in an array to be indexed and queried independently of each other.

(2) How arrays of objects are organized by default

Arrays of inner objects are not stored the way you might expect. Lucene has no concept of inner objects, so Elasticsearch flattens the object hierarchy into a simple list of field names and values. Consider the following document.

PUT user/user_info/1
{
  "group" : "man",
  "userName" : [
    {
      "first" : "张",
      "last" : "三"
    },
    {
      "first" : "李",
      "last" : "四"
    }
  ]
}

Suppose we query for a user whose first is "张" and whose last is "四". No single user has that combination, so we would expect no results. The query looks like this.

GET /user/user_info/_search
{
  "query":{
    "bool":{
        "must":[
            {
              "match":{
                "userName.first":"张"
              }
            },
            {
              "match":{
                "userName.last":"四"
              }
            }
         ]
    }
  }
}

The query nevertheless returns the document, which is clearly not what we expected.

The reason, as mentioned earlier, is that Lucene has no concept of inner objects: they are flattened into a simple list of field names and values. Internally, the document looks like this:

{
  "group" : "man",
  "userName.first" : [ "张", "李" ],
  "userName.last" : [ "三", "四" ]
}

The userName.first and userName.last fields have been flattened into multi-valued fields, so the association between each first and its last is lost, and the query cannot return the expected result.
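
The effect can be reproduced outside Elasticsearch with a small Python sketch. The functions `flatten` and `flat_match` are hypothetical helpers written for this illustration; they are not part of any Elasticsearch client, and exact value equality stands in for the match query.

```python
def flatten(doc, prefix=""):
    """Flatten inner objects and arrays into dotted, multi-valued fields."""
    flat = {}
    for key, value in doc.items():
        name = prefix + key
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Recurse into the inner object and merge its values.
                for sub_name, sub_values in flatten(item, name + ".").items():
                    flat.setdefault(sub_name, []).extend(sub_values)
            else:
                flat.setdefault(name, []).append(item)
    return flat


def flat_match(flat, conditions):
    """bool/must over flattened fields: each condition is checked on its own."""
    return all(value in flat.get(field, []) for field, value in conditions.items())


doc = {
    "group": "man",
    "userName": [
        {"first": "张", "last": "三"},
        {"first": "李", "last": "四"},
    ],
}

flat = flatten(doc)
print(flat)
# Each condition finds a value somewhere in its field, so the document matches
# even though no single user is both "张" and "四".
print(flat_match(flat, {"userName.first": "张", "userName.last": "四"}))  # True
```

Because each condition is evaluated against the whole multi-valued field, nothing ties a particular first to a particular last.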

So how do we get the semantics we want? That is what the rest of this article covers.

3. Use of nested type

3.1. First, index the following document

The document represents a blog post; the comments on each post are stored as an array in the comments field.

PUT /financeblogs/blog/docidart1
{
  "title": "Invest Money",
  "body": "Please start investing money as soon...",
  "tags": ["money", "invest"],
  "published_on": "18 Oct 2017",
  "comments": [
    {
      "name": "William",
      "age": 34,
      "rating": 8,
      "comment": "Nice article..",
      "commented_on": "30 Nov 2017"
    },
    {
      "name": "John",
      "age": 38,
      "rating": 9,
      "comment": "I started investing after reading this.",
      "commented_on": "25 Nov 2017"
    },
    {
      "name": "Smith",
      "age": 33,
      "rating": 7,
      "comment": "Very good post",
      "commented_on": "20 Nov 2017"
    }
  ]
}

The names and ages of the commenters in this document are as follows.

name	age
William	34
John	38
Smith	33

3.2. Without the nested type, inner objects cannot be queried as expected

Suppose we search for blogs with a comment by {name: John, age: 34}. No single comment has both values, so we would expect no results. However, because of the flattening described earlier, the following query does return the document.

GET /financeblogs/blog/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "comments.name": "John"
          }
        },
        {
          "match": {
            "comments.age": "34"
          }
        }
      ]
    }
  }
}

3.3. Next, switch to the nested approach

0. Delete the index and start over

DELETE financeblogs

1. Create the index with the following mapping. The key point is that the comments field is mapped with type nested.

PUT /financeblogs
{
  "mappings": {
    "blog": {
      "properties": {
        "title": {
          "type": "text"
        },
        "body": {
          "type": "text"
        },
        "tags": {
          "type": "keyword"
        },
        "published_on": {
          "type": "keyword"
        },
        "comments": {
          "type": "nested",
          "properties": {
            "name": {
              "type": "text"
            },
            "comment": {
              "type": "text"
            },
            "age": {
              "type": "short"
            },
            "rating": {
              "type": "short"
            },
            "commented_on": {
              "type": "text"
            }
          }
        }
      }
    }
  }
}

2. Insert the same target data

PUT /financeblogs/blog/docidart1
{
  "title": "Invest Money",
  "body": "Please start investing money as soon...",
  "tags": ["money", "invest"],
  "published_on": "18 Oct 2017",
  "comments": [
    {
      "name": "William",
      "age": 34,
      "rating": 8,
      "comment": "Nice article..",
      "commented_on": "30 Nov 2017"
    },
    {
      "name": "John",
      "age": 38,
      "rating": 9,
      "comment": "I started investing after reading this.",
      "commented_on": "25 Nov 2017"
    },
    {
      "name": "Smith",
      "age": 33,
      "rating": 7,
      "comment": "Very good post",
      "commented_on": "20 Nov 2017"
    }
  ]
}

3. Query with a nested query

# Query for a comment whose name is John and whose age is 34: no documents are returned.

GET /financeblogs/blog/_search?pretty
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "comments",
            "query": {
              "bool": {
                "must": [
                  {
                    "match": {
                      "comments.name": "John"
                    }
                  },
                  {
                    "match": {
                      "comments.age": 34
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

4. Query for a comment whose name is John and whose age is 38 instead: this time the document is returned.
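
Step 4 does not show its request body. As a minimal sketch, the Python helper below builds one in the same shape as the query in step 3. The name `build_nested_query` is hypothetical, and the outer bool/must wrapper from step 3 is omitted because it contains only a single clause.

```python
import json


def build_nested_query(path, conditions):
    """Build a search body whose conditions must all hold inside one nested object."""
    must = [
        {"match": {path + "." + field: value}}
        for field, value in conditions.items()
    ]
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {"bool": {"must": must}},
            }
        }
    }


# Same shape as the step 3 query, but with age 38 instead of 34.
body = build_nested_query("comments", {"name": "John", "age": 38})
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after GET /financeblogs/blog/_search in the console.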

4. Comparison between parent-child and nested methods

Item	Nested	Parent-child (join)
Advantages	Fast reads (according to the official documentation, 5 to 10 times faster than parent-child)	Parent and child documents can be updated independently
Disadvantages	Updating a child object requires reindexing the entire document	Slower reads and higher CPU usage; additional memory is required to maintain the relationship
Suitable scenarios	Query-heavy workloads where child objects are only occasionally updated	Child documents are updated frequently

A nested document looks as if it simply has a collection field inside it, but the internal storage is quite different. In the example figure, Message 1, Message 2, and Message 3 are the nested objects of one document, which is actually stored internally as 4 independent documents: one for each nested object plus the root document.

In addition, the field holding the nested objects must be mapped with type nested. Once it is, its sub-fields can no longer be queried directly; you need to use a nested query.
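
The storage model described above can be mimicked with a short Python sketch. It is only a simplified model, not how Lucene is actually implemented; `to_internal_docs` and `nested_match` are hypothetical names, and exact equality stands in for the match query.

```python
def to_internal_docs(doc, path):
    """Split a document into one hidden doc per nested object, plus the root doc."""
    hidden = [
        {path + "." + key: value for key, value in obj.items()}
        for obj in doc.get(path, [])
    ]
    root = {key: value for key, value in doc.items() if key != path}
    return hidden + [root]


def nested_match(internal_docs, conditions):
    """A nested query matches only if one hidden doc satisfies every condition."""
    hidden_docs = internal_docs[:-1]  # the last entry is the root document
    return any(
        all(d.get(field) == value for field, value in conditions.items())
        for d in hidden_docs
    )


blog = {
    "title": "Invest Money",
    "comments": [
        {"name": "William", "age": 34},
        {"name": "John", "age": 38},
        {"name": "Smith", "age": 33},
    ],
}

docs = to_internal_docs(blog, "comments")
print(len(docs))  # 4: three hidden comment docs plus the root
print(nested_match(docs, {"comments.name": "John", "comments.age": 34}))  # False
print(nested_match(docs, {"comments.name": "John", "comments.age": 38}))  # True
```

Because every condition is checked against the same hidden document, John's name can no longer be paired with William's age.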

In conclusion:

1. Ordinary inner objects (the default object type) can model a one-to-many relationship, but the boundaries between the objects in an array are lost, and with them the association between the fields of each object.

2. Nested objects solve this problem, but they have two disadvantages: updating any nested object requires reindexing the whole document, and a child object cannot belong to more than one parent document.

3. Parent-child documents avoid the problems of the previous two approaches, but queries are slower, so they suit write-heavy, read-light scenarios.

For more syntax about nested, see:

Dry information | In-depth explanation of Elasticsearch Nested type_es nest type-CSDN Blog