Elasticsearch DSL query documentation

DSL query classification

  • Query all: Query all data, generally used for testing. For example: match_all

  • Full-text search query: Use an analyzer to split the user's input into terms, then match those terms against the inverted index. For example:

    • match

    • multi_match

  • Exact query: Find documents by an exact value, without analyzing the input. Typically used on keyword, numeric, date, and boolean fields. For example:

    • ids

    • range

    • term

  • Geographic (geo) query: Query based on longitude and latitude. For example:

    • geo_distance

    • geo_bounding_box

  • Compound query: Combines the query conditions above into more complex search logic. For example:

    • bool

    • function_score
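Two of the query types listed above, match_all and ids, are not shown later in this document. As a minimal sketch, their request bodies can be built as plain Python dictionaries. The helper names and the document IDs are assumptions made for illustration only.

```python
import json


def match_all_body():
    """Request body that matches every document in the index."""
    return {"query": {"match_all": {}}}


def ids_body(doc_ids):
    """Request body that fetches documents by their _id values."""
    return {"query": {"ids": {"values": list(doc_ids)}}}


# The IDs below are placeholders, not values from a real index.
print(json.dumps(match_all_body(), indent=2))
print(json.dumps(ids_body(["1", "2"]), indent=2))
```

The printed JSON is what would be sent as the body of GET /hotel/_search.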

Full text search query

Common full-text search queries include:

  • match query: single field query

  • multi_match query: multi-field query; a document matches if any of the listed fields matches.

match query

match query: single field query

GET /hotel/_search
{
  "query": {
    "match": {
      "all": "Home Inn"
    }
  }
}

multi_match query

multi_match query: multi-field query; a document matches if any of the listed fields matches.

GET /hotel/_search
{
  "query": {
    "multi_match": {
      "query": "Home Inn",
      "fields": ["brand", "name", "business"]
    }
  }
}

The two queries return the same results because the brand, name, and business values were copied into the all field with copy_to, so searching the three fields separately is equivalent to searching the all field. However, the more fields a query searches, the slower it runs, so it is recommended to combine the fields with copy_to and then query the single combined field.
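The equivalence described above can be illustrated with a toy model. This is only a sketch: the whitespace tokenizer stands in for a real analyzer, the sample hotel is hypothetical, and the helper names are made up for this example.

```python
def tokenize(text):
    """Toy analyzer: lowercase and split on whitespace (real analyzers do more)."""
    return set(text.lower().split())


def with_all_field(doc, sources=("brand", "name", "business")):
    """Mimic copy_to: collect the tokens of the source fields into an 'all' field."""
    all_tokens = set()
    for field in sources:
        all_tokens |= tokenize(doc[field])
    return {**doc, "all": all_tokens}


def multi_match(doc, query, fields):
    """True if any listed field shares a token with the query."""
    query_tokens = tokenize(query)
    return any(query_tokens & tokenize(doc[f]) for f in fields)


def match_all_field(doc, query):
    """True if the combined 'all' field shares a token with the query."""
    return bool(tokenize(query) & doc["all"])


# Hypothetical sample document.
hotel = with_all_field(
    {"brand": "Home Inn", "name": "Home Inn Express Bund", "business": "Bund area"}
)
fields = ["brand", "name", "business"]
for query in ("Bund", "Airport"):
    print(query, multi_match(hotel, query, fields), match_all_field(hotel, query))
```

Both functions agree for every query in this model, but the single-field version inspects one field instead of three.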

Exact query

An exact query searches fields that are not analyzed, so the query value is not analyzed either. A document matches only if the user's input is exactly equal to the field value. If the input contains anything more than the stored value, nothing is found.

Exact queries generally target keyword, numeric, date, boolean, and similar field types. Common ones are:

  • term: Query based on the exact value of the term

  • range: query based on the range of values
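The difference between exact matching and full-text matching can be sketched in a few lines. This is a simplified model with made-up helper names, not Elasticsearch's actual implementation: the exact check is plain equality, and the full-text check uses a whitespace tokenizer.

```python
def term_matches(field_value, query_value):
    """Exact query: the stored value must equal the query value exactly."""
    return field_value == query_value


def match_matches(field_value, query_text):
    """Full-text style: any shared lowercase token counts as a match."""
    field_tokens = set(field_value.lower().split())
    query_tokens = set(query_text.lower().split())
    return bool(field_tokens & query_tokens)


print(term_matches("Beijing", "Beijing"))          # exact value
print(term_matches("Beijing", "Beijing hotels"))   # extra input breaks the match
print(match_matches("Beijing", "Beijing hotels"))  # token overlap is enough
```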

term query

Query based on the exact value of the term

GET /hotel/_search
{
  "query": {
    "term": {
      "city": {
        "value": "Beijing"
      }
    }
  }
}

range query

Query based on a range of values. Here gte means greater than or equal to, gt means greater than, lte means less than or equal to, and lt means less than.

GET /hotel/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 10,
        "lte": 200
      }
    }
  }
}
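The four bounds can be read as ordinary comparisons. The following sketch uses a hypothetical helper, in_range, to show which values each combination accepts.

```python
def in_range(value, gte=None, gt=None, lte=None, lt=None):
    """Return True if value satisfies every bound that was supplied."""
    if gte is not None and not value >= gte:
        return False
    if gt is not None and not value > gt:
        return False
    if lte is not None and not value <= lte:
        return False
    if lt is not None and not value < lt:
        return False
    return True


# Same bounds as the request above: 10 <= price <= 200.
for price in (5, 10, 200, 201):
    print(price, in_range(price, gte=10, lte=200))
```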

Geographical coordinate query

A geographical coordinate query is a query based on longitude and latitude.

Rectangular range query

A rectangular range query, that is, a geo_bounding_box query, finds all documents whose coordinates fall within a rectangle. You specify the coordinates of the rectangle's top_left (upper left) and bottom_right (lower right) corners, which define the rectangle. Every point that falls inside it matches.

GET /hotel/_search
{
  "query": {
    "geo_bounding_box": {
      "location": {
        "top_left": {
          "lat": 31.1,
          "lon": 121.5
        },
        "bottom_right": {
          "lat": 30.9,
          "lon": 121.7
        }
      }
    }
  }
}

The result contains every document whose location falls inside the rectangle.
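The rectangle test itself is a pair of interval checks. The sketch below uses the corner coordinates from the request above. The function name is hypothetical, and it assumes the box does not cross the 180° meridian.

```python
def in_bounding_box(lat, lon, top_left, bottom_right):
    """True if (lat, lon) lies inside the rectangle defined by the two corners.

    Assumes the box does not wrap around the 180-degree meridian.
    """
    lat_ok = bottom_right["lat"] <= lat <= top_left["lat"]
    lon_ok = top_left["lon"] <= lon <= bottom_right["lon"]
    return lat_ok and lon_ok


top_left = {"lat": 31.1, "lon": 121.5}
bottom_right = {"lat": 30.9, "lon": 121.7}

print(in_bounding_box(31.0, 121.6, top_left, bottom_right))  # inside the box
print(in_bounding_box(31.2, 121.6, top_left, bottom_right))  # north of the box
```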

Nearby search

A nearby query, also called a distance query (geo_distance), finds all documents whose distance from a specified center point is less than a given value. In other words, pick a point on the map as the center, draw a circle with the specified distance as its radius, and every coordinate that falls inside the circle matches.

GET /hotel/_search
{
  "query": {
    "geo_distance": {
      "distance": "15km",
      "location": "31.21,121.5"
    }
  }
}

The result contains every document within a 15 km radius of the center point.
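The circle test can be sketched with the haversine formula, one common way to approximate great-circle distance. This is an illustration only: the Earth radius constant and the function names are assumptions of this sketch, and no claim is made that Elasticsearch computes distance in exactly this way.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_distance(point, center, distance_km):
    """True if point lies inside the circle around center."""
    return haversine_km(point[0], point[1], center[0], center[1]) <= distance_km


center = (31.21, 121.5)  # same center as the request above
print(within_distance((31.25, 121.5), center, 15))  # a few km away
print(within_distance((32.21, 121.5), center, 15))  # one degree of latitude away
```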

Compound query

A compound query combines other, simpler queries to implement more complex search logic. There are two common ones:

  • function_score: score function query, which controls how document relevance is calculated and therefore how documents are ranked.

  • bool query: Boolean query, which combines multiple other queries with logical relationships to build complex searches.

Score function query

When we use match query, the document results will be scored according to the relevance to the search term (_score), and the results will be returned in descending order of the score.

The function_score query contains four parts:

  • Original query condition: the query part. Documents are searched with this condition and scored with the BM25 algorithm, giving the original score (query score).

  • Filter condition: the filter part. Only documents that meet this condition are re-scored.

  • Score function: applied to every document that meets the filter condition to produce the function score. There are four types of functions:

    • weight: The function result is a constant

    • field_value_factor: Use a field value in the document as the function result

    • random_score: Use random numbers as function results

    • script_score: Custom score function algorithm

  • Operation mode: the boost_mode part, which determines how the function score and the original query score are combined, including:

    • multiply: multiply the two scores (the default)

    • replace: replace query score with function score

    • Others, such as: sum, avg, max, min

GET /hotel/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "all": "Home Inn"
        }
      },
      "functions": [
        {
          "filter": {
            "term": {
              "brand": "Home Inn"
            }
          },
          "weight": 2
        }
      ],
      "boost_mode": "sum"
    }
  }
}
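The arithmetic behind boost_mode can be sketched as follows for a document that passes the filter. The function name combine and the query score of 1.5 are hypothetical. The weight of 2 and the sum mode come from the request above.

```python
def combine(query_score, function_score, boost_mode="multiply"):
    """Combine the original query score with the function score."""
    modes = {
        "multiply": lambda q, f: q * f,
        "replace": lambda q, f: f,
        "sum": lambda q, f: q + f,
        "avg": lambda q, f: (q + f) / 2,
        "max": lambda q, f: max(q, f),
        "min": lambda q, f: min(q, f),
    }
    return modes[boost_mode](query_score, function_score)


query_score = 1.5  # hypothetical BM25 score of a matching document
weight = 2         # function result for documents that pass the filter

for mode in ("multiply", "replace", "sum", "avg", "max", "min"):
    print(mode, combine(query_score, weight, mode))
```

With "boost_mode": "sum", a document that passes the filter ends up with its query score plus 2, which pushes it above otherwise similar documents.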

Boolean query

A Boolean query is a combination of one or more query clauses, and each clause is a subquery. Subqueries can be combined in the following ways:

  • must: Must match each subquery, similar to “and”

  • should: optionally matches subqueries, similar to “or”

  • must_not: must not match, does not participate in scoring, similar to “not”

  • filter: must match, does not participate in scoring

GET /hotel/_search
{
  "query": {
    "bool": {
      "must": [
        {"term": {"city": "Shanghai" }}
      ],
      "should": [
        {"term": {"brand": "Crown Plaza" }},
        {"term": {"brand": "RAMADA" }}
      ],
      "must_not": [
        { "range": { "price": { "lte": 500 } }}
      ],
      "filter": [
        { "range": {"score": { "gte": 45 } }}
      ]
    }
  }
}

Note that the more fields take part in scoring, the worse the query performance. For this kind of multi-condition query, the recommended approach is:

  • Treat the keyword typed into the search box as a full-text search query and put it in must, so that it takes part in score calculation.

  • Put all other conditions in filter, so that they do not take part in scoring.
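The recommendation above can be sketched as a small request builder: the keyword goes into must, and everything else goes into filter. The function name build_search is hypothetical. The field names all, city, and price are the ones used elsewhere in this document.

```python
import json


def build_search(keyword, filters=()):
    """Build a bool query: scored keyword match plus unscored filters."""
    return {
        "query": {
            "bool": {
                # Full-text keyword search takes part in scoring.
                "must": [{"match": {"all": keyword}}],
                # Remaining conditions only include or exclude documents.
                "filter": list(filters),
            }
        }
    }


body = build_search(
    "Home Inn",
    [
        {"term": {"city": "Shanghai"}},
        {"range": {"price": {"lte": 500}}},
    ],
)
print(json.dumps(body, indent=2))
```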

Search result processing

Search results can be processed or displayed in the way the user specifies.

Sort

Elasticsearch defaults to sorting based on relevance score (_score), but it also supports custom ways to sort search results. Field types that can be sorted include: keyword type, numerical type, geographical coordinate type, date type, etc.

Ordinary field sorting

The syntax for sorting keyword, numeric, and date types is basically the same.

GET /hotel/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "score": "desc"
    },
    {
      "price":"asc"
    }
  ]
}
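The effect of the two sort clauses can be reproduced in memory: documents are ordered by score descending, and ties are broken by price ascending. The sample hotels below are made up for illustration.

```python
# Hypothetical sample data.
hotels = [
    {"name": "A", "score": 45, "price": 300},
    {"name": "B", "score": 48, "price": 500},
    {"name": "C", "score": 48, "price": 200},
    {"name": "D", "score": 45, "price": 150},
]


def sort_hotels(docs):
    """Order by score descending, then by price ascending for equal scores."""
    return sorted(docs, key=lambda h: (-h["score"], h["price"]))


for hotel in sort_hotels(hotels):
    print(hotel["name"], hotel["score"], hotel["price"])
```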

Geographical coordinate sorting

GET /hotel/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "_geo_distance" : {
          "location" : {
            "lat":31.030001,
            "lon":121.610000
          },
          "order" : "asc",
          "unit" : "km"
      }
    }
  ]
}

  • Specify a coordinate as the target point

  • Calculate the distance from the coordinates of the specified field (must be of geo_point type) to the target point in each document

  • Sort by distance

Pagination

By default, Elasticsearch returns only the top 10 documents. To retrieve more, change the paging parameters: the from and size parameters control which page of results is returned.

  • from: the offset of the first document to return

  • size: the number of documents to return
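A user interface usually works with page numbers rather than offsets. As a small sketch, a hypothetical helper page_params converts a 1-based page number into from and size.

```python
def page_params(page, size=10):
    """Convert a 1-based page number into from/size paging parameters."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be at least 1")
    return {"from": (page - 1) * size, "size": size}


print(page_params(1))    # first page
print(page_params(100))  # hundredth page of 10 documents
```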

Basic paging

GET /hotel/_search
{
  "query": {
    "match_all": {}
  },
  "from": 0,
  "size": 10,
  "sort": [
    {"price": "asc"}
  ]
}

Deep paging problem

To fetch documents 990 to 1000, the query is written like this:

GET /hotel/_search
{
  "query": {
    "match_all": {}
  },
  "from": 990,
  "size": 10,
  "sort": [
    {"price": "asc"}
  ]
}

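With deep paging, every shard has to return its top from + size documents, and all of them must be merged and sorted before the requested page can be cut out, which puts heavy pressure on memory and CPU. Elasticsearch therefore rejects requests whose from + size exceeds 10,000 by default (the index.max_result_window setting).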
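The cost of deep paging can be estimated with simple arithmetic. In the sketch below, the shard count is a hypothetical value and the function names are made up. The limit of 10,000 is the default mentioned above.

```python
MAX_RESULT_WINDOW = 10_000  # default limit on from + size


def docs_collected(from_, size, shards):
    """Documents gathered for merging when each shard returns from + size hits."""
    return shards * (from_ + size)


def check_window(from_, size, max_window=MAX_RESULT_WINDOW):
    """Raise ValueError if the request goes past the result window."""
    if from_ + size > max_window:
        raise ValueError(
            f"from + size = {from_ + size} exceeds the limit of {max_window}"
        )


# Hypothetical index with 5 shards: returning 10 documents at offset 990
# still means gathering 5 * 1000 candidates first.
print(docs_collected(990, 10, shards=5))
check_window(990, 10)  # within the window, so no error
```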

For deep paging, ES provides two solutions:

  • search_after: requires a sort. It fetches the next page starting from the sort values of the last document on the previous page. This is the officially recommended method.

  • scroll: takes a snapshot of the sorted document IDs and keeps it in memory. It is no longer officially recommended for deep paging.
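The idea behind search_after can be simulated in memory: each page starts strictly after the sort values of the last document already seen, so no offset has to be skipped. This is a toy model with made-up data. The id field used as a tiebreaker is an assumption, added so that every document has a unique sort key.

```python
def search_after_page(docs, size, after=None):
    """Return the next page and the sort values to pass to the following call."""
    ordered = sorted(docs, key=lambda d: (d["price"], d["id"]))
    if after is not None:
        # Keep only documents that sort strictly after the previous page's last hit.
        ordered = [d for d in ordered if (d["price"], d["id"]) > tuple(after)]
    page = ordered[:size]
    next_after = [page[-1]["price"], page[-1]["id"]] if page else None
    return page, next_after


# Hypothetical data set of 25 documents with repeated prices.
docs = [{"id": i, "price": 100 + (i % 7) * 10} for i in range(1, 26)]

after = None
while True:
    page, after = search_after_page(docs, size=10, after=after)
    if not page:
        break
    print([d["id"] for d in page], "last sort values:", after)
```

Because each request carries only the last sort values, the work per page stays the same no matter how deep the paging goes.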

Highlight

GET /hotel/_search
{
  "query": {
    "match": {
      "all": "Home Inn"
    }
  },
  "highlight": {
    "fields": {
      "name": {
        "pre_tags": "<em>",
        "post_tags": "</em>",
        "require_field_match": "false"
      }
    }
  }
}

Note:

  • Highlighting marks keywords, so the search condition must contain keywords; it cannot be, for example, a range query.

  • By default, the highlighted field must be the same as the field specified in the search; otherwise nothing is highlighted.

  • To highlight a field other than the searched field, add the attribute require_field_match=false.
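What pre_tags and post_tags do to the returned text can be sketched with a toy highlighter. It is a simplification with a made-up function name: it wraps literal, case-insensitive occurrences of the keyword, whereas Elasticsearch highlights the terms produced by the field's analyzer.

```python
import re


def highlight(text, keyword, pre_tag="<em>", post_tag="</em>"):
    """Wrap every case-insensitive occurrence of keyword in the given tags."""
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return pattern.sub(lambda m: f"{pre_tag}{m.group(0)}{post_tag}", text)


# Hypothetical hotel name.
print(highlight("Home Inn near the Bund", "home inn"))
```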