Elasticsearch: using both “query” and “filter” clauses in a search

Elasticsearch supports several ways of querying. One of them is the query DSL, in which the request is written in JSON.

1. Query DSL and Filter DSL

There are two types of DSL: query DSL and filter DSL.

The difference between the two is as follows:

query DSL

In query context, a query clause answers the question: “How well does this document match this query clause?”

Whether a document matches is easy to decide, but how is relevance calculated? For every matching document ES computes a _score; the higher the score, the better the match. Computing this score is relatively expensive, so clauses in query context take more time.

Query context applies whenever a clause is passed to a query parameter, such as the query parameter of the search API.

Some query scenarios:

  • Best match to full text search
  • Contains the word run, where documents containing runs, running, jog or sprint also count as matches
  • Contains quick, brown, fox. The closer the words are, the more relevant the document is

filter DSL

In filter context, a query clause answers the question: “Does this document match this query clause?”

The answer is a simple yes or no. No score is calculated and the order of the results is not affected, so filters are cheaper to execute.

Filter context applies whenever a clause is passed to a filter parameter, such as the filter or must_not parameters of a bool query.

In addition, ES automatically caches frequently used filters, which can noticeably speed up queries.

Some filtering cases:

  • Is the creation date between 2013-2014?
  • Is the status field published?
  • Is the lat_lon field within 10km of a certain coordinate?
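
As a rough sketch, the three filtering cases above could be written as clauses in the filter section of a bool query. The field name created, the coordinates and the helper name build_filter_body are assumptions made for illustration; the request body is only built and printed here, not sent to a cluster.

```python
import json

def build_filter_body(lat, lon):
    """Build a search body whose clauses all run in filter context (no scoring)."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # created in 2013 or 2014 ("created" is a hypothetical field name)
                    {"range": {"created": {"gte": "2013-01-01", "lt": "2015-01-01"}}},
                    # status is exactly "published"
                    {"term": {"status": "published"}},
                    # lat_lon within 10km of the given point
                    {"geo_distance": {"distance": "10km",
                                      "lat_lon": {"lat": lat, "lon": lon}}},
                ]
            }
        }
    }

body = build_filter_body(40.73, -73.99)  # arbitrary example coordinates
print(json.dumps(body, indent=2))
```

Because every clause sits under filter, none of them contributes to _score.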

2. The difference between filtered and filter in es

1. bool and filtered

1.1 Description
filtered is the older syntax. It was deprecated in Elasticsearch 2.0 and removed in 5.0, and the bool query replaces it, so use bool instead.

https://www.elastic.co/guide/en/elasticsearch/reference/5.0/query-dsl-filtered-query.html

1.2 Example usage
The old syntax. Running it on ES 8 returns an error:

GET _search
{
  "query": {
    "filtered": {
      "query": {
        "match": {
          "text": "quick brown fox"
        }
      },
      "filter": {
        "term": {
          "status": "published"
        }
      }
    }
  }
}

The new syntax:

GET _search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "text": "quick brown fox"
        }
      },
      "filter": {
        "term": {
          "status": "published"
        }
      }
    }
  }
}
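
The rewrite from the old syntax to the new one is mechanical: the query part moves to must and the filter part stays under filter. A minimal sketch of that conversion follows; the function name filtered_to_bool is made up for this example, and it only handles a filtered query at the top level of the query object, not nested ones.

```python
def filtered_to_bool(query):
    """Rewrite a legacy {"filtered": {...}} query into the equivalent bool query."""
    if "filtered" not in query:
        return query  # nothing to convert
    legacy = query["filtered"]
    bool_body = {}
    if "query" in legacy:
        bool_body["must"] = legacy["query"]      # scored part
    if "filter" in legacy:
        bool_body["filter"] = legacy["filter"]   # non-scoring part
    return {"bool": bool_body}

old = {
    "filtered": {
        "query": {"match": {"text": "quick brown fox"}},
        "filter": {"term": {"status": "published"}},
    }
}
new = filtered_to_bool(old)
print(new)
```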

2. Two usages of filter

nested under bool

{
    "query":{
        "bool":{
            "must":{
                "term":{
                    "title":"kitchen3"
                }
            },
            "filter":{
                "term":{
                    "price": 1000
                }
            }
        }
    }
}

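At the top level of the request body (this top-level filter key is pre-1.0 syntax; it was renamed post_filter in Elasticsearch 1.0):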

{
    "query":{
        "term":{
            "title":"kitchen3"
        }
    },
    "filter":{
        "term":{
            "price": 1000
        }
    }
}

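The difference: a filter nested under bool restricts the matching documents before aggregations are computed, while the top-level filter is applied afterwards, so it narrows the returned hits without affecting aggregations.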

3. Advanced search keywords

(Filter DSL part)

1.term filter

term is mainly used to match exact values such as numbers, dates, boolean values or not_analyzed strings (unanalyzed text; keyword fields since Elasticsearch 5.0):

{ "term": { "age": 26 }}
{ "term": { "date": "2014-09-01" }}
{ "term": { "public": true }}
{ "term": { "tag": "full_text" }}

A complete example that matches documents whose hostname field is exactly saaap.wangpos.com:

{
  "query": {
    "term": {
      "hostname": "saaap.wangpos.com"
    }
  }
}
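
Why term suits exact values can be illustrated with a simplified model of indexing. The analyze function below is a stand-in assumption (lowercase, then split on non-alphanumeric characters), not Elasticsearch's actual analyzer, and the sample value is hypothetical. It only shows that term compares the query value verbatim against the indexed terms.

```python
import re

def analyze(text):
    """Toy analyzer: lowercase, then split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def term_matches(indexed_terms, value):
    """A term clause matches only if the value equals one indexed term exactly."""
    return value in indexed_terms

value = "Full Text Search"       # hypothetical field value
unanalyzed = [value]             # not_analyzed: the whole string is one term
analyzed = analyze(value)        # ['full', 'text', 'search']

print(term_matches(unanalyzed, "Full Text Search"))  # True
print(term_matches(analyzed, "Full Text Search"))    # False
print(term_matches(analyzed, "text"))                # True
```

On an analyzed field the original string no longer exists as a single term, which is why term is normally paired with unanalyzed fields.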

2.terms filter

terms is similar to term, but lets you specify multiple values. A document matches if the field contains any of the specified values:

{
    "terms": {
        "tag": [ "search", "full_text", "nosql" ]
    }
}

A complete example that finds all documents whose HTTP status is 302 or 304. Since status is a numeric field in ES, the values can be written directly as numbers:

{
  "query": {
    "terms": {
      "status": [
        304,
        302
      ]
    }
  }
}
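
The "any of the values" behaviour can be modelled locally in a few lines. This is only a sketch of the matching rule, with made-up sample documents; it does not reflect how Elasticsearch evaluates the clause internally.

```python
def terms_matches(field_values, wanted):
    """True if the document's field holds at least one of the wanted values."""
    if not isinstance(field_values, (list, tuple, set)):
        field_values = [field_values]  # treat a single value as a one-element list
    return any(v in wanted for v in field_values)

# hypothetical log entries
logs = [{"status": 200}, {"status": 302}, {"status": 304}, {"status": 500}]
hits = [doc for doc in logs if terms_matches(doc["status"], [304, 302])]
print(hits)  # [{'status': 302}, {'status': 304}]
```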

3.range filter

Range filtering allows us to find a batch of data according to a specified range:

{
    "range": {
        "age": {
            "gte": 20,
            "lt": 30
        }
    }
}

Range operators include:

gt:: greater than
gte:: greater than or equal to
lt:: less than
lte:: less than or equal to

A complete example that finds requests that took more than 1 second. upstream_response_time is the upstream response time recorded in the nginx log, and it is a numeric field in ES.

{
  "query": {
    "range": {
      "upstream_response_time": {
        "gt": 1
      }
    }
  }
}
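
The four operators map directly onto ordinary comparisons. The following sketch checks a single value against the body of a range clause; the names OPS and range_matches are invented for this example.

```python
import operator

# range operator name -> Python comparison
OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}

def range_matches(value, bounds):
    """Check a value against a range clause body such as {"gte": 20, "lt": 30}."""
    unknown = set(bounds) - set(OPS)
    if unknown:
        raise ValueError(f"unsupported range operators: {sorted(unknown)}")
    # every bound given must hold at the same time
    return all(OPS[op](value, limit) for op, limit in bounds.items())

print(range_matches(20, {"gte": 20, "lt": 30}))  # True
print(range_matches(30, {"gte": 20, "lt": 30}))  # False
print(range_matches(1.5, {"gt": 1}))             # True
```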

4.exists and missing filter

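The exists and missing filters find documents in which a field has a value (exists) or has no value (missing), similar to the IS NOT NULL and IS NULL conditions in SQL. The missing filter was removed in Elasticsearch 5.0; on later versions, use exists inside the must_not clause of a bool query instead.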

{
    "exists": {
        "field": "title"
    }
} 

These filters are typically applied to a set of documents that has already been selected, to separate those that have a given field from those that do not.
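
On versions without a missing filter, the same check is usually written as a negated exists. A small sketch of both clause builders follows; the helper names exists and missing are chosen for this example only.

```python
def exists(field):
    """Clause matching documents in which the field has a value."""
    return {"exists": {"field": field}}

def missing(field):
    """Clause matching documents without a value: exists negated inside a bool."""
    return {"bool": {"must_not": exists(field)}}

print(exists("title"))
print(missing("title"))
```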

5.bool filter

The bool filter combines the results of multiple filter conditions with Boolean logic. It contains the following operators:

must :: all of these conditions must match; equivalent to and.
must_not :: none of these conditions may match; equivalent to not.
should :: at least one of these conditions must match; equivalent to or.

Each of these parameters accepts a single filter condition or an array of filter conditions:

{
    "bool": {
        "must": { "term": { "folder": "inbox" }},
        "must_not": { "term": { "tag": "spam" }},
        "should": [
                    { "term": { "starred": true }},
                    { "term": { "unread": true }}
        ]
    }
}

(Query DSL part)

1.match_all query

match_all matches all documents. It is the default query when no query is specified.

{
    "match_all": {}
}

This query is often combined with a filter, for example to retrieve all emails in the inbox folder. All documents are considered equally relevant, so each receives a neutral _score of 1.

2.match query

The match query is the standard query: whether you need a full-text query or an exact-value query, it is usually the one to reach for.

If you run match against a full-text field, it passes the query string through the analyzer before searching:

{
    "match": {
        "tweet": "About Search"
    }
}

If you run match against a field holding exact values, such as a number, date, boolean or not_analyzed string, it searches for the given value as-is:

{ "match": { "age": 26 }}
{ "match": { "date": "2014-09-01" }}
{ "match": { "public": true }}
{ "match": { "tag": "full_text" }}

Tip: For exact-value searches, prefer a filter clause, because filter results can be cached.

The match query does not parse any query syntax; it simply looks for the given words in the field you name. As long as the field name is correct, it is not prone to syntax errors.

3.multi_match query

The multi_match query runs the same match query against multiple fields at once:

{
    "multi_match": {
        "query": "full text search",
        "fields": [ "title", "body" ]
    }
}
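
With its default best_fields type, multi_match behaves like a dis_max query wrapping one match query per field. The sketch below spells out that expansion; the function name multi_match_as_dis_max is made up, and options such as tie_breaker or per-field boosts are left out.

```python
def multi_match_as_dis_max(query, fields):
    """Expand a default (best_fields) multi_match into dis_max over match queries."""
    return {
        "dis_max": {
            # one match query per field; the best-scoring one decides the score
            "queries": [{"match": {field: query}} for field in fields]
        }
    }

expanded = multi_match_as_dis_max("full text search", ["title", "body"])
print(expanded)
```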

4.bool query

bool query is similar to bool filter and is used to combine multiple query clauses. The difference is that bool filter gives a direct yes or no for each document, while bool query also calculates a _score (relevance score) from its query clauses.

must:: clauses that must match for the document to be included.
must_not:: clauses that must not match for the document to be included.
should:: clauses that, if they match, increase the document’s relevance score; otherwise they have no effect.

In summary:

  • must: every subquery must match, similar to “and”
  • should: optional subqueries, similar to “or”
  • must_not: must not match, does not contribute to the score, similar to “not”
  • filter: must match, does not contribute to the score

The following query finds documents whose title field matches “how to make millions” and whose tag field is not spam. Documents that are tagged “starred” or have a date of 2014 or later will rank higher than they otherwise would:

{
    "bool": {
        "must": { "match": { "title": "how to make millions" }},
        "must_not": { "match": { "tag": "spam" }},
        "should": [
            { "match": { "tag": "starred" }},
            { "range": { "date": { "gte": "2014-01-01" }}}
        ]
    }
}

Tip: If a bool query has no must clause, at least one should clause has to match. If there is a must clause, no should clause is required to match.
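
The rule in this tip can be modelled with a tiny local evaluator. This is a simplified sketch: it only understands term leaves compared by equality against a flat dict, it ignores scoring entirely, and the function names and sample document are invented for illustration.

```python
def as_list(clauses):
    """bool parameters accept either a single clause or a list of clauses."""
    if clauses is None:
        return []
    return clauses if isinstance(clauses, list) else [clauses]

def leaf_matches(doc, clause):
    """Only {"term": {field: value}} leaves are supported in this sketch."""
    field, value = next(iter(clause["term"].items()))
    return doc.get(field) == value

def bool_matches(doc, body):
    required = as_list(body.get("must")) + as_list(body.get("filter"))
    excluded = as_list(body.get("must_not"))
    optional = as_list(body.get("should"))
    if not all(leaf_matches(doc, c) for c in required):
        return False
    if any(leaf_matches(doc, c) for c in excluded):
        return False
    # should only becomes mandatory when nothing else is required
    if optional and not required:
        return any(leaf_matches(doc, c) for c in optional)
    return True

doc = {"folder": "inbox", "tag": "work", "starred": False}  # hypothetical document
should = [{"term": {"starred": True}}, {"term": {"tag": "news"}}]

print(bool_matches(doc, {"must": {"term": {"folder": "inbox"}}, "should": should}))  # True
print(bool_matches(doc, {"should": should}))                                         # False
```

The same document is accepted when a must clause carries the match, and rejected when the should clauses are all there is and none of them matches.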

The above content comes from: http://es.xiaoleilu.com/054_Query_DSL/70_Important_clauses.html

ElasticSearch query (match and term)
http://www.cnblogs.com/yjf512/p/4897294.html

5.wildcard query

The wildcard query uses standard shell-style wildcards: ? matches any single character and * matches zero or more characters.

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-wildcard-query.html

The following query matches documents containing W1F 7HW and W2F 8HW:

GET /my_index/address/_search
{
    "query": {
        "wildcard": {
            "postcode": "W?F*HW"
        }
    }
}

As another example, the following query matches hostnames that start with wxopen:

{
  "query": {
    "wildcard": {
      "hostname": "wxopen*"
    }
  }
}
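
To see what the two wildcard characters mean, the pattern can be translated into a regular expression and tested locally. This sketch assumes the field is indexed as a single term (so the pattern is compared against the whole value); the function names and the non-matching postcode are made up for the example.

```python
import re

def wildcard_to_regex(pattern):
    """Translate ? (one character) and * (zero or more characters) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)

def wildcard_matches(term, pattern):
    # the pattern has to cover the whole term, hence fullmatch
    return wildcard_to_regex(pattern).fullmatch(term) is not None

for postcode in ["W1F 7HW", "W2F 8HW", "WC2H 7LT"]:
    print(postcode, wildcard_matches(postcode, "W?F*HW"))
```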

6.regexp query

Let’s say you only want to match zip codes that start with a W followed by a number. Using regexp queries allows you to write more complex patterns:

GET /my_index/address/_search
{
    "query": {
        "regexp": {
            "postcode": "W[0-9].+"
        }
    }
}

This regular expression requires the term to start with W, followed by a digit from 0 to 9, and then one or more other characters.

The following example matches all hostnames starting with wxopen:

{
  "query": {
    "regexp": {
      "hostname": "wxopen.*"
    }
  }
}
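
The postcode pattern W[0-9].+ can be tried out locally with Python's re module. Note that this is only an approximation: Elasticsearch uses Lucene's regular expression syntax, which differs from Python's in places, but this pattern uses constructs common to both. As with the regexp query, the pattern has to match the entire term. The sample postcodes beyond the two mentioned earlier are made up.

```python
import re

def regexp_matches(term, pattern):
    """The pattern must match the whole term, not just part of it."""
    return re.fullmatch(pattern, term) is not None

postcodes = ["W1F 7HW", "W2F 8HW", "WC2H 7LT", "W1"]
matching = [p for p in postcodes if regexp_matches(p, "W[0-9].+")]
print(matching)  # ['W1F 7HW', 'W2F 8HW']
```

WC2H 7LT fails because the second character is not a digit, and W1 fails because nothing follows the digit.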

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html

7. prefix query

To match values that start with a given string, the simpler prefix query is enough, as in the following example:

{
  "query": {
    "prefix": {
      "hostname": "wxopen"
    }
  }
}
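
For this particular case the prefix, wildcard and regexp examples all select the same values. The sketch below checks that locally on a few hypothetical hostnames, using str.startswith, fnmatch and re as stand-ins for the three query types.

```python
import re
from fnmatch import fnmatchcase

# hypothetical hostnames
hostnames = ["wxopen.example.com", "wxopen2.example.com", "api.wxopen.example.com"]

results = {}
for h in hostnames:
    by_prefix = h.startswith("wxopen")                   # prefix: "wxopen"
    by_wildcard = fnmatchcase(h, "wxopen*")              # wildcard: "wxopen*"
    by_regexp = re.fullmatch("wxopen.*", h) is not None  # regexp: "wxopen.*"
    assert by_prefix == by_wildcard == by_regexp
    results[h] = by_prefix

print(results)
```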

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-prefix-query.html

For more query commands, you can see: https://www.elastic.co/guide/en/elasticsearch/reference/current/term-level-queries.html#term-level-queries

8. Phrase Matching

When you need to find adjacent words, you use the match_phrase query:

GET /my_index/my_type/_search
{
    "query": {
        "match_phrase": {
            "title": "quick brown fox"
        }
    }
}

Similar to the match query, the match_phrase query first analyzes the query string to produce a list of terms. It then searches for all the terms, but keeps only documents that contain all of them in positions adjacent to each other. A query for the phrase quick fox will not match any of our documents, since no document contains the terms quick and fox next to each other.

In older versions, the match_phrase query can also be written as a match query of type phrase (the type parameter was deprecated in Elasticsearch 5.0):

"match": {
    "title": {
        "query": "quick brown fox",
        "type": "phrase"
    }
}
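
The adjacency requirement can be sketched with a simple position check. The tokenizer here (lowercase and split on whitespace) is an assumption standing in for a real analyzer, and the sample sentence is made up.

```python
def tokenize(text):
    """Toy analyzer: lowercase and split on whitespace."""
    return text.lower().split()

def phrase_matches(text, phrase):
    """True if the phrase's terms appear in the text in order and next to each other."""
    doc, wanted = tokenize(text), tokenize(phrase)
    n = len(wanted)
    return n > 0 and any(doc[i:i + n] == wanted for i in range(len(doc) - n + 1))

title = "The quick brown fox jumps"  # hypothetical document title
print(phrase_matches(title, "quick brown fox"))  # True
print(phrase_matches(title, "quick fox"))        # False
```

Both terms of quick fox occur in the title, but not in adjacent positions, so the phrase does not match.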

Reference: https://blog.csdn.net/kingmax54212008/article/details/105169016/

https://blog.csdn.net/weixin_39723544/article/details/103676958

https://blog.csdn.net/lucky_ly/article/details/116855624