Java’s Magical ES2 (Ingenious Query and Result Processing)

DSL statement query

Basic syntax of query

GET /indexName/_search
{
  "query": {
    "query type": {
      "Query condition": "Condition value"
    }
  }
}

Query based on document id

#Query documents
GET hotel/_doc/36934

Query all

Returns all documents in the index.
// Query all
GET /indexName/_search
{
  "query": {
    "match_all": {
    }
  }
}

Full text search query (search box)

The fields being searched must be text fields that can be segmented into words.

A full-text query runs the user's input through a word segmenter (analyzer) and then matches the resulting terms against the inverted index. Common queries of this kind:

- match_query
- multi_match_query

The process:

- Segment the content entered by the user to obtain the terms
- Match the terms against the inverted index to get the document IDs
- Find the documents by document ID and return them to the user

Typical scenarios:

- Search box on a shopping site
- Baidu search box
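
The segment, match, and fetch steps above can be sketched with a toy in-memory inverted index. This is only an illustration in plain Java, not how Elasticsearch is implemented: the class name TinyInvertedIndex, the whitespace tokenizer, and the sample hotel names are all assumptions made for the example.

```java
import java.util.*;

/** Toy inverted index: term -> ids of the documents that contain it (illustration only). */
class TinyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final Map<Integer, String> docs = new HashMap<>();

    /** Hypothetical analyzer: lower-case the text and split it on whitespace. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\s+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    /** Index a document: store it and register its id under every term it contains. */
    void add(int id, String text) {
        docs.put(id, text);
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    /** Segment the input, look up each term, then fetch the matching documents by id. */
    List<String> search(String userInput) {
        Set<Integer> ids = new TreeSet<>();
        for (String term : analyze(userInput)) {
            ids.addAll(postings.getOrDefault(term, Collections.emptySet()));
        }
        List<String> result = new ArrayList<>();
        for (int id : ids) result.add(docs.get(id));
        return result;
    }
}
```

A document is returned as soon as it contains any one of the input terms, which mirrors the default "or" behaviour of a match query.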

Single field query (match query)

GET /indexName/_search
{
  "query": {
    "match": {
      "FIELD": "TEXT"
    }
  }
}

Multi-field query

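Note: the fields must be text fields that can be segmented into words.

Exact-value fields (keyword, numeric, date, boolean, and so on) are not segmented, so they are a poor fit for this query.

If such a field is included and the search text cannot be parsed as that field's type (for example, words sent to a numeric or date field), the request can fail with an error.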

GET /indexName/_search
{
  "query": {
    "multi_match": {
      "query": "TEXT",
      "fields": ["FIELD1", "FIELD2"]
    }
  }
}

GET /hotel/_search
{
  "query": {
    "multi_match": {
      "query": "Yuyuan Home on the Bund, Siping, Sichuan",
      "fields": ["brand","name","business"]
    }
  }
}

Precise query

Precise queries generally target keyword, numeric, date, boolean, and other non-text fields. Therefore, the search input is not segmented into words.

term query (accurate query)

A document matches only when the content entered by the user is exactly equal to the field value.

If you enter 1234, it matches 1234 exactly.

123, 12345, 12, 1, and so on do not match.

GET /indexName/_search
{
  "query": {
    "term": {
      "FIELD": {
        "value": "VALUE"
      }
    }
  }
}

# term query
GET /hotel/_search
{
  "query": {
    "term": {
      "business": {
        "value": "Yuyuan"
      }
    }
  }
}
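
The difference between exact matching and full-text matching can be sketched in plain Java. This is a simplified model, not Elasticsearch code: the class TermVsMatch and its whitespace-based segmentation are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class TermVsMatch {
    /** term: the raw input must be exactly equal to the stored value. */
    static boolean term(String storedValue, String input) {
        return storedValue.equals(input);
    }

    /** match (simplified): the input is segmented, and any shared term counts as a hit. */
    static boolean match(String storedText, String input) {
        Set<String> storedTerms =
                new HashSet<>(Arrays.asList(storedText.toLowerCase().split("\\s+")));
        for (String t : input.toLowerCase().split("\\s+")) {
            if (storedTerms.contains(t)) return true;
        }
        return false;
    }
}
```

With this model, term("1234", "123") is false while match("Yuyuan Garden", "yuyuan") is true, which is the behaviour described above.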

range query (for numerical values)

A range query is generally used to filter numerical fields by range, for example filtering by price range.

// range query
GET /indexName/_search
{
  "query": {
    "range": {
      "FIELD": {
        "gte": 10, // gte here means greater than or equal to, gt means greater than
        "lte": 20 // lte represents less than or equal to, lt represents less than
      }
    }
  }
}

Geographical coordinate query

A geographical coordinate query is a query based on longitude and latitude.

Rectangular range query

When querying, you need to specify the coordinates of the upper left and lower right points of the rectangle, and then draw a rectangle. All points falling within the rectangle meet the conditions.

GET hotel/_search
{
  "query":{
    "geo_bounding_box":{
      "location":{
        "top_left": {
          "lat": 31.1,
          "lon": 121.5
        },
        "bottom_right":{
          "lat": 30.9,
          "lon": 121.7
        }
      }
    }
  }
}
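
The rectangle test itself is simple, as the following plain-Java sketch shows. The class BoundingBox is hypothetical, and the sketch assumes a box that does not cross the 180° meridian.

```java
/** Hypothetical rectangle defined by its top-left and bottom-right corners. */
class BoundingBox {
    final double topLat, leftLon, bottomLat, rightLon;

    BoundingBox(double topLat, double leftLon, double bottomLat, double rightLon) {
        this.topLat = topLat;
        this.leftLon = leftLon;
        this.bottomLat = bottomLat;
        this.rightLon = rightLon;
    }

    /** A point is inside when its latitude and longitude both fall between the corners. */
    boolean contains(double lat, double lon) {
        return lat <= topLat && lat >= bottomLat
            && lon >= leftLon && lon <= rightLon;
    }
}
```

Using the coordinates from the request above, new BoundingBox(31.1, 121.5, 30.9, 121.7) contains the point (31.0, 121.6) but not (31.2, 121.6).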

Nearby search

Nearby query, also called distance query (geo_distance): finds all documents whose location lies within a given distance of a center point.

GET /indexName/_search
{
  "query": {
    "geo_distance": {
      "distance": "15km", // radius
      "FIELD": "31.21,121.5" // Center of circle
    }
  }
}
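
To make the "within a radius" condition concrete, here is a minimal sketch that uses the haversine formula for great-circle distance. The haversine formula is one common choice and is an assumption here, not necessarily the exact calculation Elasticsearch performs; the class name GeoDistance is also hypothetical.

```java
class GeoDistance {
    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    /** Great-circle distance in kilometres between two latitude/longitude points. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** True when the point lies within radiusKm of the center. */
    static boolean within(double centerLat, double centerLon,
                          double lat, double lon, double radiusKm) {
        return haversineKm(centerLat, centerLon, lat, lon) <= radiusKm;
    }
}
```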

Compound query

Compound query: a compound query combines other, simpler queries to implement more complex search logic. There are two common ones:

  • function score: score function query, which controls how document relevance is calculated and therefore how documents are ranked.
  • bool query: Boolean query, which combines multiple other queries through logical relationships to achieve complex searches

_score scoring mechanism

When we use a match query, each matching document is scored according to its relevance to the search terms (_score), and the results are returned in descending order of score.

Decisive factor: the number of times the term appears in the document.

For example: if the search term accounts for 5 of the 10 terms in a document, the score is high.

If it accounts for only 1 of the 10 terms, the score is low.

In a later upgrade, Elasticsearch replaced this algorithm with BM25, which has been the default since version 5.0.

Reasons for the improvement:
Earlier versions: the score depends on the number of occurrences of the term. The more occurrences, the higher the score, without limit.
Current versions: the score still grows with the number of occurrences, but the algorithm imposes an upper limit, so the score never becomes extremely high.
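
The upper limit can be checked numerically. The sketch below evaluates only the term-frequency part of the textbook BM25 formula, tf * (k1 + 1) / (tf + k1), with document-length normalization left out; this simplification, the class name Bm25Saturation, and the choice k1 = 1.2 are assumptions for illustration, and the exact formula inside Lucene may differ in detail.

```java
class Bm25Saturation {
    /**
     * Term-frequency component of textbook BM25 with length normalization
     * switched off. It grows with tf but never reaches k1 + 1.
     */
    static double tfComponent(double tf, double k1) {
        return tf * (k1 + 1) / (tf + k1);
    }

    public static void main(String[] args) {
        double k1 = 1.2;
        for (int tf : new int[] {1, 2, 5, 10, 100, 1000}) {
            System.out.println("tf=" + tf + " -> " + tfComponent(tf, k1));
        }
    }
}
```

However often the term occurs, the value stays below k1 + 1, whereas a score that is simply proportional to the occurrence count would keep growing.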

Score function query

The function score query contains four parts:

- Original query condition: the query part. Documents are searched by this condition and scored with the BM25 algorithm; the result is the original score (query score)
- Filter condition: the filter part. Only documents that meet this condition have their scores recalculated
- Score function: applied to every document that meets the filter condition to produce the function score. There are four types of functions:
  - weight: The function result is a constant
  - field_value_factor: Use a certain field value in the document as the function result
  - random_score: Use random numbers as function results
  - script_score: Custom score function algorithm
- Operation mode (boost_mode): how the function score and the relevance score of the original query are combined, including:
  - multiply: multiply
  - replace: replace query score with function score
  - Others, such as: sum, avg, max, min

The function score query is processed as follows:

- 1) Search documents using the original query and calculate the relevance score, which is called the original score (query score)
- 2) Filter documents according to the filter condition
- 3) For documents that meet the filter condition, apply the score function to obtain the function score
- 4) Combine the original score (query score) and the function score according to the operation mode to obtain the final relevance score.

So the key points are:

- Filter condition: determines which documents’ scores are modified
- Score function: determines how the function score is calculated
- Operation mode: determines how the final score is computed

# Function score query
GET hotel/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "city": "Shanghai"
        }
      },
      "functions": [
        {
          "filter": {
            "term": {
              "business": "Yuyuan"
            }
          },
          "weight": 10
        }
      ],
      "boost_mode": "replace"
    }
  }
}
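
For a document that passes the filter, the last step (combining the query score with the function score) can be sketched as follows. The class BoostMode and the method combine are hypothetical names; only the mode names come from the list above.

```java
class BoostMode {
    /** Combine the original query score with the function score for one document. */
    static double combine(String boostMode, double queryScore, double functionScore) {
        switch (boostMode) {
            case "multiply": return queryScore * functionScore;
            case "replace":  return functionScore;
            case "sum":      return queryScore + functionScore;
            case "avg":      return (queryScore + functionScore) / 2;
            case "max":      return Math.max(queryScore, functionScore);
            case "min":      return Math.min(queryScore, functionScore);
            default:
                throw new IllegalArgumentException("unknown boost_mode: " + boostMode);
        }
    }
}
```

In the request above, weight is 10 and boost_mode is replace, so a hotel in the Yuyuan business district ends up with a score of 10 regardless of its original query score.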

Boolean query

(The more clauses that take part in scoring, the worse the query performance, so put conditions in filter whenever they do not need to affect the score.)

A Boolean query is a combination of one or more query clauses, and each clause is a subquery. Subqueries can be combined in the following ways:

  • must: every subquery must match, similar to “and”
  • should: optional subqueries, similar to “or”
  • must_not: must not match; does not participate in scoring, similar to “not”
  • filter: must match; does not participate in scoring

In practice:

- The keyword search in the search box is a full-text search query. Put it in must so that it participates in score calculation.
- Put all other filtering conditions in filter, where they do not participate in scoring.

GET /hotel/_search
{
  "query": {
    "bool": {
      "must": [
        {"term": {"city": "Shanghai" }}
      ],
      "should": [
        {"term": {"brand": "Crown Plaza" }},
        {"term": {"brand": "RAMADA" }}
      ],
      "must_not": [
        { "range": { "price": { "lte": 500 } }}
      ],
      "filter": [
        { "range": {"score": { "gte": 45 } }}
      ]
    }
  }
}
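
The way the four clause types interact can be modelled in plain Java with predicates. This is a simplified sketch: BoolQuerySketch is a hypothetical class, documents are plain maps, and the score is just a count of matching must and should clauses rather than a real relevance score.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class BoolQuerySketch {
    final List<Predicate<Map<String, Object>>> must = new ArrayList<>();
    final List<Predicate<Map<String, Object>>> should = new ArrayList<>();
    final List<Predicate<Map<String, Object>>> mustNot = new ArrayList<>();
    final List<Predicate<Map<String, Object>>> filter = new ArrayList<>();

    /** A document is a hit when all must and filter clauses match and no must_not clause does. */
    boolean matches(Map<String, Object> doc) {
        return must.stream().allMatch(p -> p.test(doc))
            && filter.stream().allMatch(p -> p.test(doc))
            && mustNot.stream().noneMatch(p -> p.test(doc));
    }

    /** Stand-in score: one point per matching must or should clause; filter and must_not add nothing. */
    long score(Map<String, Object> doc) {
        return must.stream().filter(p -> p.test(doc)).count()
             + should.stream().filter(p -> p.test(doc)).count();
    }
}
```

Because must and filter are present, the should clauses are optional here: they never exclude a document, they only raise the score of documents that match them.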

Sort

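Keyword, numeric, and date fields are easy to sort.

text fields: to be tested (by default Elasticsearch does not allow sorting on a text field unless fielddata is enabled for it).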

GET /indexName/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "FIELD": "desc" // Sort field and sort order: asc or desc
    }
  ]
}

Paging

Basic paging:

Basic logic:

To query items 100-110, a total of 10 pieces of data:

1: Read the first 100 items

2: Read 10 more items, up to item 110

3: Keep items 100-110, these 10 pieces of data, and discard the first 100

The deeper the page, the more data is read only to be thrown away, so efficiency drops sharply. By default, requests where from + size exceeds 10,000 are rejected (the index.max_result_window setting).

GET /hotel/_search
{
  "query": {
    "match_all": {}
  },
  "from": 0, //The starting position of paging, the default is 0
  "size": 10, // The total number of documents expected to be obtained
  "sort": [
    {"price": "asc"}
  ]
}
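
The cost of from/size paging can be seen in a small in-memory sketch. FromSizePaging is a hypothetical class; the 10,000 limit mirrors the default described above, and the list stands in for hits that are already sorted.

```java
import java.util.ArrayList;
import java.util.List;

class FromSizePaging {
    static final int MAX_RESULT_WINDOW = 10_000; // default limit on from + size

    /** Read the first from + size hits, then keep only the last `size` of them. */
    static <T> List<T> page(List<T> sortedHits, int from, int size) {
        if (from < 0 || size < 0) {
            throw new IllegalArgumentException("from and size must be non-negative");
        }
        if (from + size > MAX_RESULT_WINDOW) {
            throw new IllegalArgumentException("from + size must not exceed " + MAX_RESULT_WINDOW);
        }
        int end = Math.min(from + size, sortedHits.size());
        List<T> result = new ArrayList<>();
        if (from < end) {
            List<T> window = sortedHits.subList(0, end); // everything up to from + size is read
            result.addAll(window.subList(from, end));    // the first `from` hits are discarded
        }
        return result;
    }
}
```

With from = 100 and size = 10, the method touches 110 hits to return 10, which is why the work grows with the page depth.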

Deep paging

Problem 1: Same as above.

Problem 2: An extension of problem 1. In a cluster the data is spread across nodes, so a page cannot be read from a single node; every node has to be queried before the page can be assembled.

Each node therefore reads a large amount of data, which is then merged and processed.

Node A reads its top 10,000 items just to contribute to a page of 10.

Node B does the same.

Finally, the results from all nodes are merged and re-sorted, and only the requested N items are kept. A single page therefore costs many reads.
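
The merge step can be sketched in plain Java. DeepPagingMerge is a hypothetical class, integers stand in for sort values, and the lists stand in for the data held by each node; the point is only that every node has to supply its own top from + size candidates.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class DeepPagingMerge {
    /** Ask every node for its top from + size values, merge them, then cut out the page. */
    static List<Integer> page(List<List<Integer>> nodes, int from, int size) {
        List<Integer> merged = new ArrayList<>();
        for (List<Integer> node : nodes) {
            List<Integer> sorted = new ArrayList<>(node);
            Collections.sort(sorted);
            // Each node must return from + size candidates, not just `size`
            merged.addAll(sorted.subList(0, Math.min(from + size, sorted.size())));
        }
        Collections.sort(merged); // global re-sort on the coordinating side
        int end = Math.min(from + size, merged.size());
        List<Integer> result = new ArrayList<>();
        if (from < end) {
            result.addAll(merged.subList(from, end));
        }
        return result;
    }
}
```

With three nodes and from = 9,990, size = 10, up to 30,000 candidates are collected and sorted in order to return 10 of them.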

GET hotel/_search
{
  "query": {
    "match": {
      "all": "Home on the Bund"
    }
  },
  "size": 3,
  "search_after": [379, "433576"],
  "sort": [
    {
      "price": {
        "order": "desc"
      }
    },
    {
      "id": {
        "order": "asc"
      }
    }
  ]
}

search_after: paging requires a sort. The principle is to query the next page starting from the sort values of the last hit on the previous page. This is the officially recommended method.

Key idea: page by sort value rather than by offset.
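
The idea can be modelled in memory as follows. SearchAfterSketch is a hypothetical class; each hit is reduced to a {price, id} pair, and the id is treated as a number for simplicity (an assumption, since the request above passes it as a string). The ordering matches the request: price descending, then id ascending.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class SearchAfterSketch {
    /** Sort order of the request: price descending, then id ascending. Each hit is {price, id}. */
    static final Comparator<long[]> ORDER = (a, b) -> {
        int byPrice = Long.compare(b[0], a[0]);
        return byPrice != 0 ? byPrice : Long.compare(a[1], b[1]);
    };

    /** Return the next `size` hits that sort strictly after `after` (null means the first page). */
    static List<long[]> nextPage(List<long[]> allHits, long[] after, int size) {
        return allHits.stream()
                .sorted(ORDER)
                .filter(h -> after == null || ORDER.compare(h, after) > 0)
                .limit(size)
                .collect(Collectors.toList());
    }
}
```

The caller passes the sort values of the last hit it received, so no hits before that position need to be counted or skipped by offset. A unique tiebreaker such as id keeps hits with equal prices from being skipped or repeated.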

Highlight (keyword plus label)

The implementation of highlighting is divided into two steps:

  • 1) Wrap every keyword in the document in a tag, such as the <em> tag
  • 2) Write CSS styles for that tag on the page

The core of highlighting: keywords and tags

  • Highlighting is keyword highlighting, so the search condition must contain keywords and cannot be a range query.
  • By default, the highlighted field must be the same as the field specified in the search, otherwise it cannot be highlighted
  • If you want to highlight non-search fields, you need to add an attribute: require_field_match=false

GET /hotel/_search
{
  "query": {
    "match": {
      "FIELD": "TEXT" // Query conditions, highlight must use full text search query
    }
  },
  "highlight": {
    "fields": { //Specify the fields to be highlighted
      "FIELD": {
        "pre_tags": "<em>", // Pre-tags used to mark highlighted fields
        "post_tags": "</em>" // Post tags used to mark highlighted fields
      }
    }
  }
}
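
What the tags do to the text can be shown with a small sketch. HighlightSketch is a hypothetical helper that wraps literal, case-insensitive occurrences of one keyword; real highlighting works on analyzed terms, so this is only an approximation of the result.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HighlightSketch {
    /** Wrap every case-insensitive occurrence of keyword in preTag ... postTag. */
    static String highlight(String text, String keyword, String preTag, String postTag) {
        if (keyword.isEmpty()) {
            return text; // nothing to highlight
        }
        Pattern pattern = Pattern.compile(Pattern.quote(keyword), Pattern.CASE_INSENSITIVE);
        // $0 re-inserts the matched text, so its original casing is preserved
        String replacement = Matcher.quoteReplacement(preTag) + "$0"
                           + Matcher.quoteReplacement(postTag);
        return pattern.matcher(text).replaceAll(replacement);
    }
}
```

The page then only needs a CSS rule for the chosen tag, for example one that colors em elements red.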

JAVA client query and analysis

Query all (matchAllQuery)

1: Build the query and send the request

1.1: request.source(): add sorting, paging, or other options here as required

2: Parse the response layer by layer, following its structure

2.1: SearchHits first, then the SearchHit array, then the source of each hit

2.2: The returned source is JSON, which can be deserialized into a Java class for further processing.

@Test
void testMatchAll() throws IOException {
    // 1. Prepare Request
    SearchRequest request = new SearchRequest("hotel");
    // 2. Prepare DSL
    request.source()
        .query(QueryBuilders.matchAllQuery());
    // 3. Send request (client is assumed to be a RestHighLevelClient created elsewhere in the test class)
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);

    // 4. Parse the response
    handleResponse(response);
}

private void handleResponse(SearchResponse response) {
    // 4. Parse the response
    SearchHits searchHits = response.getHits();
    // 4.1. Get the total number of items
    long total = searchHits.getTotalHits().value;
    System.out.println("Found " + total + " documents in total");
    // 4.2. Document array
    SearchHit[] hits = searchHits.getHits();
    // 4.3. Traverse
    for (SearchHit hit : hits) {
        // Get document source
        String json = hit.getSourceAsString();
        //Deserialize
        HotelDoc hotelDoc = JSON.parseObject(json, HotelDoc.class);
        System.out.println("hotelDoc = " + hotelDoc);
    }
}

match query

@Test
void testMatch() throws IOException {
    SearchRequest request = new SearchRequest("hotel");
    // Single-field query
    request.source().query(QueryBuilders.matchQuery("all", "Rujia"));
    // Multi-field query (alternative to the line above)
    // request.source().query(QueryBuilders.multiMatchQuery("The Bund", "name", "brand", "business"));
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    handleResponse(response);
}

Precise query and range query

@Test
void termQuery() throws IOException {
    SearchRequest request = new SearchRequest("hotel");
    // Exact (term) query
    // request.source().query(QueryBuilders.termQuery("city", "Shanghai"));
    // Range query: 0 <= price <= 1000
    request.source().query(QueryBuilders.rangeQuery("price").gte(0).lte(1000));
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    handleResponse(response);
}

Boolean query

// Boolean query
@Test
void boolQuery() throws IOException {
    SearchRequest request = new SearchRequest("hotel");
    request.source().query(QueryBuilders.boolQuery()
            // Must match and participates in scoring: city = Shanghai
            .must(QueryBuilders.termQuery("city", "Shanghai"))
            // Must not match: brand != Home Inn
            .mustNot(QueryBuilders.termQuery("brand", "Home Inn"))
            // Must match but does not participate in scoring: 0 <= price <= 1000
            .filter(QueryBuilders.rangeQuery("price").gte(0).lte(1000))
    );
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    handleResponse(response);
}

Paging, sorting

@Test
void sortAndPage() throws IOException {
    // Page number and page size (hard-coded here)
    int page = 2, size = 5;

    // Note: no query condition is set, so all documents match.
    // The paging below then returns only items 6-10.
    SearchRequest request = new SearchRequest("hotel");

    // Paging: from = (page - 1) * size
    request.source().from((page - 1) * size).size(size);

    // Sort by price in ascending order
    request.source().sort("price", SortOrder.ASC);

    SearchResponse response = client.search(request, RequestOptions.DEFAULT);

    handleResponse(response);
}

Highlight

Highlight query

@Test
void highlightQuery() throws IOException {
    SearchRequest request = new SearchRequest("hotel");

    // Assemble the highlight settings
    HighlightBuilder hb = new HighlightBuilder();
    hb.field("name");             // Field to highlight
    hb.preTags("<em>");           // Tag inserted before each keyword
    hb.postTags("</em>");         // Tag inserted after each keyword
    hb.requireFieldMatch(false);  // The queried field and the highlighted field may differ

    // Assemble the query
    request.source().query(QueryBuilders.matchQuery("name", "Beijing"))
           .highlighter(hb);

    SearchResponse response = client.search(request, RequestOptions.DEFAULT);

    // Parse the highlighted results
    highlightHandel(response);
}

Highlight analysis

private void highlightHandel(SearchResponse response) {
    // 4. Parse the response
    SearchHits searchHits = response.getHits();
    // Total number of hits
    long total = searchHits.getTotalHits().value;
    System.out.println("Found " + total + " documents in total");
    // Document array
    SearchHit[] hits = searchHits.getHits();
    for (SearchHit hit : hits) {
        // Original document source
        String json = hit.getSourceAsString();
        // Deserialize
        HotelDoc hotelDoc = JSON.parseObject(json, HotelDoc.class);
        // Highlighted results: key = field name, value = highlighted fragments,
        // e.g. {name=[name], fragments[[<em>Beijing</em>Hilton Hotel]]}
        Map<String, HighlightField> highlightFields = hit.getHighlightFields();
        if (!CollectionUtils.isEmpty(highlightFields)) {
            // Look up the result by field name; it is null if "name" was not highlighted
            HighlightField highlightField = highlightFields.get("name");
            if (highlightField != null
                    && highlightField.getFragments() != null
                    && highlightField.getFragments().length > 0) {
                // Business requirement here: overwrite the original name with the highlighted text
                hotelDoc.setName(highlightField.getFragments()[0].string());
            }
        }
        System.out.println("hotelDoc = " + hotelDoc);
    }
}

syntaxbug.com © 2021 All Rights Reserved.