DSL query classification
Match-all query: returns every document; generally used for testing. For example: match_all
Full-text search query: uses an analyzer to split the user's input into terms, then matches those terms against the inverted index. For example:
match
multi_match
Exact query: finds documents by an exact term value, usually on keyword, numeric, date, boolean, and similar field types. For example:
ids
range
term
Geographic (geo) query: Query based on longitude and latitude. For example:
geo_distance
geo_bounding_box
Compound query: combines the query types above into a single, more complex query. For example:
bool
function_score
Full text search query
Common full-text search queries include:
match query: searches a single field
multi_match query: searches several fields; a document matches if any one of the fields matches.
match query
A match query searches a single field.
GET /hotel/_search { "query": { "match": { "all": "Home Inn" } } }
multi_match query
A multi_match query searches several fields; a document matches if any one of the fields matches.
GET /hotel/_search { "query": { "multi_match": { "query": "百家", "fields": ["brand", "name"] } } }
Given the same search text, the two queries return the same results, because we copied the brand, name, and business values into the all field using copy_to. Searching those three fields therefore has the same effect as searching the all field alone. However, the more fields a query searches, the greater the impact on query performance, so it is recommended to use copy_to and then query the single combined field.
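The relationship between copy_to and the two queries can be sketched in Python as plain request bodies. This is only an illustration: the mapping below is hypothetical (the original index mapping is not shown, and the field types are assumptions), and the helper names match_query and multi_match_query are made up for this example.

```python
import json

# Hypothetical mapping for the hotel index. Only the copy_to wiring matters here;
# the field types are assumptions, since the original mapping is not shown.
hotel_mapping = {
    "mappings": {
        "properties": {
            "name": {"type": "text", "copy_to": "all"},
            "brand": {"type": "keyword", "copy_to": "all"},
            "business": {"type": "keyword", "copy_to": "all"},
            "all": {"type": "text"},
        }
    }
}


def match_query(field, text):
    """Build the body of a single-field match query."""
    return {"query": {"match": {field: text}}}


def multi_match_query(text, fields):
    """Build the body of a multi_match query over several fields."""
    return {"query": {"multi_match": {"query": text, "fields": list(fields)}}}


# One field to search instead of three.
print(json.dumps(match_query("all", "Home Inn")))
print(json.dumps(multi_match_query("Home Inn", ["brand", "name", "business"])))
```

Either body could be sent to GET /hotel/_search; the single-field version gives the search fewer fields to visit.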
Exact query
An exact query searches fields that are not analyzed, so the query value must not be split into terms either. A document matches only if the user's input is exactly equal to the field value; if the user enters more than the stored value, nothing is found.
Exact queries generally target keyword, numeric, date, boolean, and similar field types, so the search value is not analyzed. Common ones are:
term: query by the exact value of a term
range: query by a range of values
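The difference between exact matching and full-text matching can be illustrated with a small Python sketch. It is a simplification under stated assumptions: term_matches and match_matches are hypothetical helpers, and the whitespace split only stands in for a real analyzer.

```python
def term_matches(stored_value, query_value):
    """Exact query on an unanalyzed field: the whole value must be equal."""
    return stored_value == query_value


def match_matches(stored_text, query_text):
    """Rough stand-in for a full-text query: lowercase, split on whitespace,
    and match if any query term occurs among the stored terms."""
    stored_terms = set(stored_text.lower().split())
    return any(term in stored_terms for term in query_text.lower().split())


print(term_matches("Beijing", "Beijing"))         # True
print(term_matches("Beijing", "Beijing hotel"))   # False: too much input
print(match_matches("Beijing", "Beijing hotel"))  # True: one term is enough
```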
term query
Query based on the exact value of the term
GET /hotel/_search { "query": { "term": { "city": { "value": "Beijing" } } } }
range query
Query by a range of values. Here gte means greater than or equal to, gt means greater than, lte means less than or equal to, and lt means less than.
GET /hotel/_search { "query": { "range": { "price": { "gte": 10, "lte": 200 } } } }
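The four bounds can be mirrored in a few lines of Python to make the inclusive and exclusive cases concrete. The helper in_range and the sample prices are hypothetical, used only for illustration.

```python
def in_range(value, gte=None, gt=None, lte=None, lt=None):
    """Return True if value satisfies every bound that was given."""
    checks = [
        gte is None or value >= gte,
        gt is None or value > gt,
        lte is None or value <= lte,
        lt is None or value < lt,
    ]
    return all(checks)


prices = [5, 10, 150, 200, 201]
print([p for p in prices if in_range(p, gte=10, lte=200)])  # [10, 150, 200]
print([p for p in prices if in_range(p, gt=10, lt=200)])    # [150]
```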
Geographical coordinate query
A geographical coordinate query is a query based on longitude and latitude.
Rectangular range query
A rectangular range query, that is, a geo_bounding_box query, returns all documents whose coordinates fall within a given rectangle. You specify the coordinates of the rectangle's upper-left corner (top_left) and lower-right corner (bottom_right); these two points define the rectangle, and every point that falls inside it matches.
GET /hotel/_search { "query": { "geo_bounding_box": { "location": { "top_left": { "lat": 31.1, "lon": 121.5 }, "bottom_right": { "lat": 30.9, "lon": 121.7 } } } } }
The result contains the documents whose location lies inside the rectangle.
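The containment test behind a rectangular range query can be sketched as follows. This is a simplified illustration, not how Elasticsearch implements it: in_bounding_box is a hypothetical helper and it does not handle rectangles that cross the 180 degree meridian.

```python
def in_bounding_box(point, top_left, bottom_right):
    """Check whether point lies inside the rectangle.

    All arguments are dicts with "lat" and "lon" keys. Rectangles that
    cross the 180 degree meridian are not handled in this sketch.
    """
    lat_ok = bottom_right["lat"] <= point["lat"] <= top_left["lat"]
    lon_ok = top_left["lon"] <= point["lon"] <= bottom_right["lon"]
    return lat_ok and lon_ok


top_left = {"lat": 31.1, "lon": 121.5}
bottom_right = {"lat": 30.9, "lon": 121.7}
print(in_bounding_box({"lat": 31.0, "lon": 121.6}, top_left, bottom_right))  # True
print(in_bounding_box({"lat": 31.2, "lon": 121.6}, top_left, bottom_right))  # False
```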
Nearby search
A nearby query, also called a distance query (geo_distance), returns all documents that lie within a given distance of a specified center point. In other words, pick a point on the map as the center of a circle, draw the circle with the specified distance as its radius, and every coordinate that falls inside the circle matches.
GET /hotel/_search { "query": { "geo_distance": { "distance": "15km", "location": "31.21,121.5" } } }
The result contains the documents that lie within a circle of radius 15 km around the center point.
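The circle test can be approximated in Python with the haversine great-circle formula. This is a sketch under assumptions: the Earth is treated as a sphere of radius 6371 km, the helper names are hypothetical, and Elasticsearch's own distance calculation may differ slightly.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_distance(center, point, radius_km):
    """True if point is inside the circle of radius_km around center."""
    return haversine_km(center[0], center[1], point[0], point[1]) <= radius_km


center = (31.21, 121.5)
print(within_distance(center, (31.31, 121.5), 15))  # roughly 11 km away: True
print(within_distance(center, (31.41, 121.5), 15))  # roughly 22 km away: False
```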
Compound query
A compound query combines other, simpler queries to implement more complex search logic. There are two common ones:
function_score: score function query, which controls how document relevance is calculated and therefore how documents are ranked.
bool query: Boolean query, which combines multiple other queries using logical relationships to achieve complex searches.
Score function query
When we use a match query, the matching documents are scored by their relevance to the search term (_score), and the results are returned in descending order of score.
The function score query contains four parts:
Original query condition: the query part. Documents are searched based on this condition and scored with the BM25 algorithm; this is the original score (query score).
Filter condition: the filter part. Only documents that meet this condition have their score recalculated.
Score function: documents that meet the filter condition are run through this function to produce the function score. There are four types of function:
weight: The function result is a constant
field_value_factor: Use a field value in the document as the function result
random_score: Use random numbers as function results
script_score: Custom score function algorithm
Operation mode: the boost_mode part, which determines how the function score and the relevance score of the original query are combined, including:
multiply: multiply the two scores (the default)
replace: replace query score with function score
Others, such as: sum, avg, max, min
GET /hotel/_search { "query": { "function_score": { "query": { "match": { "all": "百家" } }, "functions": [ { "filter": { "term": { "brand": "Home Inn" } }, "weight": 2 } ], "boost_mode": "sum" } } }
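How the operation mode combines the two scores can be shown with a small Python sketch. It only covers a document that meets the filter condition, the helper name combine is hypothetical, and the scores are made-up numbers rather than real BM25 output.

```python
def combine(query_score, function_score, boost_mode="multiply"):
    """Combine the original query score with the function score."""
    modes = {
        "multiply": lambda q, f: q * f,
        "replace": lambda q, f: f,
        "sum": lambda q, f: q + f,
        "avg": lambda q, f: (q + f) / 2,
        "max": lambda q, f: max(q, f),
        "min": lambda q, f: min(q, f),
    }
    if boost_mode not in modes:
        raise ValueError(f"unknown boost_mode: {boost_mode}")
    return modes[boost_mode](query_score, function_score)


# A document that meets the filter, with a made-up query score of 1.5
# and a weight function that returns the constant 2.
for mode in ("multiply", "replace", "sum", "avg", "max", "min"):
    print(mode, combine(1.5, 2, mode))
```

With boost_mode set to sum, as in the request above, the matching document's score in this sketch would be 1.5 + 2 = 3.5.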
Boolean query
A Boolean query is a combination of one or more query clauses, and each clause is a subquery. Subqueries can be combined in the following ways:
must: every subquery must match, similar to “and”
should: subqueries that may optionally match, similar to “or”
must_not: must not match, does not participate in scoring, similar to “not”
filter: must match, does not participate in scoring
GET /hotel/_search { "query": { "bool": { "must": [ {"term": {"city": "Shanghai" }} ], "should": [ {"term": {"brand": "Crown Plaza" }}, {"term": {"brand": "RAMADA" }} ], "must_not": [ { "range": { "price": { "lte": 500 } }} ], "filter": [ { "range": {"score": { "gte": 45 } }} ] } } }
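The matching logic of the four clause types can be sketched in Python with plain predicates. This ignores scoring entirely, the helper bool_matches and the sample hotels are hypothetical, and it assumes the default behaviour in which should clauses become optional once a must or filter clause is present.

```python
def bool_matches(doc, must=(), should=(), must_not=(), filter_=()):
    """Decide whether doc matches; each clause is a predicate doc -> bool."""
    if not all(clause(doc) for clause in must):
        return False
    if not all(clause(doc) for clause in filter_):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    # should is only required when there is no must or filter clause;
    # otherwise it just influences the score, which this sketch ignores.
    if should and not must and not filter_:
        return any(clause(doc) for clause in should)
    return True


clauses = dict(
    must=[lambda d: d["city"] == "Shanghai"],
    should=[lambda d: d["brand"] == "Crown Plaza", lambda d: d["brand"] == "RAMADA"],
    must_not=[lambda d: d["price"] <= 500],
    filter_=[lambda d: d["score"] >= 45],
)

hotels = [
    {"city": "Shanghai", "brand": "RAMADA", "price": 800, "score": 46},
    {"city": "Shanghai", "brand": "Other", "price": 300, "score": 48},
    {"city": "Beijing", "brand": "RAMADA", "price": 800, "score": 46},
    {"city": "Shanghai", "brand": "Other", "price": 900, "score": 47},
]
print([bool_matches(h, **clauses) for h in hotels])  # [True, False, False, True]
```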
It should be noted that when searching, the more fields involved in scoring, the worse the query performance will be. Therefore, when doing this kind of multi-condition query, it is recommended to do this:
- The keyword search in the search box is a full-text query; put it in must so that it participates in score calculation.
- Put all other filtering conditions in filter, so that they do not participate in scoring.
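This recommendation can be sketched as a small request builder in Python. The function build_search and its parameters are hypothetical; the field names all, city, and price are taken from the examples in this article.

```python
import json


def build_search(keyword, city=None, min_price=None, max_price=None):
    """Build a bool query: the keyword goes in must (scored),
    every other condition goes in filter (not scored)."""
    bool_query = {"must": [{"match": {"all": keyword}}]}
    filters = []
    if city is not None:
        filters.append({"term": {"city": city}})
    price = {}
    if min_price is not None:
        price["gte"] = min_price
    if max_price is not None:
        price["lte"] = max_price
    if price:
        filters.append({"range": {"price": price}})
    if filters:
        bool_query["filter"] = filters
    return {"query": {"bool": bool_query}}


print(json.dumps(build_search("Home Inn", city="Shanghai", max_price=500)))
```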
Search result processing
Search results can be processed or displayed in the way specified by the user
Sort
Elasticsearch defaults to sorting based on relevance score (_score), but it also supports custom ways to sort search results. Field types that can be sorted include: keyword type, numerical type, geographical coordinate type, date type, etc.
Ordinary field sorting
The syntax for sorting keyword, numeric, and date types is basically the same.
GET /hotel/_search { "query": { "match_all": {} }, "sort": [ { "score": "desc" }, { "price":"asc" } ] }
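The effect of listing two sort keys can be reproduced in Python on a few made-up documents: the second key only decides the order among documents that tie on the first. The hotel data below is hypothetical.

```python
# Hypothetical documents, used only to show the ordering.
hotels = [
    {"name": "A", "score": 45, "price": 300},
    {"name": "B", "score": 48, "price": 500},
    {"name": "C", "score": 48, "price": 200},
    {"name": "D", "score": 45, "price": 100},
]

# score descending first, then price ascending among equal scores.
ranked = sorted(hotels, key=lambda h: (-h["score"], h["price"]))
print([h["name"] for h in ranked])  # ['C', 'B', 'D', 'A']
```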
Geographical coordinate sorting
GET /hotel/_search { "query": { "match_all": {} }, "sort": [ { "_geo_distance" : { "location" : { "lat":31.030001, "lon":121.610000 }, "order" : "asc", "unit" : "km" } } ] }
Specify a coordinate as the target point
Calculate the distance from the coordinates of the specified field (must be of geo_point type) to the target point in each document
Sort by distance
Pagination
By default, Elasticsearch returns only the top 10 documents. To retrieve more, change the paging parameters: the page of results to return is controlled by the from and size parameters.
- from: the offset of the first document to return
- size: the number of documents to return
Basic paging
GET /hotel/_search { "query": { "match_all": {} }, "from": 0, "size": 10, "sort": [ {"price": "asc"} ] }
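Translating a page number into from and size is simple arithmetic, sketched here in Python. The helper page_params is hypothetical and assumes pages are numbered from 1.

```python
def page_params(page, size=10):
    """Convert a 1-based page number into from/size paging parameters."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be at least 1")
    return {"from": (page - 1) * size, "size": size}


print(page_params(1))    # {'from': 0, 'size': 10}
print(page_params(100))  # {'from': 990, 'size': 10}
```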
Deep paging problem
Suppose we want to retrieve documents 990 to 1000. The query should be written like this:
GET /hotel/_search { "query": { "match_all": {} }, "from": 990, "size": 10, "sort": [ {"price": "asc"} ] }
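When the paging depth is large, too much data has to be collected and merged: in a cluster, every shard must return its own top from + size documents so that they can be merged and re-sorted, which puts a lot of pressure on memory and CPU. Therefore, by default Elasticsearch rejects requests in which from + size exceeds 10,000.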
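The cost of deep paging can be estimated with a rough Python sketch. It assumes each shard hands back from + size candidates for merging and uses a hypothetical shard count of 5; the 10,000 limit is the default value of the index.max_result_window setting.

```python
MAX_RESULT_WINDOW = 10_000  # default of the index.max_result_window setting


def docs_to_merge(from_, size, shards):
    """Rough number of candidates collected across shards for one page."""
    return shards * (from_ + size)


def within_window(from_, size, limit=MAX_RESULT_WINDOW):
    """True if the request stays inside the result window."""
    return from_ + size <= limit


# Hypothetical index with 5 shards.
print(docs_to_merge(0, 10, shards=5))    # 50 candidates for the first page
print(docs_to_merge(990, 10, shards=5))  # 5000 candidates for page 100
print(within_window(9990, 10))           # True
print(within_window(9991, 10))           # False
```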
For deep paging, Elasticsearch provides two solutions:
- search_after: requires a sort when paging. It fetches the next page starting from the sort values of the last document on the previous page. This is the officially recommended method.
- scroll: takes a snapshot of the sorted document IDs and keeps it in memory. It is no longer officially recommended.
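The idea of continuing from the last sort value can be simulated in memory with Python. This is only a model of the principle, not a client for Elasticsearch: search_after_page and the sample documents are hypothetical, and id is added as a second sort key so that documents with equal prices have a stable order.

```python
def search_after_page(docs, size, after=None):
    """Return one page sorted by (price, id), starting after the given sort values."""
    ordered = sorted(docs, key=lambda d: (d["price"], d["id"]))
    if after is not None:
        ordered = [d for d in ordered if (d["price"], d["id"]) > tuple(after)]
    page = ordered[:size]
    # The sort values of the last hit become the cursor for the next request.
    next_after = [page[-1]["price"], page[-1]["id"]] if page else None
    return page, next_after


# Hypothetical documents with repeating prices.
docs = [{"id": i, "price": 100 + (i % 3) * 50} for i in range(1, 8)]

seen, after = [], None
while True:
    page, next_after = search_after_page(docs, size=3, after=after)
    if not page:
        break
    seen.extend(d["id"] for d in page)
    after = next_after
print(seen)  # [3, 6, 1, 4, 7, 2, 5]
```

No page needs the documents that came before it, so the work per request stays bounded by size rather than growing with the paging depth.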
Highlight
GET /hotel/_search { "query": { "match": { "all": "百家" } }, "highlight": { "fields": { "name": { "pre_tags": "<em>", "post_tags": "</em>", "require_field_match": "false" } } } }
Note:
Highlighting applies to keywords, so the search condition must contain keywords; it cannot be, for example, a range query.
By default, the highlighted field must be the same as the field specified in the search, otherwise it is not highlighted.
To highlight a field other than the searched one, add the attribute require_field_match=false.
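What pre_tags and post_tags do to the returned text can be imitated with a short Python sketch. It is a simplification: the helper highlight is hypothetical and wraps literal, case-insensitive occurrences of the keyword, whereas Elasticsearch highlights the analyzed terms that matched.

```python
import re


def highlight(text, keyword, pre_tag="<em>", post_tag="</em>"):
    """Wrap every case-insensitive occurrence of keyword in the given tags."""
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return pattern.sub(lambda m: f"{pre_tag}{m.group(0)}{post_tag}", text)


print(highlight("Home Inn Shanghai Bund", "home inn"))
# <em>Home Inn</em> Shanghai Bund
```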