Fuzzy query
Prefix search: prefix
Concept: A prefix query returns documents that contain a term beginning with the given prefix (terms "starting with xx"). It does not calculate a relevance score.
Note:
- A prefix query matches against individual indexed terms, not against the whole field value.
- Prefix query performance is poor.
- Prefix queries are not cached.
- Make the prefix as long as possible; a longer prefix matches fewer terms.
Syntax:
GET <index>/_search
{
  "query": {
    "prefix": {
      "<field>": {
        "value": "<word_prefix>"
      }
    }
  }
}
index_prefixes: mapping parameter that indexes term prefixes. Defaults: "min_chars": 2, "max_chars": 5
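The note that a prefix search matches terms rather than the whole field can be illustrated with a minimal Python sketch. It assumes a naive lowercase-and-split tokenizer standing in for a real analyzer; `tokenize` and `prefix_match` are hypothetical helper names, not Elasticsearch APIs.

```python
def tokenize(text):
    """Naive stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def prefix_match(field_value, prefix):
    """Return True if any single term of the field starts with the prefix."""
    return any(term.startswith(prefix) for term in tokenize(field_value))


doc = "my english is good"
print(prefix_match(doc, "eng"))     # True: the term "english" starts with "eng"
print(prefix_match(doc, "my eng"))  # False: no single term starts with "my eng"
```

The second call fails even though the field text begins with "my eng", because the comparison is made term by term.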
Wildcard: wildcard
Concept: A wildcard operator is a placeholder that matches one or more characters. For example, the * operator matches zero or more characters, and the ? operator matches any single character. You can combine wildcard operators with other characters to create a wildcard pattern.
Note:
- Like prefix, a wildcard query matches against terms, not against the whole field value.
Syntax:
GET <index>/_search
{
  "query": {
    "wildcard": {
      "<field>": {
        "value": "<word_with_wildcard>"
      }
    }
  }
}
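As a rough illustration of wildcard matching against terms, the sketch below uses Python's `fnmatch.fnmatchcase`, whose `*` and `?` behave like the operators described above. This is only an approximation: `fnmatch` also gives `[...]` a special meaning, and `wildcard_match` is a hypothetical helper, not an Elasticsearch API.

```python
from fnmatch import fnmatchcase


def wildcard_match(term, pattern):
    """Approximate a wildcard query on one term: * = zero or more chars, ? = one char."""
    return fnmatchcase(term, pattern)


# Analyzed text field: every word is a separate term and is tested on its own.
terms = ["my", "english", "is", "good"]
print([t for t in terms if wildcard_match(t, "eng*ish")])

# Keyword field: the whole value is a single term, so the pattern must cover all of it.
print(wildcard_match("my english", "my eng*ish"))
print(wildcard_match("my english is good", "my eng*ish"))
```

This is why a pattern that contains a space only has a chance of matching on a keyword field.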
Regular: regexp
Concept: The performance of regexp queries can vary depending on the regular expression provided. To improve performance, avoid using wildcard patterns such as .* or .*?+ without a prefix or suffix.
Syntax:
GET <index>/_search
{
  "query": {
    "regexp": {
      "<field>": {
        "value": "<regex>",
        "flags": "ALL"
      }
    }
  }
}
flags:
- ALL
  Enables all optional operators.
- COMPLEMENT
  Enables the ~ operator. You can use ~ to negate the shortest pattern that follows it. For example:
  a~bc # matches ‘adc’ and ‘aec’ but not ‘abc’
- INTERVAL
  Enables the <> operator. You can use <> to match numeric ranges. For example:
  foo<1-100> # matches ‘foo1’, ‘foo2’ … ‘foo99’, ‘foo100’
  foo<01-100> # matches ‘foo01’, ‘foo02’ … ‘foo99’, ‘foo100’
- INTERSECTION
  Enables the & operator, which acts as an AND operator. The match succeeds only if both the left and the right pattern match. For example:
  aaa.+&.+bbb # matches ‘aaabbb’
- ANYSTRING
  Enables the @ operator. You can use @ to match any entire string.
  You can combine the @ operator with the & and ~ operators to create “everything except” logic. For example:
  @&~(abc.+) # matches everything except terms beginning with ‘abc’
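Python's `re` module has no &, ~, or @ operators, but the logic of the INTERSECTION and ANYSTRING examples above can be emulated to check what they mean. The sketch assumes that a pattern has to match the entire term, hence `re.fullmatch`; `intersection` and `everything_except` are hypothetical helper names.

```python
import re


def intersection(term, left, right):
    """Emulate `left&right`: both patterns must match the entire term."""
    return bool(re.fullmatch(left, term)) and bool(re.fullmatch(right, term))


def everything_except(term, excluded):
    """Emulate `@&~(excluded)`: accept any term that `excluded` does not match."""
    return re.fullmatch(excluded, term) is None


print(intersection("aaabbb", r"aaa.+", r".+bbb"))  # both sides match
print(intersection("aaaccc", r"aaa.+", r".+bbb"))  # right side fails
print(everything_except("abcdef", r"abc.+"))       # excluded: begins with 'abc'
print(everything_except("xyz", r"abc.+"))          # accepted
```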
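Fuzzy query: fuzzy
Concept: A fuzzy query returns documents that contain terms similar to the search term. It tolerates typos such as:
- Changed character (box → fox)
- Missing character (black → lack)
- Extra character (sic → sick)
- Transposed characters (act → cat)
Syntax:
GET <index>/_search
{
  "query": {
    "fuzzy": {
      "<field>": {
        "value": "<keyword>"
      }
    }
  }
}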
Parameters:
- value: (required, keyword) The term to search for.
- fuzziness: The maximum edit distance allowed (0, 1, or 2). Bigger is not better: a larger value raises recall but makes the results less accurate.
  - The Damerau-Levenshtein distance between two pieces of text is the number of insertions, deletions, substitutions, and transpositions required to make one string match the other.
  - Distance formula: Lucene's classic measure is Levenshtein; Elasticsearch uses the improved version, Damerau-Levenshtein, which counts a swap of two adjacent characters as a single edit. For example:
    axe => aex: Levenshtein = 2, Damerau-Levenshtein = 1
- transpositions: (optional, boolean) Indicates whether edits include transpositions of two adjacent characters (ab → ba). Default is true.
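To see how the two distance measures differ, here is a small Python sketch of both. It is a textbook dynamic-programming version (the Damerau variant shown is the "optimal string alignment" form), not the code Lucene or Elasticsearch actually runs.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]


def damerau_levenshtein(a, b):
    """Same edits, plus a swap of two adjacent characters counted as one edit."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Adjacent transposition: "ab" -> "ba" costs a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


print(levenshtein("act", "cat"), damerau_levenshtein("act", "cat"))  # 2 1
print(levenshtein("box", "fox"), damerau_levenshtein("box", "fox"))  # 1 1
```

With transpositions counted as one edit, "act" is within a fuzziness of 1 of "cat"; without them it needs a fuzziness of 2.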
Phrase prefix: match_phrase_prefix
match_phrase:
- match_phrase analyzes the query text and splits it into terms.
- The searched field must contain every term of the phrase, in the same order.
- By default, no other terms may appear between the phrase terms in the searched field.
Concept:
match_phrase_prefix works like match_phrase with one extra feature: it allows prefix matching on the last term of the query text. If the query is a single word, such as "a", it matches every document whose field contains a term starting with "a". If the query is a phrase, such as "this is ma", it first searches the inverted index for terms prefixed with "ma" and then runs a match_phrase query on the matched documents. (The claim found online that match_phrase runs first and the prefix search second is wrong.)
Parameters:
- analyzer: the analyzer used to split the query phrase into terms.
- max_expansions: the maximum number of terms that the prefix of the last term may expand to.
- boost: the weight of the query.
- slop: the allowed gap between phrase terms. The slop parameter tells the query how far apart the terms may be while the document still counts as a match. "How far apart" means how many times a term has to be moved for the query to line up with the document.
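The two-step behavior described in the Concept paragraph above (expand the last term by prefix, then check the phrase) can be sketched in Python. This is a simplified model under stated assumptions: whitespace tokenization instead of a real analyzer, slop fixed at 0, and expansions taken in alphabetical order. `match_phrase_prefix` here is a hypothetical function, not the Elasticsearch implementation.

```python
def tokenize(text):
    """Naive stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def match_phrase_prefix(docs, query, max_expansions=50):
    """Return ids of docs containing the phrase, with the last term used as a prefix."""
    *head, last = tokenize(query)
    # Step 1: expand the last term into indexed terms that start with it.
    vocabulary = sorted({t for text in docs.values() for t in tokenize(text)})
    expansions = [t for t in vocabulary if t.startswith(last)][:max_expansions]
    # Step 2: run an exact phrase check for every expanded phrase.
    hits = []
    for doc_id, text in docs.items():
        terms = tokenize(text)
        for candidate in expansions:
            phrase = head + [candidate]
            n = len(phrase)
            if any(terms[i:i + n] == phrase for i in range(len(terms) - n + 1)):
                hits.append(doc_id)
                break
    return hits


docs = {
    1: "shouji zhong de zhandouji",
    2: "shouji zhong de hongzhaji",
    3: "erji zhong de huangmenji",
}
print(match_phrase_prefix(docs, "zhong de h"))                    # [2, 3]
print(match_phrase_prefix(docs, "zhong de h", max_expansions=1))  # [2]
```

Lowering max_expansions to 1 drops document 3, because only the first expansion of "h" ("hongzhaji") is tried; this is how a small max_expansions can hide otherwise valid matches.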
Principle analysis: How to Use Fuzzy Searches in Elasticsearch | Elastic Blog
N-gram and edge ngram
tokenizer:
GET _analyze
{
  "tokenizer": "ngram",
  "text": "reba always loves me"
}
token filter:
min_gram: the minimum length, in characters, of a generated gram
max_gram: the maximum length, in characters, of a generated gram
ngram: starting from every character of a token, emits grams of each length between min_gram and max_gram; suitable for prefix and infix retrieval
edge_ngram: starting only from the first character of a token, emits grams of each length between min_gram and max_gram; suitable for prefix matching scenarios
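The difference between the two can be checked with a short Python sketch that generates the grams for a single token. `ngrams` and `edge_ngrams` are hypothetical helpers that mimic the idea, not the Lucene token filters themselves.

```python
def ngrams(token, min_gram, max_gram):
    """All substrings of length min_gram..max_gram, starting at every position."""
    return [
        token[i:i + n]
        for i in range(len(token))
        for n in range(min_gram, max_gram + 1)
        if i + n <= len(token)
    ]


def edge_ngrams(token, min_gram, max_gram):
    """Only the substrings anchored at the first character of the token."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


print(ngrams("reba", 1, 2))       # ['r', 're', 'e', 'eb', 'b', 'ba', 'a']
print(edge_ngrams("reba", 1, 2))  # ['r', 're']
print(edge_ngrams("me", 2, 3))    # ['me']
```

Because edge_ngram keeps only grams anchored at the start, it produces far fewer tokens than ngram, at the cost of not supporting infix matches.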
# prefix: prefix search
DELETE my_index
# elasticsearch stack
# elasticsearch search
# el
# ela
# elas
# elasticsearch
PUT my_index
{
  "mappings": {
    "properties": {
      "text": {
        "analyzer": "ik_max_word",
        "type": "text",
        "index_prefixes": {
          "min_chars": 2,
          "max_chars": 4
        },
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}
GET my_index/_mapping
POST /my_index/_bulk?filter_path=items.*.error
{"index":{"_id":"1"}}
{"text":"The urban management called the vendors to set up stalls"}
{"index":{"_id":"2"}}
{"text":"Xiaoguo Culture responds to traders and old farmers setting up stalls"}
{"index":{"_id":"3"}}
{"text":"It took the old farmer 17 years to grow the chair tree"}
{"index":{"_id":"4"}}
{"text":"The couple had been married for more than 30 years under the AA system and were arrested by the urban management"}
{"index":{"_id":"5"}}
{"text":"A black man bravely tried to stop a robbery but was handcuffed"}
GET my_index/_search
GET my_index/_mapping
GET _analyze
{
  "text": ["The couple has been married for more than 30 years under the AA system and was arrested by the urban management"]
}
GET my_index/_search
{
  "query": {
    "prefix": {
      "text": {
        "value": "urban management"
      }
    }
  }
}
#############################################################
# wildcard
DELETE my_index
POST /my_index/_bulk
{ "index": { "_id": "1"} }
{ "text": "my english" }
{ "index": { "_id": "2"} }
{ "text": "my english is good" }
{ "index": { "_id": "3"} }
{ "text": "my chinese is good" }
{ "index": { "_id": "4"} }
{ "text": "my japanese is nice" }
{ "index": { "_id": "5"} }
{ "text": "my disk is full" }
DELETE product_en
POST /product_en/_bulk
{ "index": { "_id": "1"} }
{ "title": "my english", "desc": "shouji zhong de zhandouji", "price": 3999, "tags": [ "xingjiabi", "fashao", "buka", "1" ] }
{ "index": { "_id": "2"} }
{ "title": "xiaomi nfc phone", "desc": "zhichi quangongneng nfc,shouji zhong de jianjiji", "price": 4999, "tags": [ "xingjiabi", "fashao", "gongjiaoka", " asd2fgas" ] }
{ "index": { "_id": "3"} }
{ "title": "nfc phone", "desc": "shouji zhong de hongzhaji", "price": 2999, "tags": [ "xingjiabi", "fashao", "menjinka", "as345" ] }
{ "index": { "_id": "4"} }
{ "title": "xiaomi erji", "desc": "erji zhong de huangmenji", "price": 999, "tags": [ "low", "bufangshui", "yinzhicha", "4dsg" ] }
{ "index": { "_id": "5"} }
{ "title": "hongmi erji", "desc": "erji zhong de kendeji", "price": 399, "tags": [ "lowbee", "xuhangduan", "zhiliangx", "sdg5" ] }
GET my_index/_search
GET product_en/_search
GET my_index/_search
{
  "query": {
    "wildcard": {
      "text.keyword": {
        "value": "my eng*ish"
      }
    }
  }
}
GET product_en/_mapping
# exact value
GET product_en/_search
{
  "query": {
    "wildcard": {
      "tags.keyword": {
        "value": "men*inka"
      }
    }
  }
}
#############################################################
# regexp
GET product_en/_search
# .* matches any run of characters before and after "nfc"
GET product_en/_search
{
  "query": {
    "regexp": {
      "title": ".*nfc.*"
    }
  }
}
GET product_en/_search
{
  "query": {
    "regexp": {
      "desc": {
        "value": "zh~dng",
        "flags": "COMPLEMENT"
      }
    }
  }
}
GET product_en/_search
{
  "query": {
    "regexp": {
      "tags.keyword": {
        "value": ".*<2-3>.*",
        "flags": "INTERVAL"
      }
    }
  }
}
#############################################################
# fuzzy: fuzzy query
GET product_en/_search
GET product_en/_search
{
  "query": {
    "fuzzy": {
      "desc": {
        "value": "quanggonneng nfc",
        "fuzziness": "2"
      }
    }
  }
}
GET product_en/_search
{
  "query": {
    "match": {
      "desc": {
        "query": "nfe quasdasdasdasd",
        "fuzziness": 1
      }
    }
  }
}
#############################################################
# match_phrase_prefix
GET product_en/_search
{
  "query": {
    "match_phrase": {
      "desc": "shouji zhong de"
    }
  }
}
GET product_en/_search
{
  "query": {
    "match_phrase_prefix": {
      "desc": {
        "query": "de zhong shouji hongzhaji",
        "max_expansions": 50,
        "slop": 3
      }
    }
  }
}
GET product_en/_search
{
  "query": {
    "match_phrase_prefix": {
      "desc": {
        "query": "zhong hongzhaji",
        "max_expansions": 50,
        "slop": 3
      }
    }
  }
}
# source: zhong de hongzhaji
# query:  zhong > hongzhaji
# source: shouji zhong de hongzhaji
# query:  de zhong shouji hongzhaji
# de shouji/zhong hongzhaji   1 time
# shouji/de zhong hongzhaji   2 times
# shouji zhong/de hongzhaji   3 times
# shouji zhong de hongzhaji   4 times
#############################################################
# ngram and edge-ngram
# ngram min_gram = 1, max_gram = 2
GET _analyze
{
  "tokenizer": "ik_max_word",
  "filter": [ "edge_ngram" ],
  "text": "reba always loves me"
}
# min_gram = 1, max_gram = 1
# r a l m
# min_gram = 1, max_gram = 2
# r a l m
# re al lo me
# min_gram = 2, max_gram = 3
# re al lo me
# reb alw lov me
PUT my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "2_3_edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 3
        }
      },
      "analyzer": {
        "my_edge_ngram": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "2_3_edge_ngram" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "my_edge_ngram",
        "search_analyzer": "standard"
      }
    }
  }
}
GET /my_index/_mapping
POST /my_index/_bulk
{ "index": { "_id": "1"} }
{ "text": "my english" }
{ "index": { "_id": "2"} }
{ "text": "my english is good" }
{ "index": { "_id": "3"} }
{ "text": "my chinese is good" }
{ "index": { "_id": "4"} }
{ "text": "my japanese is nice" }
{ "index": { "_id": "5"} }
{ "text": "my disk is full" }
GET /my_index/_search
GET /my_index/_mapping
GET /my_index/_search
{
  "query": {
    "match_phrase": {
      "text": "my eng is goo"
    }
  }
}
PUT my_index2
{
  "settings": {
    "analysis": {
      "filter": {
        "2_3_grams": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 3
        }
      },
      "analyzer": {
        "my_edge_ngram": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "2_3_grams" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "my_edge_ngram",
        "search_analyzer": "standard"
      }
    }
  }
}
GET /my_index2/_mapping
POST /my_index2/_bulk
{ "index": { "_id": "1"} }
{ "text": "my english" }
{ "index": { "_id": "2"} }
{ "text": "my english is good" }
{ "index": { "_id": "3"} }
{ "text": "my chinese is good" }
{ "index": { "_id": "4"} }
{ "text": "my japanese is nice" }
{ "index": { "_id": "5"} }
{ "text": "my disk is full" }
GET /my_index2/_search
{
  "query": {
    "match_phrase": {
      "text": "my eng is goo"
    }
  }
}
GET _analyze
{
  "tokenizer": "ik_max_word",
  "filter": [ "ngram" ],
  "text": "Make skin with your heart, play games with your feet"
}