We're using the edge_ngram tokenizer (Elasticsearch 7.3 on Windows) on letters and digits, with grams of 3 to 10 characters. However, when we index a document containing the text "test 11kw"
and search for text: test AND text: 11k,
we find nothing.
Interestingly, even the full word finds nothing: text: test AND text: 11kw.
With the document {"text":"test kw11"}
and the search text: test AND text: kw11
we also get no results.
However, with {"text":"test kwak"}
and the search text: test AND text: kwa
we DO get results!
Our index settings:
PUT /myindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": ["lowercase"]
        },
        "autocomplete_search": {
          "tokenizer": "lowercase"
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "edge_ngram",
          "min_gram": 3,
          "max_gram": 10,
          "token_chars": ["letter", "digit"]
        }
      }
    }
  },
  "mappings": {
    "dynamic": false,
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "autocomplete",
        "search_analyzer": "autocomplete_search"
      }
    }
  }
}
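As an aside, the same edge_ngram configuration can be sanity-checked without creating the index, by passing the tokenizer definition inline to _analyze (just a quick check, not part of our setup above):
GET /_analyze
{
  "tokenizer": {
    "type": "edge_ngram",
    "min_gram": 3,
    "max_gram": 10,
    "token_chars": ["letter", "digit"]
  },
  "text": "test 11kw"
}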
Upload a document:
PUT /myindex/_bulk
{"index":{"_id":"test11"}}
{"text":"test 11kw"}`
Search:
GET /myindex/_search?q=text: test AND text: 11kw
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  }
}
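For completeness, the same query can be written as a request body (the q= parameter is just shorthand for a query_string query); we have only shown the URI form above:
GET /myindex/_search
{
  "query": {
    "query_string": {
      "query": "text: test AND text: 11kw"
    }
  }
}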
When we analyze the string "test 11KW" with the index analyzer, we see it is tokenized as expected:
GET /myindex/_analyze
{
  "analyzer": "autocomplete",
  "text": "test 11KW"
}
{
  "tokens": [
    {
      "token": "tes",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 0
    },
    {
      "token": "test",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 1
    },
    {
      "token": "11k",
      "start_offset": 5,
      "end_offset": 8,
      "type": "word",
      "position": 2
    },
    {
      "token": "11kw",
      "start_offset": 5,
      "end_offset": 9,
      "type": "word",
      "position": 3
    }
  ]
}
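The query-side tokenization can be inspected the same way with the search analyzer, for comparison (output not reproduced here):
GET /myindex/_analyze
{
  "analyzer": "autocomplete_search",
  "text": "test 11KW"
}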