Please help me understand why this happens. We use the Elasticsearch 7.10 term suggester. The field didYouMean.trigram, which we use for the suggestions, has the following mapping:
"didYouMean" : {
  "type" : "text",
  "fields" : {
    "trigram" : {
      "type" : "text",
      "analyzer" : "trigram"
    }
  }
}
The trigram analyzer definition:
"trigram" : {
  "filter" : [ "lowercase", "asciifolding", "shingle" ],
  "type" : "custom",
  "tokenizer" : "standard"
},
The shingle token filter definition:
"shingle" : {
  "max_shingle_size" : "3",
  "min_shingle_size" : "2",
  "type" : "shingle"
},
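To make it easier to reason about what ends up indexed under didYouMean.trigram, here is a rough Python sketch of the token stream this analyzer chain would produce. It is only an approximation under stated assumptions: the standard tokenizer is imitated with a simple word regex, asciifolding with Unicode accent stripping, and the shingle filter is assumed to keep its default output_unigrams setting (true), so single words are emitted alongside the 2- and 3-word shingles. The function names and the sample text are hypothetical.

```python
import re
import unicodedata
from typing import List


def fold_ascii(text: str) -> str:
    """Rough stand-in for the asciifolding filter: strip combining accents."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def trigram_tokens(text: str, min_size: int = 2, max_size: int = 3) -> List[str]:
    """Approximate the tokens emitted by the `trigram` analyzer (sketch only)."""
    # Assumption: the standard tokenizer is imitated by splitting on non-word characters.
    words = [fold_ascii(w.lower()) for w in re.findall(r"\w+", text)]
    tokens = []
    for i, word in enumerate(words):
        # Assumption: output_unigrams is left at its default (true),
        # so every single word is emitted as a token too.
        tokens.append(word)
        for size in range(min_size, max_size + 1):
            if i + size <= len(words):
                tokens.append(" ".join(words[i:i + size]))
    return tokens


# Hypothetical sample text, not taken from the real index.
print(trigram_tokens("Gold Hoop Earrings"))
# ['gold', 'gold hoop', 'gold hoop earrings', 'hoop', 'hoop earrings', 'earrings']
```

If this approximation holds, the field contains individual words as well as shingles. The _analyze API can confirm the real token stream for a given text.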
The actual term suggest query looks like this:
"suggest": {
  "didYouMeanSuggestTerm": {
    "term": {
      "min_word_length": 3,
      "suggest_mode": "popular",
      "field": "didYouMean.trigram"
    },
    "text": "<my query>"
  }
}
The confusing part is that didYouMeanSuggestTerm is giving me good corrections of words such as earing -> earring or jewelery -> jewellery, even though neither earring nor jewellery is part of the data indexed under didYouMean.trigram.
It looks to me (and to ChatGPT) as if the term suggester is using fuzziness plus something more to suggest words that are not in the data source available to the suggester. However, I did not find any information about this behavior in the documentation.
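To make "fuzziness" concrete, the sketch below computes the plain Levenshtein edit distance for the two corrections above. This is an assumption for illustration only: the term suggester has a max_edits option (1 or 2) and its own string_distance implementations, and its internal measure may differ from this simple version.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


print(edit_distance("earing", "earring"))      # 1
print(edit_distance("jewelery", "jewellery"))  # 1
```

Both corrections are a single edit away from the input, so they fall within what a fuzzy match would allow. That explains how a candidate could be ranked, but not where the candidate terms come from if they are really absent from the field.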
Can someone help me understand how the term suggester is able to come up with these corrections?