Term suggester returning correct suggestions for misspelled words outside of suggester data source

Please help me understand why this happens. We use the term suggester in Elasticsearch 7.10. The field we use for the suggestions, didYouMean.trigram, has the following mapping:

"didYouMean" : {
  "type" : "text",
  "fields" : {
    "trigram" : {
      "type" : "text",
      "analyzer" : "trigram"
    }    
  }
}

trigram analyzer definition

"trigram" : {
  "filter" : [ "lowercase", "asciifolding", "shingle" ],
  "type" : "custom",
  "tokenizer" : "standard"
},

shingle token filter definition

"shingle" : {
  "max_shingle_size" : "3",
  "min_shingle_size" : "2",
  "type" : "shingle"
},
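To illustrate what kind of terms this analysis chain puts into didYouMean.trigram, here is a rough Python sketch. It is an approximation and not Elasticsearch's own code: the regex only stands in for the standard tokenizer, Unicode decomposition only stands in for asciifolding, and the name trigram_terms is made up for this example. It assumes the shingle filter's default of output_unigrams = true, since the definition above does not override it.

```python
import re
import unicodedata


def trigram_terms(text, min_size=2, max_size=3):
    """Approximate the terms the 'trigram' analyzer would emit (hypothetical helper)."""
    # Rough stand-in for the standard tokenizer: runs of word characters.
    words = re.findall(r"\w+", text)
    # lowercase filter
    words = [w.lower() for w in words]
    # Rough stand-in for asciifolding: decompose, then drop combining marks.
    folded = []
    for w in words:
        decomposed = unicodedata.normalize("NFKD", w)
        folded.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    terms = []
    for i, word in enumerate(folded):
        # Assumption: output_unigrams is left at its default (true),
        # so every single word is emitted as a term of its own.
        terms.append(word)
        # Shingles of min_size..max_size words starting at this position.
        for size in range(min_size, max_size + 1):
            if i + size <= len(folded):
                terms.append(" ".join(folded[i:i + size]))
    return terms
```

Under these assumptions, a made-up input such as "Gold Earring Set" would produce the single words gold, earring and set next to the two- and three-word shingles, so the field would hold individual words from the source text as well as shingles.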

The actual term suggest query looks like this:

"suggest": {
  "didYouMeanSuggestTerm":{
    "term": {
      "min_word_length":3,
      "suggest_mode":"popular",
      "field":"didYouMean.trigram"
    },
    "text":"<my query>"
  }
}
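For reference, the documentation describes the term suggester as proposing terms based on edit distance (max_edits defaults to 2, prefix_length to 1), and with suggest_mode popular it only suggests terms that occur in more documents than the original term. Below is a toy Python sketch of that idea as I understand it. The function names and the doc_freq mapping are hypothetical, and it uses plain Levenshtein distance and a simplified ordering rather than Elasticsearch's internal scoring.

```python
def levenshtein(a, b):
    """Plain Levenshtein edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def suggest_popular(token, doc_freq, max_edits=2, prefix_length=1, min_word_length=3):
    """Toy model of suggest_mode=popular over a term -> document frequency map."""
    # Tokens shorter than min_word_length get no suggestions.
    if len(token) < min_word_length:
        return []
    own_freq = doc_freq.get(token, 0)
    candidates = []
    for term, freq in doc_freq.items():
        # The first prefix_length characters must match exactly.
        if term == token or term[:prefix_length] != token[:prefix_length]:
            continue
        # "popular": keep only terms that are more frequent than the input.
        if freq <= own_freq:
            continue
        distance = levenshtein(token, term)
        if distance <= max_edits:
            candidates.append((distance, -freq, term))
    # Simplified ordering: closest first, then most frequent.
    return [term for _, _, term in sorted(candidates)]
```

In this toy model a correction can only be returned if it is a key of doc_freq, that is, a term that actually exists in the suggest field. For example, suggest_popular("earing", {"earring": 12, "ring": 30}) returns ["earring"] only because earring is in the map.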

The confusing part is that didYouMeanSuggestTerm gives me good corrections, such as earing -> earring or jewelery -> jewellery, even though neither earring nor jewellery is part of the data indexed under didYouMean.trigram.

It looks to me (and to ChatGPT) as if the term suggester is using fuzziness plus something more to suggest words from outside the data source available to the suggester. However, I did not find any information about this behavior in the documentation.

Can someone help me understand how the term suggester is able to come up with these corrections?
