Term suggester returning correct suggestions for misspelled words outside of suggester data source

MilanGatyas · May 10, 2023, 12:26pm

Please help me understand why this happens. We use the Elasticsearch 7.10 term suggester. The field didYouMean.trigram, which we use for the suggestions, has the following mapping:

"didYouMean" : {
  "type" : "text",
  "fields" : {
    "trigram" : {
      "type" : "text",
      "analyzer" : "trigram"
    }    
  }
}

The trigram analyzer definition:

"trigram" : {
  "filter" : [ "lowercase", "asciifolding", "shingle" ],
  "type" : "custom",
  "tokenizer" : "standard"
},

The shingle token filter definition:

"shingle" : {
  "max_shingle_size" : "3",
  "min_shingle_size" : "2",
  "type" : "shingle"
},
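To make it easier to see which terms this analyzer chain writes to didYouMean.trigram, here is a rough Python sketch of it. It is an approximation under stated assumptions, not Elasticsearch's implementation: the standard tokenizer is replaced by a simple regex split, asciifolding by Unicode decomposition, and the function names are hypothetical. It relies on the shingle filter's output_unigrams option, which defaults to true and is not overridden in the definition above, so single words are emitted alongside the 2- and 3-word shingles.

```python
import re
import unicodedata


def fold_ascii(token: str) -> str:
    # Rough stand-in for asciifolding: decompose and drop combining marks.
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def trigram_analyze(text: str, min_size: int = 2, max_size: int = 3) -> list[str]:
    # Rough stand-in for the standard tokenizer: split on non-word characters.
    words = [fold_ascii(w.lower()) for w in re.findall(r"\w+", text)]
    terms = []
    for start in range(len(words)):
        # Unigram, emitted because output_unigrams defaults to true.
        terms.append(words[start])
        # Shingles of min_size..max_size words starting at this position.
        for size in range(min_size, max_size + 1):
            if start + size <= len(words):
                terms.append(" ".join(words[start:start + size]))
    return terms


print(trigram_analyze("Gold Earring Set"))
```

If the sketch is a fair approximation, every single word of every indexed value ends up as its own term in the field, so the set of terms the suggester can draw on may be larger than the shingles alone suggest. The _analyze API on the real index is the authoritative way to check this.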

The actual term suggest query looks like this:

"suggest": {
  "didYouMeanSuggestTerm":{
    "term": {
      "min_word_length":3,
      "suggest_mode":"popular",
      "field":"didYouMean.trigram"
    },
    "text":"<my query>"
  }
}

The confusing part is that didYouMeanSuggestTerm gives me good corrections, such as earing -> earring or jewelery -> jewellery, even though neither earring nor jewellery is part of the data indexed under didYouMean.trigram.
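For reference, both corrections are a single edit away from the misspelled input, which is within the term suggester's default max_edits of 2. The sketch below uses plain Levenshtein distance as an approximation; the suggester's default string distance is its own internal measure based on Damerau-Levenshtein, so the numbers are illustrative only, and the function name is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance, keeping one row at a time.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


for typed, suggested in [("earing", "earring"), ("jewelery", "jewellery")]:
    print(typed, "->", suggested, "distance", levenshtein(typed, suggested))
```

This only shows that the corrections are reachable by edit distance. It does not explain where the candidate terms come from if they are truly absent from the field.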

It looks to me (and to ChatGPT) as if the term suggester uses fuzziness plus something more to suggest words from outside the data source available to the suggester. However, I did not find any information about this behavior in the documentation.

Can someone help me understand how the term suggester is able to come up with these corrections?

system · June 7, 2023, 12:26pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
