Howdy,
I am new to the ES community, so feel free to point out my rookie mistakes. I am currently playing around with the term suggester and stumbled across the following behavior, which seems counterintuitive to me.
Here is the code to generate a minimal working example:
POST max-term-freq-test/_bulk?refresh=true
{ "index": { "_id": "0" } }
{ "tags": ["hundred"] }
{ "index": { "_id": "1" } }
{ "tags": ["hundred", "eighty"] }
{ "index": { "_id": "2" } }
{ "tags": ["hundred", "eighty", "sixty"] }
{ "index": { "_id": "3" } }
{ "tags": ["hundred", "eighty", "sixty", "forty"] }
{ "index": { "_id": "4" } }
{ "tags": ["hundred", "eighty", "sixty", "forty", "twenty"] }
POST /max-term-freq-test/_search
{
  "suggest": {
    "my-suggestion": {
      "text": "fourti",
      "term": {
        "field": "tags",
        "size": 8,
        "suggest_mode": "always",
        "min_word_length": 3,
        "max_term_freq": 0.5
      }
    }
  }
}
What happens:
The suggester proposes the word "forty" regardless of the value of max_term_freq.
What I expect:
If a word's document frequency exceeds the value of max_term_freq (e.g. 0.5), the suggester should not propose that word, i.e. with 0.5 it should skip words that appear in >=50% of all documents.
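For reference, here is a small sketch (plain Python, not an ES request) of how I computed the per-document term frequencies in the test index above, which is the fraction I assumed max_term_freq is compared against:

```python
from collections import Counter

# The five test documents from the bulk request above.
docs = [
    ["hundred"],
    ["hundred", "eighty"],
    ["hundred", "eighty", "sixty"],
    ["hundred", "eighty", "sixty", "forty"],
    ["hundred", "eighty", "sixty", "forty", "twenty"],
]

# Document frequency: fraction of documents containing each tag.
df = Counter(tag for tags in docs for tag in set(tags))
freqs = {tag: count / len(docs) for tag, count in df.items()}

for tag, f in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(f"{tag}: {f:.1f}")
# hundred: 1.0, eighty: 0.8, sixty: 0.6, forty: 0.4, twenty: 0.2
```

So by my reading, with max_term_freq set to 0.5, "hundred", "eighty", and "sixty" should be filtered out, while "forty" (0.4) would still be eligible.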
Thank you in advance.
PS: I am using ES 8.8.2.