Howdy,
I am new to the ES community, so feel free to point out my rookie mistakes. I am currently playing around with the term suggester and stumbled across the following behavior, which seems counterintuitive to me.
Here is the code to generate a minimal working example:
POST max-term-freq-test/_bulk?refresh=true
{ "index": { "_id": "0" } }
{ "tags": ["hundred"] }
{ "index": { "_id": "1" } }
{ "tags": ["hundred", "eighty"] }
{ "index": { "_id": "2" } }
{ "tags": ["hundred", "eighty", "sixty"] }
{ "index": { "_id": "3" } }
{ "tags": ["hundred", "eighty", "sixty", "forty"] }
{ "index": { "_id": "4" } }
{ "tags": ["hundred", "eighty", "sixty", "forty", "twenty"] }
POST /max-term-freq-test/_search
{
  "suggest": {
    "my-suggestion": {
      "text": "fourti",
      "term": {
        "field": "tags",
        "size": 8,
        "suggest_mode": "always",
        "min_word_length": 3,
        "max_term_freq": 0.5
      }
    }
  }
}
What happens:
The suggester proposes the word "forty" regardless of the value of max_term_freq.
What I expect:
If a word's document frequency exceeds the value of max_term_freq (e.g. 0.5), the suggester should not propose that word, i.e. with 0.5 it should skip words that appear in >=50% of all documents.
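For reference, here is a small sketch (plain Python, not an ES request) of how I computed the per-document term frequencies in the test index above, which is the fraction I assumed max_term_freq is compared against:

```python
from collections import Counter

# The five test documents from the bulk request above.
docs = [
    ["hundred"],
    ["hundred", "eighty"],
    ["hundred", "eighty", "sixty"],
    ["hundred", "eighty", "sixty", "forty"],
    ["hundred", "eighty", "sixty", "forty", "twenty"],
]

# Document frequency: fraction of documents containing each tag.
df = Counter(tag for tags in docs for tag in set(tags))
freqs = {tag: count / len(docs) for tag, count in df.items()}

for tag, f in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(f"{tag}: {f:.1f}")
# hundred: 1.0, eighty: 0.8, sixty: 0.6, forty: 0.4, twenty: 0.2
```

So by my reading, with max_term_freq set to 0.5, "hundred", "eighty", and "sixty" should be filtered out, while "forty" (0.4) would still be eligible.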
Thank you in advance.
PS: I am using ES 8.8.2.