I'm trying to make sense of the behaviour of this query:
GET /categories/category/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "classification_system_id": 1
        }
      },
      "must": {
        "match_phrase_prefix": {
          "category_id": {
            "query": "13",
            "max_expansions": 20
          }
        }
      }
    }
  }
}
The term filter matches about 200 out of 26000 documents in the index.
The results of the match_phrase_prefix query are highly unpredictable: it returns fewer results than it should for some prefixes but not for others. For example, a search for the prefix "13" should find 10 documents but returns only 1, whereas the prefix "19" returns all 11 results as it should. Other values are equally unpredictable. As far as I can tell, the problem goes away if the filter is removed.
Things improve only when I raise max_expansions considerably. Getting the expected results requires a max_expansions of around 200 for 2-character prefixes, and around 1000 for 1-character prefixes!
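To illustrate what I suspect might be going on, here is a toy model in Python. It assumes the prefix is expanded to the first max_expansions matching terms from the term dictionary of the whole index, and that the filter is only applied afterwards. That mechanism is my assumption, not something I have confirmed, and the function names and sample data are made up for the sketch.

```python
from bisect import bisect_left


def expand_prefix(sorted_terms, prefix, max_expansions):
    """Return at most max_expansions terms that start with prefix, in term order."""
    start = bisect_left(sorted_terms, prefix)
    expanded = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix) or len(expanded) >= max_expansions:
            break
        expanded.append(term)
    return expanded


def search(docs, prefix, system_id, max_expansions):
    """Toy search: expand the prefix over ALL documents, then apply the filter."""
    # Assumption: the term dictionary covers the whole index, not just filtered docs.
    terms = sorted({d["category_id"] for d in docs})
    expanded = set(expand_prefix(terms, prefix, max_expansions))
    return [
        d for d in docs
        if d["classification_system_id"] == system_id
        and d["category_id"] in expanded
    ]


# Hypothetical data: 30 categories in system 2 sort before the 10 in system 1.
docs = [{"category_id": f"13{i:02d}", "classification_system_id": 2} for i in range(30)]
docs += [{"category_id": f"13{i}", "classification_system_id": 1} for i in range(90, 100)]

few = search(docs, "13", system_id=1, max_expansions=20)
many = search(docs, "13", system_id=1, max_expansions=200)
print(len(few), len(many))
```

In this model the expansion budget is used up by terms from documents that the filter later discards, which would explain why some prefixes lose results while others do not.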
Is this normal? If so, what is the proper solution? Judging by the documentation, setting max_expansions to such high values is a bad idea.