We are in the process of upgrading from Elasticsearch 6.5.4 to 7.10.1. We have some searches that worked fine against 6.5.4 but are getting "failed to create query: maxClauseCount is set to 1024" errors against 7.10.1. Here is an example of the query:
"query": {
  "bool": {
    "must": [
      {
        "query_string": {
          "default_operator": "and",
          "fields": [
            "field1^4",
            "field2^3",
            "field3^2.5",
            "field4^2",
            "field5^1.5",
            "field6",
            "field7",
            "field8",
            "field9",
            "field10^1.7",
            "field11",
            "field12"
          ],
          "query": "long query with synonyms",
          "tie_breaker": 0.3
        }
      }
    ],
    "should": [
      {
        "multi_match": {
          "fields": [
            "field1^4",
            "field2^3",
            "field3^2.5",
            "field4^2",
            "field5^1.5",
            "field6",
            "field7",
            "field8",
            "field9"
          ],
          "query": "long query with synonyms",
          "slop": 2,
          "type": "phrase",
          "boost": 5.0
        }
      }
    ]
  }
}
Whether the query generates too many clauses seems to depend on the number of terms, whether those terms have synonym entries, whether it is a phrase query, and the slop. The query that I know fails does not fail with a slop of 0, but it fails with any positive slop value.

I have multiple questions. Obviously my main objective is to fix my issue, but I also want to better understand what is going on. Clearly, something changed internally in the way queries are expanded into Lucene boolean clauses, and more clauses are being generated now than before. Looking at the profile output for the query, quite a few clauses are being created for synonyms.

The multi_match with type phrase and a positive slop is definitely the problem area. Removing the query_string clause has no effect on whether I get the error. And if I change the multi_match to a query_string query, put the query in quotes to make it a phrase, and give it a positive phrase_slop, I get the same error.

The reason for having both queries is that I want all documents that contain all of the terms, but I want those that match the phrase to rank higher. Is there a better way to do that? I definitely get different rankings unless both queries are present. I'm somewhat surprised that the phrase query is apparently creating more clauses. I would have thought it would create fewer, especially with "auto_generate_synonyms_phrase_query" set to false, but I guess that's where the slop comes in.
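For reference, the query_string variant of the phrase clause described above would look roughly like the sketch below. It is not the exact request: the field names and query text are the same placeholders as in the example query, the must clause is left out because removing it does not affect the error, and placing "auto_generate_synonyms_phrase_query" on this clause is an assumption about where that option was set.

```json
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "fields": [
              "field1^4",
              "field2^3",
              "field3^2.5",
              "field4^2",
              "field5^1.5",
              "field6",
              "field7",
              "field8",
              "field9"
            ],
            "query": "\"long query with synonyms\"",
            "phrase_slop": 2,
            "auto_generate_synonyms_phrase_query": false,
            "boost": 5.0
          }
        }
      ]
    }
  }
}
```

With "phrase_slop" set to 0 this form would be expected to behave like the multi_match phrase query with "slop": 0, which is the case that does not hit the clause limit.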
I would appreciate any guidance on what might be happening and how I can fix it. If I do need to increase the max_clause_count setting, I guess there's no way to do it per index or even per query, so that I can ensure queries against other indexes can't accidentally exceed the old limit?