Hello!
We use Elasticsearch to index small documents, with heavy use of the edge n-gram tokenizer and index-time synonyms. Certain queries expand into a huge number of clauses, and we get the following error:
{
  "error": {
    "root_cause": [
      {
        "type": "too_many_clauses",
        "reason": "too_many_clauses: maxClauseCount is set to 100000"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "my_index_name",
        "node": "KE0mOUoSQIaDtXE4KpeHeQ",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to create query: {\n \"multi_match\" : {\n \"query\" : \"weird query generating lots of clauses\",\n \"fields\" : [\n \"attr_1^4.0\",\n \"attr_2^6.0\",\n \"exact_attr_1^20.0\",\n \"exact_attr_2^30.0\",\n \"exact_name^20.0\",\n \"name^3.0\"\n ],\n \"type\" : \"most_fields\",\n \"operator\" : \"OR\",\n \"slop\" : 0,\n \"prefix_length\" : 0,\n \"max_expansions\" : 1,\n \"zero_terms_query\" : \"NONE\",\n \"auto_generate_synonyms_phrase_query\" : true,\n \"fuzzy_transpositions\" : true,\n \"boost\" : 1.0\n }\n}",
          "index_uuid": "WE3OBFBISoqVGzbkiP5qxQ",
          "index": "my_index_name",
          "caused_by": {
            "type": "too_many_clauses",
            "reason": "too_many_clauses: maxClauseCount is set to 100000"
          }
        }
      }
    ],
    "caused_by": {
      "type": "too_many_clauses",
      "reason": "too_many_clauses: maxClauseCount is set to 100000"
    }
  },
  "status": 400
}
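For reference, the request that triggers this looks roughly like the sketch below. It is reconstructed from the query echoed in the error message, not copied from our application code: the helper name `build_multi_match` and the `FIELD_BOOSTS` mapping are made up for illustration, and parameters that I believe Elasticsearch printed only as defaults (slop, prefix_length, zero_terms_query and so on) are left out.

```python
import json

# Field names and boosts as they appear in the "fields" list of the error message.
FIELD_BOOSTS = {
    "attr_1": 4.0,
    "attr_2": 6.0,
    "exact_attr_1": 20.0,
    "exact_attr_2": 30.0,
    "exact_name": 20.0,
    "name": 3.0,
}


def build_multi_match(query_text: str) -> dict:
    """Build a search body equivalent to the query shown in the error (hypothetical helper)."""
    # "field^boost" is the standard per-field boost syntax of multi_match.
    fields = [f"{name}^{boost}" for name, boost in FIELD_BOOSTS.items()]
    return {
        "query": {
            "multi_match": {
                "query": query_text,
                "fields": fields,
                "type": "most_fields",
                "operator": "or",
                # Set explicitly on our side; it does not reduce the clause count.
                "max_expansions": 1,
            }
        }
    }


if __name__ == "__main__":
    body = build_multi_match("weird query generating lots of clauses")
    print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request against the index to reproduce the error.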
Our server-side maxClauseCount is already set to 100000, which is large. The query-time parameter max_expansions does not constrain the clause count at all. In any case, we would be happy to get results even if only a subset of the clauses is evaluated. Is this possible?