Get partial results in case of too_many_clauses

Hello!
We use Elasticsearch to index small documents, making heavy use of the edge n-gram tokenizer and synonyms at index time. Certain queries expand into a huge number of clauses, and we get the following error:

 {
  "error": {
    "root_cause": [
      {
        "type": "too_many_clauses",
        "reason": "too_many_clauses: maxClauseCount is set to 100000"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "my_index_name",
        "node": "KE0mOUoSQIaDtXE4KpeHeQ",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to create query: {\n  \"multi_match\" : {\n    \"query\" : \"weird query generating lots of clauses\",\n    \"fields\" : [\n      \"attr_1^4.0\",\n      \"attr_2^6.0\",\n      \"exact_attr_1^20.0\",\n      \"exact_attr_2^30.0\",\n      \"exact_name^20.0\",\n      \"name^3.0\"\n    ],\n    \"type\" : \"most_fields\",\n    \"operator\" : \"OR\",\n    \"slop\" : 0,\n    \"prefix_length\" : 0,\n    \"max_expansions\" : 1,\n    \"zero_terms_query\" : \"NONE\",\n    \"auto_generate_synonyms_phrase_query\" : true,\n    \"fuzzy_transpositions\" : true,\n    \"boost\" : 1.0\n  }\n}",
          "index_uuid": "WE3OBFBISoqVGzbkiP5qxQ",
          "index": "my_index_name",
          "caused_by": {
            "type": "too_many_clauses",
            "reason": "too_many_clauses: maxClauseCount is set to 100000"
          }
        }
      }
    ],
    "caused_by": {
      "type": "too_many_clauses",
      "reason": "too_many_clauses: maxClauseCount is set to 100000"
    }
  },
  "status": 400
}

Our server-side maxClauseCount is set to 100000, which is already large. The query-time parameter max_expansions does not constrain the clause count at all. Either way, we would like to get results even if only a subset of the clauses is evaluated. Is this possible?

No, this is not possible. The clause count is checked before the query is executed, to protect the server from being loaded with such a huge query. Running the query on only a subset of the clauses would usually defeat the purpose of full-text search (or any kind of retrieval), since you would not know, and could not really control, which clauses make it into the query.

I think the best course of action would be to determine which kinds of client-side searches cause the clause count to exceed 100000, and to rephrase those queries so they don't blow up so easily.
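As a rough illustration of rephrasing on the client side, here is a minimal Python sketch that deduplicates and caps the number of terms before the multi_match body is built. It assumes the blow-up grows with the number of query terms, which may or may not be true for your analyzers and synonym sets. The function name build_capped_multi_match and the cap of 8 terms are hypothetical and not part of Elasticsearch; the field names are the ones from the error above.

```python
# Hypothetical client-side guard: limit how many terms reach Elasticsearch.
# The cap below is an arbitrary assumption; tune it against your own data.
MAX_TERMS = 8


def build_capped_multi_match(user_query, fields, max_terms=MAX_TERMS):
    """Build a multi_match request body from at most `max_terms` distinct terms."""
    # Split on whitespace only; the real analysis still happens server-side.
    terms = user_query.split()
    # Drop repeated terms while preserving their original order.
    distinct_terms = list(dict.fromkeys(terms))
    capped_terms = distinct_terms[:max_terms]
    return {
        "query": {
            "multi_match": {
                "query": " ".join(capped_terms),
                "fields": fields,
                "type": "most_fields",
                "operator": "or",
            }
        }
    }


if __name__ == "__main__":
    body = build_capped_multi_match(
        "weird query generating lots of clauses",
        ["attr_1^4.0", "attr_2^6.0", "name^3.0"],
        max_terms=4,
    )
    print(body["query"]["multi_match"]["query"])
```

Unlike a server-side cutoff, this keeps the choice of which terms survive under your control: here the first distinct terms are kept, but you could just as well rank them by length or importance before capping.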
