Executing a search query with 30K+ clauses

I want to run an Elasticsearch search query with 5000+ terms across 5 different fields, which results in 25K+ clauses. According to the documentation, the default max_clause_count limit is 1024, and raising it to 30K doesn't work as expected.

Can anyone please suggest an efficient alternative to execute this query?

Thanks.

Sample query:

"query": {
    "bool": {
        "should": [
            {
                "match_phrase_prefix": {
                    "field_1": "term_1"
                }
            },
            .
            .
            .
            {
                "match_phrase_prefix": {
                    "field_1": "term_5000"
                }
            },
            {
                "match_phrase_prefix": {
                    "field_2": "term_1"
                }
            },
            .
            .
            .
            {
                "match_phrase_prefix": {
                    "field_2": "term_5000"
                }
            },
            .
            .
            .
        ]
    }
}
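
For reference, here is a minimal Python sketch of how a request body of this shape could be generated, and how the clause count adds up. The names field_1..field_5 and term_1..term_5000 are the placeholders from the sample above, and build_query is a hypothetical helper, not part of any Elasticsearch client.

```python
import json


def build_query(fields, terms):
    """Build a bool/should body with one match_phrase_prefix clause per (field, term) pair."""
    should = [
        {"match_phrase_prefix": {field: term}}
        for field in fields
        for term in terms
    ]
    return {"query": {"bool": {"should": should}}}


# Placeholder names taken from the sample query; real fields and terms would differ.
fields = [f"field_{i}" for i in range(1, 6)]
terms = [f"term_{i}" for i in range(1, 5001)]

body = build_query(fields, terms)
clause_count = len(body["query"]["bool"]["should"])

# 5 fields x 5000 terms = 25000 top-level should clauses
print(clause_count)
# Size of the serialized request body in characters
print(len(json.dumps(body)))
```

Each (field, term) pair becomes its own clause, so the count grows as fields × terms, which is why it ends up so far above the default of 1024.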

Limits in Elasticsearch are generally there for a very good reason, and queries with a huge number of clauses tend to be slow, especially on large data sets. I therefore doubt there is an efficient way to run your query as it is constructed.

You have not described the data, how the queried fields are mapped, or what the query is meant to do. Without this it is hard for anyone to suggest alternative ways of achieving what you are looking for.

It would also be useful to know how large your data set is, how it is sharded, and what your query latency target or limit is.
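
To illustrate the clause limit discussed above: one way to stay under it without raising it is to split the clauses into batches and send one query per batch. This is only a sketch under assumptions, not something benchmarked or recommended in this thread. batched_queries and the batch size are hypothetical, and the field and term names are the placeholders from the question. Depending on the Elasticsearch version and on how each match_phrase_prefix clause expands, the effective clause count per query may be higher than the number of top-level clauses, so a smaller batch size may be needed.

```python
def batched_queries(clauses, batch_size=1024):
    """Yield one bool/should request body per batch of at most batch_size clauses."""
    for start in range(0, len(clauses), batch_size):
        batch = clauses[start:start + batch_size]
        yield {"query": {"bool": {"should": batch}}}


# Placeholder clauses mirroring the sample query: 5 fields x 5000 terms.
clauses = [
    {"match_phrase_prefix": {f"field_{f}": f"term_{t}"}}
    for f in range(1, 6)
    for t in range(1, 5001)
]

queries = list(batched_queries(clauses))

# Number of separate requests needed at this batch size
print(len(queries))
# Number of clauses in the final, partially filled batch
print(len(queries[-1]["query"]["bool"]["should"]))
```

Scores from separate batches are not directly comparable to those of a single combined query, and every batch is an extra round trip, so this trades one oversized query for many smaller ones rather than making the search itself cheaper.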

I benchmarked an alternative approach for large numbers of terms that might be of interest.
