Massive performance degradation when terms filter has over 16 values?

Came across some odd behavior. We have a query that performs in the tens of milliseconds until we go over 16 values in our terms filter. When 17 or more are included the performance degrades by 15-20 multiples.

Here is what the query looks like with values mocked for security:

	"query": {
		"bool": {
			"filter": [
					"terms": {
						"problematicField.raw": [

The underlying field has a mapping that looks like this (there are other fields but for simplicity):

    "properties" : {
      "problematicField" : {
        "type" : "text",
        "fields" : {
          "raw" : {
            "type" : "keyword"
        "analyzer" : "lowercase"

Here is what I found so far:

  • the values of the field don't seem to matter, I can rearrange the first 16 and swap out the value of 17+ but as long as it is 16 or less the performance is tens of milliseconds
  • as soon as 17+ fields are added to the terms filter the latency averages around 1500 ms
  • 'problematicField' is a low cardinality field with a typically length of 3-15 characters
  • we are querying 7 indices at a time with the same schema/mapping, there are two particular indices that I can remove and get it to perform in the tens of milliseconds again
    • these indices are the two largest of the 7 with in the 2-4 TB range
    • these indices perform very well when the filter is 16 or less fields
    • these indices perform very well for other queries including much more complex queries
  • we are on Elasticsearch 8.6.2

Is there some data type or algorithmic threshold we cross when we get over 16 values, particularly on large indices? The value 16 being as important as it is in computers makes me wonder if this is a coincidence.

Has anyone seen anything like this?

Appreciate any help or insight you may have!

Any ideas? Anyone come across this? Thank you!

This seems odd, I don't have an exact recommendation here, just some general recommendations:

  1. What does the Search Profile API show when doing this test? Any phase that sticks out as increasing greatly?
  2. What happens when you try using multiple term queries in a filter should query rather than a single terms filter?
    • This is what products like Kibana do when doing a filter like is one of.
  3. Can you test out the issue on a newer version of Elasticsearch? It looks like 8.6.2 was released almost a year ago (mid-Feb 2023). There have been 5 minor releases since then with a number of enhancements.

@BenB196 really appreciate your response.

#1: I had used the profile option and should have mentioned that when I leave the query exactly as is but add profile: true suddenly the query is back to tens of milliseconds again consistently. This seems very odd to me.

#2: Great idea. Do you happen to have a link for the Kibana use case you mentioned, would be interested to see that. When I rearrange the query to look like the below the performance is back to normal if not a little better which is great, will look into changing the code to do this. Any idea if there is a best practices that calls this out? Make me wonder what else I have been missing.

Also any idea why this is more efficient? My assumption is based on what I saw with the profiling on where this new query has one BoostQuery per shard with a handful of children but the previous query had a BoostQuery and many permutations of BooleanQuery on the field in question, a bit surprised that isn't more optimized under the hood.

	"query": {
		"bool": {
			"filter": {
				"bool": {
					"should" : [
						{ "term" : { "problematicField.raw" :  "value-1" } },
						{ "term" : { "problematicField.raw" :  "value-2" } },
						{ "term" : { "problematicField.raw" :  "value-3" } },
						{ "term" : { "problematicField.raw" :  "value-4" } },
						{ "term" : { "problematicField.raw" :  "value-5" } },
						{ "term" : { "problematicField.raw" :  "value-6" } },
						{ "term" : { "problematicField.raw" :  "value-7" } },
						{ "term" : { "problematicField.raw" :  "value-8" } },
						{ "term" : { "problematicField.raw" :  "value-9" } },
						{ "term" : { "problematicField.raw" :  "value-10" } },
						{ "term" : { "problematicField.raw" :  "value-11" } },
						{ "term" : { "problematicField.raw" :  "value-12" } },
						{ "term" : { "problematicField.raw" :  "value-13" } },
						{ "term" : { "problematicField.raw" :  "value-14" } },
						{ "term" : { "problematicField.raw" :  "value-15" } },
						{ "term" : { "problematicField.raw" :  "value-16" } },
						{ "term" : { "problematicField.raw" :  "value-17" } },
					"minimum_should_match" : 1

I hope the above is what you meant?

#3: there is a number of factors limiting my ability to upgrade versions right now unfortunately, I skimmed through the release notes and didn't see anything obviously related but I may have been looking for the wrong things

Really appreciate your response again. Would love to know the why it seems to be the 16 term threshold on large indices that slows things down to help me find other potential thresholds, our use case is GraphQL based which makes query possibilities very dynamic.

The profile being faster suggests that a query optimisation used for non-profiled queries is misbehaving.
If you try set “track_total_hits” to true or add an aggregation you may see a similar speed-up.

The query optimisation that could be misbehaving is the one that tries to ignore some query terms halfway through running a search if it thinks they are unlikely to produce a competitive match that would make the final top N results. If you profile, set track_total_hits to true or have aggregations that means you want to consider all matches and this query optimisation is not attempted.

Conceptually that makes sense, definitely seems like some threshold is hitting an edge case, again in my small sample size it was only apparent in indices over a couple TB despite the mapping/schema being the same.

I tried adding the ?track_total_hits=true and I don't see any improvement, quite a bit slower actually, averaging roughly 2.5 seconds with "profile": false. With "profile": true and ?track_total_hits=true it is around 50-100 ms.

Note this is going off off the took field in the response as round trip time is impacted by the fact that profile response is several MB.

There is a threshold here where queries with many terms stop iterating over the many streams of matching doc ids (one for each term) and switch to a different strategy. This alternative approach populates a single bitset which merges the doc ids of multiple terms, reducing the number of next() calls it does on each stream. This approach pays dividends in most cases but clearly not here.

@Mark_Harwood1 that is really interesting. Is there any reference that we can look at to find where these thresholds exist? Would be something that we could implement a warning or flag for our GraphQL use case

I’m not aware of any elasticsearch documentation - that link was to Lucene code.

For very large lists of terms there is yet another choice of strategy but is one which the engine can’t easily foresee will pay off so hasn’t been implemented.
If the engine can’t reliably foresee the right approach and the user has some idea on what works best then an “execution hint” might be a useful addition to the api - like the one used in aggregations

Yes I can see this being quite useful

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.