High cardinality field term agg results in slow query regardless of hit count

mahdouch · August 15, 2016, 8:19pm

Given an index with a field that has high cardinality (1M+ distinct values), the following query takes a long time to return even though it has 0 hits.

If I add "execution_hint":"map" to the aggregation then it returns pretty quickly.

Can someone explain this behavior? Is ES doing some expensive work before actually running the query?

I tried this with both 1.7 and 2.3 and see the same behavior. Btw, the profiling in 2.3 (which is pretty neat) doesn't explain where most of the time is being spent.

{
  "size": 0,
  "query": {
    "query_string": {
      "query": "nothing:should_match_this"
    }
  },
  "aggregations": {
    "members": {
      "terms": {
        "field": "high_card_field",
        "size": 10
      }
    }
  }
}

===

{
  "took": 2417,
  "timed_out": false,
  "_shards": {
    "total": 6,
    "successful": 6,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "members": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": []
    }
  }
}

polyfractal · August 15, 2016, 8:43pm

What kind of field is high_card_field? Is it an analyzed, string field? Those still use field data, which must be loaded into memory. If the field hasn't been used before, the field data structure is cold and populates on first usage, which can have a noticeable impact on latency.

Is it slow on every execution or just the first one?

Could you gist up the profile results somewhere?

mahdouch · August 16, 2016, 3:48am

It's a not_analyzed string field with doc values enabled. The field actually contains base64-encoded snowflake IDs (something like "CpbkCTuAAAA"). The cardinality of the field is in the 5M+ ballpark.

The query is slow most of the time. If I run it every 5s, it randomly switches back and forth between returning in 50ms and returning in more than 1s.

Here is a gist of the profiler output: https://gist.github.com/mahdibh/24b7783f37c4e6c007dd3029652845dd

colings86 · August 16, 2016, 7:50am

This came up in a GitHub issue recently; maybe that thread will be helpful to you: https://github.com/elastic/elasticsearch/issues/19780

mahdouch · September 27, 2016, 11:56pm

That was it. I added a map execution_hint and everything became blazingly fast.
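
For anyone landing here later, a minimal sketch of the request from the first post with the hint applied. The only change is the "execution_hint" line; the field name and size are the ones from the original query. A likely explanation, assuming the default global_ordinals execution mode: each shard has to build global ordinals for the field before collecting, and that cost grows with the field's cardinality rather than with the number of hits. They are rebuilt after a refresh changes the segments, which would also fit the runs that alternate between fast and slow. With "map", terms are looked up only for documents that match, so a query with 0 hits does almost no work.

```json
{
  "size": 0,
  "query": {
    "query_string": {
      "query": "nothing:should_match_this"
    }
  },
  "aggregations": {
    "members": {
      "terms": {
        "field": "high_card_field",
        "size": 10,
        "execution_hint": "map"
      }
    }
  }
}
```

The trade-off is that "map" tends to be the slower choice when the query matches a large share of the documents, so it is probably best kept for queries that are expected to be selective.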
