Given an index with a high-cardinality field (1M+ distinct values), the following query matches 0 documents but still takes a long time to return.
If I add "execution_hint": "map" to the aggregation, it returns pretty quickly.
Can someone explain this behavior? Is ES doing some expensive work before actually running the query?
I tried this with both 1.7 and 2.3 and see the same behavior. Btw, the profiling in 2.3 (which is pretty neat) doesn't explain where most of the time is being spent.
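The query itself is not preserved in this thread. As a rough sketch only, the request described above probably looked something like a terms aggregation on the high-cardinality field, with a query that matches nothing. The Python below builds such a request body with and without the hint. The field name `high_card_field` comes from the reply; the aggregation name `by_high_card_field`, the filter field `some_field`, and its value are hypothetical placeholders, not the original query. One possible explanation, which the thread does not confirm, is that without the hint the terms aggregation uses global ordinals, which have to be built for the whole shard (and rebuilt after the index changes) before any document is collected, whereas `map` aggregates directly on the values of the matching documents.

```python
import json


def build_terms_agg_request(field, execution_hint=None, size=10):
    """Build a search request body with a terms aggregation on `field`.

    `execution_hint` is passed through to the terms aggregation when given
    (for example "map"); when None, Elasticsearch picks its default.
    """
    terms = {"field": field, "size": size}
    if execution_hint is not None:
        terms["execution_hint"] = execution_hint
    return {
        # No hits are returned in the response body; only the aggregation.
        "size": 0,
        # Hypothetical filter standing in for the original query, which
        # matched 0 documents.
        "query": {"term": {"some_field": "value_that_matches_nothing"}},
        # Hypothetical aggregation name.
        "aggs": {"by_high_card_field": {"terms": terms}},
    }


if __name__ == "__main__":
    # Slow variant from the question: no hint, default execution mode.
    print(json.dumps(build_terms_agg_request("high_card_field"), indent=2))
    # Fast variant from the question: explicit "map" execution hint.
    print(json.dumps(build_terms_agg_request("high_card_field", "map"), indent=2))
```

Either body could then be sent to the index's `_search` endpoint; the only difference between the two is the `execution_hint` key inside the `terms` block.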
What kind of field is high_card_field? Is it an analyzed string field? Those still use field data, which must be loaded into memory. If the field hasn't been used before, the field data structure is cold and is populated on first use, which can have a noticeable impact on latency.
Is it slow on every execution or just the first one?
It's a not-analyzed string field with doc values enabled. The field actually contains base64-encoded snowflake ids (something like "CpbkCTuAAAA"). The cardinality of the field is in the 5M+ ballpark.
The query is slow most of the time. If I run it every 5s, it randomly switches back and forth between returning in 50ms and taking more than 1s.