We have an index with a field containing ~30,000 unique values.
When doing a filtered cardinality aggregation on this field, which should return ~650 unique values, we experience non-deterministic results (±25).
We are using a precision threshold of 10,000. With this configuration, should counts be close to accurate whenever the filtered result contains fewer than 10,000 unique values, or does the threshold apply to the total number of unique values in the field?
Could you tell us which version of Elasticsearch you're using?
Could you also provide an example of the query you are running?
As noted in the cardinality aggregation documentation, there is no "guarantee" of accuracy, and the bullet point under precision control mentions, somewhat subtly:
The precision_threshold options allows to trade memory for accuracy, and defines a unique count below which counts are expected to be close to accurate
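For reference, a filtered cardinality request of the kind discussed here might be built roughly as follows. This is only a sketch: the field name `user_id`, the `status: active` filter and the aggregation names `filtered` and `unique_values` are hypothetical placeholders, not taken from the original poster's setup. Only the `precision_threshold` of 10,000 comes from the question.

```python
import json


def build_filtered_cardinality_body(field, filter_query, precision_threshold=10000):
    """Build a search body with a filter aggregation wrapping a cardinality aggregation."""
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "aggs": {
            "filtered": {  # hypothetical aggregation name
                "filter": filter_query,
                "aggs": {
                    "unique_values": {  # hypothetical aggregation name
                        "cardinality": {
                            "field": field,
                            "precision_threshold": precision_threshold,
                        }
                    }
                },
            }
        },
    }


# Hypothetical field and filter, for illustration only.
body = build_filtered_cardinality_body("user_id", {"term": {"status": "active"}})
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request. Posting the real equivalent of this, together with the Elasticsearch version, would make it easier to tell whether the variation comes from the approximation itself or from something else in the query.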
In general, what behaviour or accuracy should we expect when aggregating on full vs filtered data? Should we expect accurate results when the data is filtered to <1000 unique values?