Is the precision of cardinality aggregation decided by total unique value count or filtered unique value count?

henrhoi · December 11, 2023, 4:40pm

Hi,

We have an index with a field containing ~30,000 unique values.
When doing a filtered cardinality aggregation on this field, which should return ~650 unique values, we experience non-deterministic results (±25).

We are using a precision threshold of 10,000. With this configuration, are counts expected to be close to accurate whenever the filtered result has fewer than 10,000 unique values, or does the threshold apply to the total number of unique values in the field?

Thank you in advance!
Best,
Henrik

BenB196 · December 12, 2023, 12:42am

Hi @henrhoi,

Two questions:

Could you provide which version of Elasticsearch you're using?
Could you provide an example query which you are running?

As noted here, there is no "guarantee" of accuracy, and the bullet point under precision control subtly mentions:

The precision_threshold options allows to trade memory for accuracy, and defines a unique count below which counts are expected to be close to accurate

Seeing your query might provide more insight into potential optimizations, though. You can also look at "Accurate Distinct Count and Values from Elasticsearch" by Pratik Patil on Medium, which provides some good examples of getting "accurate" counts at the cost of speed.
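
To illustrate what "accurate at the cost of speed" can look like: one common way to get an exact distinct count is to page through a composite aggregation and count the buckets. This is a minimal sketch and not necessarily the approach the article takes. The function name `exact_distinct_count`, the `search` callable, and the aggregation and source names (`distinct`, `value`) are all hypothetical; `search` is assumed to take a request body and return the parsed response as a dict.

```python
from typing import Any, Callable, Dict, Optional


def exact_distinct_count(
    search: Callable[[Dict[str, Any]], Dict[str, Any]],
    field: str,
    query: Dict[str, Any],
    page_size: int = 1000,
) -> int:
    """Count distinct values of `field` by paging a composite aggregation.

    `search` is a hypothetical callable: request body in, response dict out.
    """
    total = 0
    after: Optional[Dict[str, Any]] = None
    while True:
        composite: Dict[str, Any] = {
            "size": page_size,
            "sources": [{"value": {"terms": {"field": field}}}],
        }
        if after is not None:
            # Resume after the last bucket key of the previous page.
            composite["after"] = after
        body = {
            "size": 0,
            "query": query,
            "aggs": {"distinct": {"composite": composite}},
        }
        agg = search(body)["aggregations"]["distinct"]
        # Each composite bucket corresponds to exactly one distinct value.
        total += len(agg["buckets"])
        after = agg.get("after_key")
        if not agg["buckets"] or after is None:
            return total
```

With the Python client this could be wired up as something like `exact_distinct_count(lambda body: es.search(index="my-index", body=body), "some_field", {"match_all": {}})`, where `es` and `my-index` are placeholders. It needs one request per page, so it is much slower than a cardinality aggregation.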

henrhoi · December 12, 2023, 9:55am

Thank you for the quick response. See my answers below.

We are using version 7.16.2.
Providing an example query:

For context, the aggregation below backs a pivot table with no splits configured; the "Total" script in the terms aggregation is used to put this value under a single "Total" bucket.

{
    "aggs": {
        "column_1": {
            "aggs": {
                "value_524933379": {
                    "cardinality": {
                        "field": "some_field",
                        "precision_threshold": 10000
                    }
                }
            },
            "terms": {
                "missing": -1,
                "order": {
                    "_key": "asc"
                },
                "script": {
                    "lang": "painless",
                    "source": "(('Total').toString())"
                },
                "shard_size": 200,
                "size": 10
            }
        }
    },
    "query": {
        "bool": {
            "filter": [],
            "must": [
                {
                    "terms": {
                        "another_field": [
                            "SOME VALUE"
                        ]
                    }
                }
            ]
        }
    },
    "size": 0,
    "track_total_hits": false
}

Thank you.

BenB196 · December 12, 2023, 11:48am

Overall, the query looks relatively good. Two possible suggestions (though I'm not 100% confident they'll improve results):

For your query, try using filter instead of must if you don't need the scoring that must offers.
Try using the execution_hint direct (or play around with the other possible options as well) to see if you get better accuracy.
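
As a rough sketch of the first suggestion, the request body could be built like this, with the terms clause moved into filter context. The helper name `build_filtered_cardinality_body` and the aggregation name `unique_values` are made up for illustration, and the sketch assumes the clause sits inside a `bool` query. It only covers the `filter` change, not the `execution_hint` one.

```python
from typing import Any, Dict, List


def build_filtered_cardinality_body(
    field: str,
    filter_field: str,
    filter_values: List[str],
    precision_threshold: int = 10000,
) -> Dict[str, Any]:
    """Build a search body that filters without scoring, then counts distinct values."""
    return {
        "size": 0,
        "track_total_hits": False,
        "query": {
            "bool": {
                # Filter context: documents are matched but not scored.
                "filter": [{"terms": {filter_field: filter_values}}]
            }
        },
        "aggs": {
            "unique_values": {
                "cardinality": {
                    "field": field,
                    "precision_threshold": precision_threshold,
                }
            }
        },
    }


body = build_filtered_cardinality_body("some_field", "another_field", ["SOME VALUE"])
```

The cardinality aggregation itself is unchanged; only the way the matching documents are selected differs.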

henrhoi · December 13, 2023, 1:07pm

Thanks, Ben!

I'll try those suggestions.

In general, what behaviour or accuracy should we expect when aggregating on full vs filtered data? Should we expect accurate results when the data is filtered to <1000 unique values?

Best,
Henrik

system · January 10, 2024, 1:08pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
