Cardinality is more than Count. How to achieve the exact unique count?

jaliph · October 17, 2016, 8:05am

Elasticsearch version: 2.4.0
Plugins installed: []
JVM version: 1.8.0_101
RAM: 16331784kb
Elasticsearch Heap Size: 7g

OS version: Red Hat Enterprise Linux Server release 6.6 (Santiago)

Description of the problem including expected versus actual behavior:
We have a data set loaded into Elasticsearch.

10 shards, no replicas. Index compression is set to best.
Elasticsearch:Port/index*/_count
Count: 527670245

This count is exact.

We are trying to get the exact unique count of ids in this data set.
{
  "size" : 0,
  "aggs" : {
    "ids" : {
      "cardinality" : {
        "field" : "id"
      }
    }
  }
}
Cardinality count: 531551491
This exceeds the total document count. This value is absurd and can't be accepted.

On increasing the precision_threshold to 22000, the cardinality count is 526981997:
{
  "size" : 0,
  "aggs" : {
    "ids" : {
      "cardinality" : {
        "field" : "id",
        "precision_threshold" : 22000
      }
    }
  }
}

This is less than the actual number of unique ids, which is equal to the count. In other words, we have made sure all ids are unique, so the cardinality should match the count.

Is there any way to get the exact cardinality value? How can we achieve this? Is the result purely approximate? Can we achieve this by increasing memory?
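For reference, the size of each discrepancy can be worked out from the figures quoted above. A minimal sketch in Python; the helper name relative_error is an assumption made for illustration and is not part of Elasticsearch:

```python
def relative_error(estimate, exact):
    """Signed relative error of an estimate against the exact value."""
    return (estimate - exact) / exact


# Figures quoted in the post above.
exact_count = 527670245
estimates = [
    ("default precision_threshold", 531551491),
    ("precision_threshold=22000", 526981997),
]

for label, estimate in estimates:
    difference = estimate - exact_count
    print(f"{label}: off by {difference:+d} ({relative_error(estimate, exact_count):+.4%})")
```

Both results are within about one percent of the exact count, with the first an overestimate and the second an underestimate.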

warkolm · October 17, 2016, 8:09am

Have a read of https://www.elastic.co/guide/en/elasticsearch/reference/2.4/search-aggregations-metrics-cardinality-aggregation.html#_counts_are_approximate

jaliph · October 17, 2016, 8:12am

I have already gone through all the documentation. Can you suggest some ways to get the exact unique count? Our use case is a revenue-generating model, so we can't have approximate values.

warkolm · October 17, 2016, 8:13am

You need to increase the precision_threshold, that's the only way.

jaliph · October 17, 2016, 8:15am

Tried that, but the values are still not exact. The values differ between 22000 and 44000 and drop to a lower value. Please go through my observations. I'm ready to accept a longer running time, but I need to get the exact unique values.

warkolm · October 17, 2016, 8:18am

You cannot get it via cardinality.
Maybe a scan/scroll with some client side code would do it.
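A minimal sketch of what such client-side code might look like in Python, using only the standard library. It assumes the scroll API accepts a JSON body as documented for Elasticsearch 2.x, that the id field is available in _source, and that the distinct ids fit in the client's memory (which may not hold for hundreds of millions of ids). The helper names post_json, scroll_pages and count_distinct, and the host in the usage note, are hypothetical:

```python
import json
import urllib.request


def post_json(url, body):
    """POST a JSON body and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def scroll_pages(base_url, index, field="id", page_size=5000, keep_alive="1m"):
    """Yield scroll response pages until a page comes back with no hits."""
    # Sorting by _doc asks for index order, the cheapest order for a full scan.
    page = post_json(
        f"{base_url}/{index}/_search?scroll={keep_alive}",
        {"size": page_size, "sort": ["_doc"], "_source": [field]},
    )
    while page["hits"]["hits"]:
        yield page
        page = post_json(
            f"{base_url}/_search/scroll",
            {"scroll": keep_alive, "scroll_id": page["_scroll_id"]},
        )


def count_distinct(pages, field="id"):
    """Exact distinct count of `field` across an iterable of response pages."""
    seen = set()  # assumption: all distinct values fit in memory
    for page in pages:
        for hit in page["hits"]["hits"]:
            seen.add(hit["_source"][field])
    return len(seen)
```

Usage would be along the lines of count_distinct(scroll_pages("http://localhost:9200", "index*")). If the set does not fit in memory, one option is to write the ids to disk and count them with an external sort instead.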

jprante · October 17, 2016, 8:52am

You would have to write a plugin for true cardinality aggregation.
