Cardinality is more than count. How do I get the exact unique count?


(Akash Chandra) #1

Elasticsearch version: 2.4.0
Plugins installed: []
JVM version: 1.8.0_101
RAM: 16331784 kB
Elasticsearch heap size: 7g

OS version: Red Hat Enterprise Linux Server release 6.6 (Santiago)

Description of the problem including expected versus actual behavior:
We have loaded a data set into Elasticsearch.

10 shards, no replicas. Index compression is set to best (index.codec: best_compression).
A count request (Elasticsearch:Port/index*/_count) returns:
Count: 527670245

This count is exact.

We are trying to get the exact unique count of ids in this data set.
{
  "size": 0,
  "aggs": {
    "ids": {
      "cardinality": {
        "field": "id"
      }
    }
  }
}
The cardinality count is 531551491.
This exceeds the total document count. That value makes no sense for our data and can't be accepted.

After increasing precision_threshold to 22000, the cardinality count is 526981997, using this request:
{
  "size": 0,
  "aggs": {
    "ids": {
      "cardinality": {
        "field": "id",
        "precision_threshold": 22000
      }
    }
  }
}

This is now lower than the actual number of unique ids, which equals the document count. In other words, we have verified that every id is unique, so the cardinality should match the count exactly.

Is there any way to get an exact cardinality value, and if so, how? Is the cardinality aggregation purely approximate? Would giving it more memory make it exact?
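To put the two estimates in perspective, here is the arithmetic on the numbers reported above. It is only a worked calculation; `relative_error` and `TRUE_COUNT` are illustrative names, and the exact count is taken from the _count result, which equals the number of unique ids in this data set. Both estimates land within 1% of the true value, one above and one below, which is consistent with an approximate counter rather than a broken one.

```python
TRUE_COUNT = 527670245  # exact result of the _count request (all ids unique)

def relative_error(estimate: int, exact: int) -> float:
    """Signed relative error of an estimate against the exact value."""
    return (estimate - exact) / exact

estimates = (
    ("default precision_threshold", 531551491),
    ("precision_threshold 22000", 526981997),
)
for label, estimate in estimates:
    # Prints "+0.74%" for the first estimate and "-0.13%" for the second.
    print(f"{label}: {relative_error(estimate, TRUE_COUNT):+.2%}")
```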


(Mark Walkom) #2

Have a read of https://www.elastic.co/guide/en/elasticsearch/reference/2.4/search-aggregations-metrics-cardinality-aggregation.html#_counts_are_approximate
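The linked section explains that the cardinality aggregation is based on the HyperLogLog++ algorithm, which keeps a fixed-size sketch of value hashes instead of the values themselves. The sketch below is plain HyperLogLog (not the ++ variant Elasticsearch uses, which adds bias correction and a sparse mode), written only to show why such an estimate can land above or below the true count. `hll_estimate` and its parameter `p` are hypothetical names, and blake2b is an arbitrary choice of 64-bit hash.

```python
import hashlib
import math
from typing import Iterable

def hll_estimate(items: Iterable[str], p: int = 12) -> float:
    """Plain HyperLogLog estimate of the number of distinct strings."""
    m = 1 << p  # number of registers; memory stays fixed regardless of input size
    registers = [0] * m
    for item in items:
        # 64-bit hash of the value; identical values always hash identically.
        x = int.from_bytes(hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest(), "big")
        idx = x >> (64 - p)             # first p bits choose a register
        w = x & ((1 << (64 - p)) - 1)   # remaining 64 - p bits
        rank = (64 - p) - w.bit_length() + 1  # position of the leftmost 1-bit
        registers[idx] = max(registers[idx], rank)
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        # Linear counting gives better results for small cardinalities.
        return m * math.log(m / zeros)
    return raw

# The result is close to, but generally not exactly, 100000.
print(round(hll_estimate(f"id-{i}" for i in range(100_000))))
```

Because the sketch only remembers the longest run of leading zeros per register, two different data sets can produce identical registers; no amount of tuning within this structure turns it into an exact count.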


(Akash Chandra) #3

I have already gone through the documentation. Can you suggest a way to get the exact number of unique values? Our use case is a revenue-generating model, so approximate values are not acceptable.


(Mark Walkom) #4

You need to increase precision_threshold; that's the only way.


(Akash Chandra) #5

I tried that, but the values are still not exact. The result changes between thresholds of 22000 and 44000 and drops to a lower value. Please look at my observations above. I'm willing to trade time for accuracy, but I need the exact unique count.
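A quick way to see why raising precision_threshold cannot close the gap: per the cardinality aggregation documentation, the maximum supported threshold is 40000 (higher values behave like 40000), counts are only expected to be close to accurate below the threshold, and a threshold of c needs about c * 8 bytes. The helper names below are hypothetical; the constants reflect that documented behaviour.

```python
# Assumption: documented cardinality-aggregation behaviour; threshold capped
# at 40000 and roughly threshold * 8 bytes of memory per counter.
MAX_PRECISION_THRESHOLD = 40000
UNIQUE_IDS = 527670245  # unique ids in this data set (equals the _count result)

def effective_threshold(requested: int) -> int:
    """Thresholds above the documented maximum behave like the maximum."""
    return min(requested, MAX_PRECISION_THRESHOLD)

def approx_memory_bytes(requested: int) -> int:
    """Approximate memory of one counter at the requested threshold."""
    return effective_threshold(requested) * 8

for t in (22000, 40000, 44000):
    print(t, effective_threshold(t), approx_memory_bytes(t))

# How far the data set sits above even the largest usable threshold.
print(UNIQUE_IDS // MAX_PRECISION_THRESHOLD)
```

A request for 44000 therefore runs as 40000, and with over 527 million unique ids the data set is more than 13000 times past the largest threshold, so the count stays in the approximate regime regardless of heap size.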


(Mark Walkom) #6

You cannot get it via the cardinality aggregation.
A scan/scroll with some client-side code might do it.
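One way the client-side part could look, as a sketch rather than a tested recipe: stream every id out of the index with a scroll, spill ids to disk partitioned by hash, then count each partition with an in-memory set. Identical ids always hash to the same partition, so the per-partition counts add up to an exact total while only one partition's ids are held in memory at a time (a single Python set of 527 million ids would not fit in 16 GB). `exact_distinct_count` and `num_partitions` are hypothetical names; the code assumes ids are strings without newlines. The scroll call in the comments assumes an elasticsearch-py client version compatible with 2.4 and that `id` is available in `_source`.

```python
import hashlib
import os
import tempfile
from typing import Iterable

def exact_distinct_count(ids: Iterable[str], num_partitions: int = 64) -> int:
    """Exact distinct count, holding only one hash partition in memory."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"part-{i}.txt") for i in range(num_partitions)]
        files = [open(p, "w", encoding="utf-8") for p in paths]
        try:
            for id_ in ids:
                if "\n" in id_:
                    raise ValueError("ids must not contain newlines")
                # Stable hash: every copy of an id goes to the same partition.
                h = int.from_bytes(hashlib.blake2b(id_.encode("utf-8"), digest_size=8).digest(), "big")
                files[h % num_partitions].write(id_ + "\n")
        finally:
            for f in files:
                f.close()
        total = 0
        for path in paths:
            with open(path, encoding="utf-8") as f:
                total += len({line.rstrip("\n") for line in f})
        return total

# Hypothetical usage against the cluster (requires the elasticsearch package):
#   from elasticsearch import Elasticsearch, helpers
#   es = Elasticsearch(["http://localhost:9200"])
#   hits = helpers.scan(es, index="index*", size=1000,
#                       query={"_source": ["id"], "query": {"match_all": {}}})
#   print(exact_distinct_count(str(h["_source"]["id"]) for h in hits))
```

Raising `num_partitions` lowers peak memory at the cost of more open files; scrolling 527 million documents will take a while, which matches the willingness to trade time for accuracy expressed above.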


(Jörg Prante) #7

You would have to write a plugin for true cardinality aggregation.

