Elasticsearch version: 2.4.0
Plugins installed: []
JVM version: 1.8.0_101
RAM: 16331784kb
Elasticsearch Heap Size: 7g
OS version: Red Hat Enterprise Linux Server release 6.6 (Santiago)
Description of the problem including expected versus actual behavior:
We have a data set loaded into Elasticsearch.
The index has 10 shards and no replicas. Index compression is set to best.
Elasticsearch:Port/index*/_count
Count: 527670245
This count is exact.
We are trying to get the exact number of unique ids in this data set with the following query:
{
  "size" : 0,
  "aggs" : {
    "ids" : {
      "cardinality" : {
        "field" : "id"
      }
    }
  }
}
Cardinality count: 531551491
This exceeds the total document count, which is impossible for a count of unique ids, so we cannot accept this value.
After increasing precision_threshold to 22000, the cardinality count is 526981997. The query used:
{
  "size" : 0,
  "aggs" : {
    "ids" : {
      "cardinality" : {
        "field" : "id",
        "precision_threshold" : 22000
      }
    }
  }
}
This is less than the actual number of unique ids, which is equal to the document count. In other words, we have verified that every id is unique, so the cardinality should match the count exactly.
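To put the two results in perspective, here is a small Python sketch that computes how far each estimate is from the exact count. It uses only the numbers reported above; the helper name `relative_error` and the dictionary labels are just for illustration and are not part of any Elasticsearch API.

```python
# Figures reported in this issue.
total_docs = 527670245  # exact _count, which should equal the number of unique ids
estimates = {
    "default precision_threshold": 531551491,
    "precision_threshold=22000": 526981997,
}


def relative_error(estimate, truth):
    """Signed relative error of an estimate against the true value."""
    return (estimate - truth) / truth


# Signed error per query: positive means overcount, negative means undercount.
errors = {}
for label, estimate in estimates.items():
    errors[label] = relative_error(estimate, total_docs)
    diff = estimate - total_docs
    print(f"{label}: diff={diff:+d}, relative error={errors[label]:+.4%}")
```

With these numbers, the first query overcounts by 3881246 (about 0.74%) and the second undercounts by 688248 (about 0.13%).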
Is there any way to get an exact cardinality value, and if so, how? Is the cardinality aggregation purely approximate? Would increasing memory allow us to get an exact result?