Cardinality and value_count aggr values are 200-500% off

kievbs · February 21, 2017, 1:00am

I have a really weird issue with ES 2.4. I'd really appreciate it if someone could explain how to fix it.

I have an index with a unique field, 'some_id', whose value is also used as the document ID.

The total number of documents is 756,451, and all field values are unique (since they are used as the doc ID).

I am running cardinality and value_count aggregations (see the request below):
value_count gives me 3,782,132, which is 5x the total count!
cardinality gives me 1,598,725, about 2x the total count.

We store UUIDs in this some_id field, so the values look like '06e58e84-e8ad-4d5f-be00-17f2b18ff668' (all unique).

Any idea why this happens? I realize that cardinality and value_count are approximate counts, but I didn't expect a 200-500% error rate!

Thank you,

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      }
    }
  },
  "aggs": {
    "unique_dev_count": {
      "cardinality": {
        "field": "some_id"
      }
    },
    "dev_count": {
      "value_count": {
        "field": "some_id"
      }
    }
  }
}

colings86 · February 21, 2017, 9:25am

You are right that the cardinality aggregation is approximate, but the value_count aggregation is not. What is the mapping for your some_id field? My initial thought is that the field is an analyzed string field. That would mean your UUID is being split into multiple tokens on the - characters, which causes the over-counting you are seeing (each document then has 5 values, because your UUID has 5 parts). If you change your some_id field to "index": "not_analyzed", I think you'll see the results you expect.
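To make the over-counting concrete, here is a rough Python simulation, not Elasticsearch code. It assumes the IDs are random version-4 UUIDs (the sample value has a 4 in the version position) and uses a plain lowercase-and-split-on-hyphen function as a stand-in for the standard analyzer; `analyze` and `simulate` are hypothetical helpers. Under those assumptions every document gets five terms, so value_count lands near 5 × 756,451 ≈ 3.78 million, in line with the 3,782,132 reported. The 8- and 12-character fragments are almost always unique, while the three 4-character fragments can only take 65,536 distinct values between them, so distinct terms should land near 2 × 756,451 + 65,536 ≈ 1.58 million, close to the 1,598,725 cardinality estimate.

```python
import random
import uuid


def analyze(value: str) -> list[str]:
    """Rough stand-in for an analyzed string field: lowercase, then split on '-'.

    Assumption: for a lowercase hex UUID this yields the same five fragments the
    standard analyzer would; it is NOT a general reimplementation of that analyzer.
    """
    return [token for token in value.lower().split("-") if token]


def simulate(num_docs: int, seed: int = 0) -> dict:
    """Index `num_docs` random UUIDs 'as analyzed' and count terms."""
    rng = random.Random(seed)
    value_count = 0          # sum of per-document term counts
    distinct_terms = set()   # every distinct term across all documents
    for _ in range(num_docs):
        # Random version-4 UUID built from a seeded RNG so runs are repeatable.
        some_id = str(uuid.UUID(int=rng.getrandbits(128), version=4))
        terms = set(analyze(some_id))  # count each term once per document
        value_count += len(terms)
        distinct_terms.update(terms)
    return {
        "docs": num_docs,
        "value_count": value_count,
        "distinct_terms": len(distinct_terms),
    }


if __name__ == "__main__":
    stats = simulate(100_000)
    print(stats)
    print("value_count / docs:", stats["value_count"] / stats["docs"])
    print("distinct_terms / docs:", stats["distinct_terms"] / stats["docs"])
```

Passing 756_451 to `simulate` mirrors the index size from the question; it only takes a few seconds. With a not_analyzed field, each UUID stays a single term, so both ratios would drop to about 1.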

Hope that helps

kievbs · February 21, 2017, 4:42pm

Thanks for your response. You're right, the mapping was wrong! I checked the mapping first thing, but it looks like I wasn't using the correct environment config. After fixing the mapping and reindexing, everything is good!
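For anyone hitting the same thing, here is a minimal Python sketch of the check and the fix, working only on plain dicts (no client or network calls). It assumes the ES 2.x `GET /<index>/_mapping` response shape `{index: {"mappings": {type: {"properties": {...}}}}}`, the 2.x rule that a `"string"` field is analyzed unless `"index"` says otherwise, and the `_reindex` API's `source`/`dest` body. The names `devices_v1`, `devices_v2`, `device`, and `field_is_analyzed` are hypothetical placeholders.

```python
import json

# Hypothetical index/type names for illustration; substitute your own.
OLD_INDEX = "devices_v1"
NEW_INDEX = "devices_v2"
DOC_TYPE = "device"


def field_is_analyzed(mapping_response: dict, index: str, doc_type: str, field: str) -> bool:
    """Return True if `field` is an analyzed string in an ES 2.x mapping response.

    Assumes the 2.x shape: {index: {"mappings": {type: {"properties": {...}}}}}.
    """
    field_mapping = mapping_response[index]["mappings"][doc_type]["properties"][field]
    # In 2.x a "string" field is analyzed unless "index" is "not_analyzed" or "no".
    return (
        field_mapping.get("type") == "string"
        and field_mapping.get("index", "analyzed") == "analyzed"
    )


# Body for creating the new index, keeping some_id as one exact-value term.
NEW_INDEX_BODY = {
    "mappings": {
        DOC_TYPE: {
            "properties": {
                "some_id": {"type": "string", "index": "not_analyzed"}
            }
        }
    }
}

# Body for POST /_reindex (available from ES 2.3) to copy the documents over.
REINDEX_BODY = {"source": {"index": OLD_INDEX}, "dest": {"index": NEW_INDEX}}


if __name__ == "__main__":
    # A plain "string" mapping with no "index" setting is analyzed by default.
    broken = {OLD_INDEX: {"mappings": {DOC_TYPE: {"properties": {"some_id": {"type": "string"}}}}}}
    fixed = {NEW_INDEX: NEW_INDEX_BODY}
    print(field_is_analyzed(broken, OLD_INDEX, DOC_TYPE, "some_id"))  # True
    print(field_is_analyzed(fixed, NEW_INDEX, DOC_TYPE, "some_id"))   # False
    print(json.dumps(REINDEX_BODY, indent=2))
```

Since an existing field's analysis can't be changed in place, the sketch assumes a new index plus a reindex, which matches the "fixing mapping and reindexing" route described in this reply.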

colings86 · February 21, 2017, 4:57pm

Glad to hear you fixed it

system · March 21, 2017, 4:57pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.