Cardinality and value_count aggr values are 200-500% off

I have a really weird issue with ES 2.4. I'd really appreciate it if someone could explain how to fix it.

I have an index with a unique field, 'some_id', whose value is also used as the document ID.

The total count of records is 756,451, and all field values are unique (since they are used as the doc ID).

Now I am running the cardinality and value_count aggregations (see the request below):
value_count gives me 3,782,132, which is 5x the total count!
cardinality gives me 1,598,725, which is about 2x the total count.

We store UUIDs in this some_id field, so the values look like '06e58e84-e8ad-4d5f-be00-17f2b18ff668' (all unique).

Any idea why this happens? I realize that cardinality and value_count are approximate counts, but I didn't expect a 200-500% error rate!

Thank you,

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      }
    }
  },
  "aggs": {
    "unique_dev_count": {
      "cardinality": {
        "field": "some_id"
      }
    },
    "dev_count": {
      "value_count": {
        "field": "some_id"
      }
    }
  }
}

You are right that the cardinality aggregation is approximate, but the value_count aggregation is not. What is the mapping for your some_id field? My initial thought is that it is an analyzed string field. That would mean your UUID is being split into multiple tokens on the - characters, and this is causing the over-counting you are seeing: each document ends up with 5 values, because your UUID has 5 parts. If you change your some_id field to "index": "not_analyzed" (and reindex, since an existing field's mapping can't be changed in place) I think you'll see the results you expect.
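To see why the counts inflate by roughly these factors, here is a minimal Python simulation. It assumes the analyzed field splits each UUID on '-' into its five hex groups (approximately what the standard analyzer does to a lowercase UUID), and it uses randomly generated version-4 UUIDs rather than the real data. `tokenize` and `simulate` are hypothetical helpers for illustration, not Elasticsearch APIs.

```python
import random
import uuid


def tokenize(value):
    """Rough stand-in for an analyzed string field: lowercase, split on '-'."""
    return [token for token in value.lower().split("-") if token]


def simulate(n_docs, seed=42):
    """Count terms the way the two aggregations would see an analyzed UUID field."""
    rng = random.Random(seed)
    value_count = 0          # sum of per-document term counts (what value_count reports)
    distinct_terms = set()   # exact distinct terms (cardinality only estimates this)
    for _ in range(n_docs):
        # Random version-4 UUID, e.g. '06e58e84-e8ad-4d5f-be00-17f2b18ff668'
        doc_id = str(uuid.UUID(int=rng.getrandbits(128), version=4))
        # Assumption: terms are de-duplicated within a document
        terms = set(tokenize(doc_id))
        value_count += len(terms)
        distinct_terms.update(terms)
    return {
        "docs": n_docs,
        "value_count_analyzed": value_count,
        "distinct_terms_analyzed": len(distinct_terms),
        # With "index": "not_analyzed" every document holds exactly one term
        "value_count_not_analyzed": n_docs,
    }


if __name__ == "__main__":
    result = simulate(756_451)
    for key, val in result.items():
        print(f"{key}: {val:,}")
```

With 756,451 documents the simulated value_count comes out just under 5x the document count, and the number of distinct terms is roughly 2.1x. The first (8 hex chars) and last (12 hex chars) groups are almost unique per document, while the three 4-character groups can only take 65,536 values between them, which explains the ~2x cardinality. The cardinality aggregation's own approximation error comes on top of that.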

Hope that helps


Thanks for your response. You're right, it had the wrong mapping! I checked the mapping first thing, but it looks like I wasn't using the correct environment config. After fixing the mapping and reindexing, all is good!
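For reference, here is a minimal sketch of the corrected mapping as a create-index body, built in Python so it can be printed or sent with any HTTP client. It assumes ES 2.x string-field syntax; `my_index`, `my_type` and the helper `not_analyzed_id_mapping` are placeholder names, not taken from this thread. Because an existing field can't be switched from analyzed to not_analyzed in place, the data has to be reindexed into an index created with a body like this.

```python
import json


def not_analyzed_id_mapping(doc_type, field="some_id"):
    """Build an ES 2.x create-index body that stores `field` as one unsplit term."""
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    # not_analyzed: the whole UUID is indexed as a single term
                    field: {"type": "string", "index": "not_analyzed"}
                }
            }
        }
    }


# "my_index" and "my_type" are hypothetical names used only for illustration
body = not_analyzed_id_mapping("my_type")
print("PUT /my_index")
print(json.dumps(body, indent=2))
```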

Glad to hear you fixed it 🙂
