I have a really weird issue with ES 2.4 and would really appreciate it if someone could explain how to fix it.
I have an index with one unique field, 'some_id', and the value of that field is also used as the document ID.
The total count of records is 756,451, and all field values are unique (since they are used as the doc ID).
Now I am running cardinality and value_count aggregations on that field (see request below).
value_count gives me 3,782,132, which is 5x the total count!!
cardinality gives me 1,598,725, which is 2x the total count.
We store a UUID in this some_id field, so values look like '06e58e84-e8ad-4d5f-be00-17f2b18ff668' etc. (all unique).
Any idea why this happens? I realize that cardinality and value_count are approximate counts, but I didn't expect a 200-500% error rate!!!
You are right that the cardinality aggregation is approximate, but the value_count aggregation is not approximate. What is the mapping for your some_id field? My initial thought is that the field is an analyzed string field. This would mean that your UUID is being split up into multiple tokens on the - characters, and this is causing the overcounting you are seeing (because each document then has 5 values, since your UUID has 5 parts). If you change your some_id field to be "index": "not_analyzed", I think you'll see the results you expect.
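To make the arithmetic concrete, here is a rough Python sketch of the effect described above. It is not the real analyzer: `approx_tokens` is a hypothetical stand-in that only splits on `-` and lowercases, which is an assumption about how an analyzed string field would break up a UUID. The 4-character-group bound used for the distinct-token estimate is likewise a back-of-the-envelope assumption, not something Elasticsearch reports.

```python
# Hypothetical stand-in for the analyzer: split on '-' and lowercase.
# The real analysis chain does more than this; it is only an approximation.
def approx_tokens(value):
    return [part.lower() for part in value.split("-") if part]

TOTAL_DOCS = 756451  # document count quoted in the question
sample_id = "06e58e84-e8ad-4d5f-be00-17f2b18ff668"  # sample value from the question

tokens = approx_tokens(sample_id)
tokens_per_doc = len(tokens)  # one indexed value per UUID group

# value_count counts indexed values, so an analyzed field yields roughly
# tokens_per_doc values per document instead of one.
estimated_value_count = TOTAL_DOCS * tokens_per_doc

# Distinct tokens: the 8- and 12-character groups are almost always unique,
# while the three 4-character hex groups can take at most 16**4 values in total.
estimated_distinct_tokens = 2 * TOTAL_DOCS + 16 ** 4

print(tokens)
print(estimated_value_count)      # compare with the reported value_count of 3,782,132
print(estimated_distinct_tokens)  # compare with the reported cardinality of 1,598,725
```

Under these assumptions the estimates land close to the reported numbers, which fits the analyzed-field explanation: about 5 values per document for value_count, and about 2 distinct tokens per document for cardinality.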
Thanks for your response. You're right, it had the wrong mapping! I checked the mapping first thing, but it looks like I wasn't using the correct env config. After fixing the mapping and reindexing, all is good!
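For anyone landing here later, a minimal sketch of what the corrected mapping might look like for ES 2.x, built as a Python dict and printed as JSON. The type name `my_type` is hypothetical (the thread does not name the type); `some_id` and `"index": "not_analyzed"` come from the posts above. An existing field's analysis setting cannot be changed in place, so a body like this would go into a new index, followed by a reindex.

```python
import json

# Hypothetical index-creation body for ES 2.x.
# "my_type" is a placeholder type name; adjust to your own setup.
mapping_body = {
    "mappings": {
        "my_type": {
            "properties": {
                "some_id": {
                    "type": "string",
                    # Index the whole UUID as a single term rather than tokens.
                    "index": "not_analyzed",
                }
            }
        }
    }
}

# JSON to send as the request body when creating the new index.
mapping_json = json.dumps(mapping_body, indent=2)
print(mapping_json)
```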