Hello Community!
I am using Kibana v7.17.9.
I count unique values for various log fields (Visualize -> Metric -> Unique Count). However, the number of unique values differs slightly depending on whether I compute it in Kibana or export the same data as a .csv file and analyze it with another statistics tool (R, Python).
Example:
Total hits in Kibana 3567 -> unique count of user_id: 3267
Total count in .csv-file/Python 3567 -> unique count of user_id: 3275
Does anyone know why this is happening?
Thanks a lot!
Mario
Hi @Mario_Lie ! Sorry you didn't get a reply here sooner.
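Under the hood, Kibana uses Elasticsearch's cardinality aggregation to generate that unique count number. Since Elasticsearch is a distributed data store, computing an exact cardinality is expensive, so the aggregation returns an approximate count that balances precision against cluster load. That is why the Kibana number can differ slightly from an exact count computed on the exported data.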
See this technical explanation for specific details, including the actual algorithm (HyperLogLog++) that is being used.
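If you need the Kibana-side number to match an exact count more closely, the cardinality aggregation accepts a `precision_threshold` option. As far as I know it defaults to 3000 and is capped at 40000, and counts below the threshold are expected to be close to accurate. Your roughly 3275 unique users sit just above the default, which would be consistent with the small gap you see. Here is a minimal Python sketch that only builds the request body (it does not contact a cluster). The field name `user_id` comes from your post; the aggregation name `unique_count` and the helper function are made up for illustration, and the index you would send this to via `_search` is not something I can infer from the thread.

```python
import json

# Documented upper limit for precision_threshold; higher values behave like 40000.
ES_MAX_PRECISION_THRESHOLD = 40000


def build_unique_count_query(field, precision_threshold=ES_MAX_PRECISION_THRESHOLD):
    """Build a search body with a cardinality aggregation on `field`.

    Hypothetical helper: only the "cardinality" aggregation and its
    "field" / "precision_threshold" options are Elasticsearch names.
    """
    return {
        "size": 0,  # return no documents, only the aggregation result
        "aggs": {
            "unique_count": {  # arbitrary aggregation name (assumption)
                "cardinality": {
                    "field": field,
                    "precision_threshold": precision_threshold,
                }
            }
        },
    }


body = build_unique_count_query("user_id")
print(json.dumps(body, indent=2))
```

A higher threshold costs more memory per aggregation, so it is a trade-off rather than a free fix. If `user_id` is mapped as a text field, you may need to aggregate on a keyword sub-field instead; that depends on your mapping.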
Does this help?
Hi @drewdaemon! Thanks a lot for your response and the links. That was exactly the information I needed.