Different counts of unique values in Kibana and CSV-export of the raw data

Hello Community!

I am using Kibana v7.17.9.

I count unique values for various log fields (Visualize -> Metric -> Unique Count). However, the number of unique values differs slightly depending on whether I use Kibana or export the same data as a .csv file and analyze it with another tool (R, Python).

Example:

Total hits in Kibana 3567 -> unique count of user_id: 3267

Total count in .csv-file/Python 3567 -> unique count of user_id: 3275

Does anyone know why this is happening?

Thanks a lot!

Mario

Hi @Mario_Lie ! Sorry you didn't get a reply here sooner.

Under the hood, Kibana uses Elasticsearch's cardinality aggregation to generate that unique count number. Because Elasticsearch is a distributed data store, computing an exact cardinality is expensive, so the aggregation returns an approximation that trades precision against memory use and cluster load.
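If it helps, here is a rough sketch of the kind of request body involved, written in Python since you are already using it. The cardinality aggregation accepts a `precision_threshold` option (default 3000, maximum 40000); counts below the threshold are expected to be close to exact, and your roughly 3275 unique values sit just above the default. The function name `unique_count_query` and the aggregation name `unique_user_ids` are made up for illustration; `user_id` is the field from your example.

```python
import json


def unique_count_query(field, precision_threshold=3000, agg_name="unique_user_ids"):
    """Build a search body that asks only for an approximate unique count.

    The function and aggregation names are hypothetical; "cardinality" and
    "precision_threshold" are the Elasticsearch aggregation type and option.
    """
    return {
        "size": 0,  # return no hits, only the aggregation result
        "aggs": {
            agg_name: {
                "cardinality": {
                    "field": field,
                    # higher values cost more memory but keep larger
                    # counts close to exact (40000 is the maximum)
                    "precision_threshold": precision_threshold,
                }
            }
        },
    }


# Body that could be sent to an index's _search endpoint
print(json.dumps(unique_count_query("user_id", precision_threshold=40000), indent=2))
```

With the threshold raised above the true number of unique values, I would expect the result to line up with what you get from the .csv export, although the count is still not guaranteed to be exact.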

See this technical explanation for specific details, including the actual algorithm (HyperLogLog++) that is being used.
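To give a feel for why the number can be slightly off, below is a toy version of plain HyperLogLog in Python. This is a simplified sketch, not the HyperLogLog++ variant Elasticsearch actually uses, and the function name `hll_estimate`, the SHA-1 hash, and the register count are my own assumptions for illustration. The idea is that only a small fixed array of registers is kept instead of the full set of values, so the result is an estimate.

```python
import hashlib
import math


def hll_estimate(values, p=12):
    """Estimate the number of distinct values with 2**p registers (plain HyperLogLog)."""
    m = 1 << p
    registers = [0] * m
    for value in values:
        # Take 64 bits of a SHA-1 digest as the hash (illustrative choice)
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h >> (64 - p)               # first p bits select a register
        rest = h & ((1 << (64 - p)) - 1)    # remaining bits
        # Position of the leftmost 1-bit in the remaining bits (1-based)
        rank = (64 - p) - rest.bit_length() + 1
        registers[index] = max(registers[index], rank)

    alpha = 0.7213 / (1 + 1.079 / m)        # bias correction, valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        # Small-range correction: fall back to linear counting
        return m * math.log(m / zeros)
    return raw


# Synthetic ids, sized like the example in the question
ids = [f"user-{i}" for i in range(3275)]
print(round(hll_estimate(ids)), "estimated vs", len(set(ids)), "exact")
```

Duplicates never change the registers, so the estimate depends only on the set of distinct values, but it typically lands a percent or so away from the exact count, which is the same kind of small gap you are seeing.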

Does this help?

Hi @drewdaemon! Thanks a lot for your response and the links. That was exactly the information I needed.