How to find a unique count exact value

piotr.dela · October 24, 2017, 8:54am

Hello,

I'm working on Kibana visualizations at my current project at work and we have found an issue with this unique count metric.
We already know that this metric is only an approximation and it doesn't get an exact value. But we still need to have an exact one.
Could you recommend any workaround to this problem? Now I will give more details about the index and the data are stored there.

Let's say we have an index 'registrations' where we keep persons (somehow connected with football) registrations. Every single person can be registered in many organizations in a different sport.
For example:

Robert Lewandowski (RL) plays Football in Bayern Monachium (BM) as a Professional player
Robert Lewandowski (RL) also plays Futsal in Bayern Monachium (BM) as an Amateur
Christiano Ronaldo (CR7) plays Football in Real Madrid (RM) as a Professional player
Lionel Messi plays Football (LM) in FC Barcelona (FCB) as a Professional player

In 'registrations' index we keep ids of Clubs and Persons so it looks moreless like that:

personId | organizationId | discipline | level

Then we would like to know for example:

How many players are registered in total - in this case should be 3
How many players are registered in specified organizations (grouping)
BM -> 1
RM -> 1
FCB -> 1

In Kibana we have single dasboard with a few visualisations which are based on same index so that we can use filtering applied to all of them.

Do you have any idea how to workaround this?

Looking forward for help,
Piotr

tsullivan · October 24, 2017, 8:27pm

You can use a Metric visualization and just use the "count" metric for this.

There are many ways to do this, generally in most visualizations, you can:

use "Unique Count" on the personId field as the metric
use a terms aggregation on the organizationId field for the X-Axis (or split rows in a table visualization).

piotr.dela · October 25, 2017, 7:36am

Thanks for a reply.

Actually count metric will return 4 but i want to be 3. In total we have 4 registrations but as soon RL is has two registrations we have 3 unique players registered in total.

Yes, having terms aggregation on organizationId will do the job but Unique Count metric will return an approximation, but we need an exact value.

tsullivan · October 25, 2017, 5:48pm

The Unique Count metric aggregation is a Cardinality aggregation in Elasticsearch, which has an option to control the threshold: Cardinality aggregation | Elasticsearch Guide [8.11] | Elastic

You can set the threshold value in Kibana using the JSON input:

piotr.dela · October 26, 2017, 7:31am

Yes, that's exactly the one we are using now - Unique Count metric. Well we could potentially use this threshold but this still gives an approximation - not an exact value. In other words, how we can get the exact unique count value?

tsullivan · October 26, 2017, 4:41pm

In Elasticsearch there isn't a separate kind of metric aggregation for exact unique count, there's only cardinality. There's also the terms aggregation, which is a bucket aggregation and you can get the count of each term.

By controlling the threshold value of the cardinality aggregation, you are spending more memory for the cost of accuracy, so you can make that as high as you want (refer to the documentation for more details - there is a limit of 40,000), but just watch out for memory usage issues.

PeKaDe · October 29, 2017, 11:21am

Hi Tim,
I'm working on the same project as Piotr does. I read your answer and I am already prepared to say to our client that the filtering on their dashboard will not work. However, still hoping that we are having some kind of misunderstanding here, I would like to have a 100% confirmation that Elastic does not provide any solution to the issue presented.
The reason I'm pushing this is that I feel that we have a simple requirement (filtering the whole dashboard based on OrganizationID) for a super simple data model. There are 2 one-to-many relationships and in the SQL world it would take me no time to have a dashboard based on FK's and distinct count up and running. I would find it really surprising if we were the first team to contact you with such a need. This makes me hope, as mentioned in the beginning, that a solution/workaround exists, but we were perhaps just not clear enough in explaining the problem.

Many thanks for coming back to us again on this!
PKD
(Piotr Kuc-Dzierżawski)

tsullivan · October 30, 2017, 4:59pm

Solving this problem in an SQL environment will be very different than the best way to solve it in Elasticsearch, because Elasticsearch is geared for document storage where a document has all the fields of information that are useful for analysis, versus SQL where entities are arranged in their tables and reference each other with foreign keys.

Please take a look at an article in our documentation titled "Modeling Your Data": Modeling Your Data | Elasticsearch: The Definitive Guide [2.x] | Elastic - hopefully it will give you some ideas of how the data can be arranged so that a simple filter can give you the quick and precise answer that you need.

Also, since we're getting more into capabilities of Elasticsearch and how to model the data, feel free to open a thread under the Elasticsearch topic and if there is a misunderstanding on my part, I apologize if that's the case, but you may get a better answer than here in the Kibana topic.

system · November 27, 2017, 5:00pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Explicit Unique Count for Business Intelligence tasks: no chance? Kibana	4	641	March 16, 2018
Kibana alternate way to remove duplicates or get precise Unique Count Kibana	3	3301	April 25, 2019
Kibana unique count more that number of records Kibana	5	267	May 8, 2024
I'm so CONFUSE with the UNIQUE COUNT Kibana	2	484	October 10, 2019
How to improve the accuracy of Unique Count metric in Kibana Kibana	4	1724	July 27, 2020

How to find a unique count exact value

Related topics