I'm working on Kibana visualizations at my current project at work and we have found an issue with this unique count metric.
We already know that this metric is only an approximation and it doesn't get an exact value. But we still need to have an exact one.
Could you recommend any workaround to this problem? Now I will give more details about the index and the data are stored there.
Let's say we have an index 'registrations' where we keep persons (somehow connected with football) registrations. Every single person can be registered in many organizations in a different sport.
Robert Lewandowski (RL) plays Football in Bayern Monachium (BM) as a Professional player
Robert Lewandowski (RL) also plays Futsal in Bayern Monachium (BM) as an Amateur
Christiano Ronaldo (CR7) plays Football in Real Madrid (RM) as a Professional player
Lionel Messi plays Football (LM) in FC Barcelona (FCB) as a Professional player
In 'registrations' index we keep ids of Clubs and Persons so it looks moreless like that:
personId | organizationId | discipline | level
RL | BM | Football | Professional
RL | BM | Futsal | Amateur
CR7 | RM | Football | Professional
LM | FCB | Football | Professional
Then we would like to know for example:
- How many players are registered in total - in this case should be 3
- How many players are registered in specified organizations (grouping)
BM -> 1
RM -> 1
FCB -> 1
In Kibana we have single dasboard with a few visualisations which are based on same index so that we can use filtering applied to all of them.
Do you have any idea how to workaround this?
Looking forward for help,
You can use a Metric visualization and just use the "count" metric for this.
There are many ways to do this, generally in most visualizations, you can:
- use "Unique Count" on the
personId field as the metric
- use a terms aggregation on the
organizationId field for the X-Axis (or split rows in a table visualization).
Thanks for a reply.
Actually count metric will return 4 but i want to be 3. In total we have 4 registrations but as soon RL is has two registrations we have 3 unique players registered in total.
Yes, having terms aggregation on organizationId will do the job but Unique Count metric will return an approximation, but we need an exact value.
The Unique Count metric aggregation is a Cardinality aggregation in Elasticsearch, which has an option to control the threshold: Cardinality aggregation | Elasticsearch Guide [7.15] | Elastic
You can set the threshold value in Kibana using the JSON input:
Yes, that's exactly the one we are using now - Unique Count metric. Well we could potentially use this threshold but this still gives an approximation - not an exact value. In other words, how we can get the exact unique count value?
In Elasticsearch there isn't a separate kind of metric aggregation for exact unique count, there's only cardinality. There's also the terms aggregation, which is a bucket aggregation and you can get the count of each term.
By controlling the threshold value of the cardinality aggregation, you are spending more memory for the cost of accuracy, so you can make that as high as you want (refer to the documentation for more details - there is a limit of 40,000), but just watch out for memory usage issues.
I'm working on the same project as Piotr does. I read your answer and I am already prepared to say to our client that the filtering on their dashboard will not work. However, still hoping that we are having some kind of misunderstanding here, I would like to have a 100% confirmation that Elastic does not provide any solution to the issue presented.
The reason I'm pushing this is that I feel that we have a simple requirement (filtering the whole dashboard based on OrganizationID) for a super simple data model. There are 2 one-to-many relationships and in the SQL world it would take me no time to have a dashboard based on FK's and distinct count up and running. I would find it really surprising if we were the first team to contact you with such a need. This makes me hope, as mentioned in the beginning, that a solution/workaround exists, but we were perhaps just not clear enough in explaining the problem.
Many thanks for coming back to us again on this!
Solving this problem in an SQL environment will be very different than the best way to solve it in Elasticsearch, because Elasticsearch is geared for document storage where a document has all the fields of information that are useful for analysis, versus SQL where entities are arranged in their tables and reference each other with foreign keys.
Please take a look at an article in our documentation titled "Modeling Your Data": Modeling Your Data | Elasticsearch: The Definitive Guide [2.x] | Elastic - hopefully it will give you some ideas of how the data can be arranged so that a simple filter can give you the quick and precise answer that you need.
Also, since we're getting more into capabilities of Elasticsearch and how to model the data, feel free to open a thread under the Elasticsearch topic and if there is a misunderstanding on my part, I apologize if that's the case, but you may get a better answer than here in the Kibana topic.
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.