Explicit Unique Count for Business Intelligence tasks: no chance?


#1

I have come here from the SQL world and have fallen into the Unique Count trap called cardinality.
I am very disappointed that Unique Count here is not the same as COUNT(DISTINCT ...) in SQL.
https://www.elastic.co/guide/en/elasticsearch/guide/current/cardinality.html

Is there any chance to get an exact unique count in Kibana? Approximations are not acceptable for my task.

Some explanation: I have built a SQL data mart with a large number of fields, i.e. I joined many SQL entities to get all the desired data and all the labels into one mart. As expected, this multiplied some rows, because the ID of an entity can appear in several rows of the mart. I then transferred the data into a single ES index via Logstash. Next I ran a Unique Count on the ClientID field in Kibana and compared it with COUNT(DISTINCT ClientID) executed on the SQL mart. The numbers did not match, and the result was a bad mood for at least the rest of the day.
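
To make the row multiplication concrete, here is a tiny sketch in Python. The rows and the `OrderID` field are made up for illustration; only `ClientID` comes from my real mart. It shows why the row count and the distinct count differ after a join, and what number I expect Kibana to return.

```python
# Hypothetical joined rows: client 1 appears twice because it has two orders.
rows = [
    {"ClientID": 1, "OrderID": 10},
    {"ClientID": 1, "OrderID": 11},
    {"ClientID": 2, "OrderID": 12},
]

# Plain row count, what COUNT(*) would return on the mart.
row_count = len(rows)

# Exact distinct count, what COUNT(DISTINCT ClientID) would return.
distinct_clients = len({row["ClientID"] for row in rows})

print(row_count, distinct_clients)
```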


(Thomas Neirynck) #2

hi @Mikhail,

imho there's no quick shortcut for this. The limitation comes from Elasticsearch's distributed architecture and the choice to use the HyperLogLog algorithm to compute the unique count.
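
For reference, the underlying cardinality aggregation accepts a `precision_threshold` setting that trades memory for accuracy. The sketch below only builds the request body; the aggregation name `unique_clients` and the helper `cardinality_request` are illustrative names, and the value 40000 is, as far as I know, the maximum supported threshold. Counts below the threshold are expected to be close to accurate, but the result is still an approximation, not a guaranteed exact count.

```python
import json


def cardinality_request(field, precision_threshold=40000):
    """Build a _search body for a cardinality aggregation on `field`.

    Helper name and aggregation name are illustrative, not part of any API.
    """
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "aggs": {
            "unique_clients": {
                "cardinality": {
                    "field": field,
                    "precision_threshold": precision_threshold,
                }
            }
        },
    }


body = cardinality_request("ClientID")
print(json.dumps(body, indent=2))
```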

I would open an enhancement request in the Elasticsearch repo for this: https://github.com/elastic/elasticsearch/issues/new


#3

hi @thomasneirynck,

Thank you for your support on this. I have created a feature request there:

I hope it will be implemented.


#4

So, the request for this feature has been closed on GitHub. It will not be implemented.
This means that if you want to build a BI system using ELK and that system requires exact counts, then IMHO you have three options:

  1. Use the existing Unique Count metric and accept the approximation.

  2. Organize your source data so that the same EntityID is never repeated within the data mart where you want to count unique values. But in that case, what about relations between indices (joins) at the Kibana level? I don't know how to achieve this. If someone knows, please suggest.

  3. Do not use ELK for your BI system.

I will go forward with option 1, but I will not use Kibana for counting where an exact unique count is required.
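
For counting outside Kibana, one possible approach is to page through every distinct value and count the buckets. This is a minimal sketch, assuming an Elasticsearch version that provides the composite aggregation. The function names, the aggregation name `uniq`, and the source name `client` are my own, and `search` stands for any callable that sends the body to the `_search` endpoint of the index and returns the parsed JSON response. Each bucket corresponds to exactly one distinct value, so the sum of bucket counts over all pages is an exact count, at the cost of one request per page.

```python
def composite_request(field, after_key=None, page_size=1000):
    """Build a _search body that lists distinct values of `field`, one page at a time."""
    composite = {
        "size": page_size,
        "sources": [{"client": {"terms": {"field": field}}}],
    }
    if after_key is not None:
        # Resume after the last bucket key of the previous page.
        composite["after"] = after_key
    return {"size": 0, "aggs": {"uniq": {"composite": composite}}}


def exact_unique_count(search, field, page_size=1000):
    """Count distinct values of `field` by walking all composite pages.

    `search` is a caller-supplied function: request body in, response dict out.
    """
    total = 0
    after_key = None
    while True:
        response = search(composite_request(field, after_key, page_size))
        agg = response["aggregations"]["uniq"]
        buckets = agg["buckets"]
        total += len(buckets)  # one bucket per distinct value
        after_key = agg.get("after_key")
        if not buckets or after_key is None:
            break  # no more pages
    return total
```

The trade-off is speed: with millions of distinct ClientID values this takes many round trips, so it suits a scheduled report better than an interactive dashboard.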

