Cardinality Limitation Work Around

edang · July 27, 2023, 2:44pm

Hi All,

With my data set I have seen a mismatch between the counts in ELK and the counts in my DB. For my purpose, I have used the cardinality aggregation to count the unique ids of a field, and the mismatch comes from that aggregation: its precision_threshold defaults to 3,000, and any count above the threshold is only approximate, which is not acceptable for my data.
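For readers following along, here is a minimal sketch of the request body in question, written as a Python dict. The field name `user_id` and the aggregation name `unique_ids` are placeholders, not names from my actual index. The clamp reflects the documented behaviour that thresholds above 40,000 have the same effect as 40,000.

```python
import json

DEFAULT_PRECISION_THRESHOLD = 3_000   # default used by the cardinality aggregation
MAX_PRECISION_THRESHOLD = 40_000      # values above this behave like 40,000


def cardinality_query(field, precision_threshold=DEFAULT_PRECISION_THRESHOLD):
    """Build a search body that counts distinct values of `field` (approximately)."""
    # Clamp locally so the body shows the threshold that is effectively applied.
    threshold = min(precision_threshold, MAX_PRECISION_THRESHOLD)
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "aggs": {
            "unique_ids": {  # hypothetical aggregation name
                "cardinality": {
                    "field": field,
                    "precision_threshold": threshold,
                }
            }
        },
    }


# Asking for 1,000,000 still yields an effective threshold of 40,000.
body = cardinality_query("user_id", precision_threshold=1_000_000)
print(json.dumps(body, indent=2))
```

Even at the maximum threshold, counts above it remain estimates, which is why raising the setting does not help here.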

I recognize that the precision_threshold can be increased to an upper limit of 40,000, but that is far too low for my data set (1 million+). Using the Logstash fingerprint filter, I am able to create a new field set to either 1 or 0 and then use the sum aggregation on it instead of cardinality. However, my problem comes from the additional filters I need to apply while still summing with the same approach (Sum instead of Cardinality).
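To make the 1/0 idea concrete, here is a rough in-memory sketch in Python. It is not my Logstash pipeline; the `is_unique` and `status` field names are made up for illustration, and the SHA-256 hash only stands in for whatever the fingerprint filter produces. It also shows one plausible way an extra filter can break the sum: the flag is assigned once over the whole data set, so a filter can drop the event that carries the 1.

```python
import hashlib


def fingerprint(value):
    """Stable hash of a value (stand-in for a fingerprint step)."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def flag_first_occurrences(events, id_field):
    """Return copies of events with is_unique=1 on the first event per id, else 0."""
    seen = set()
    flagged = []
    for event in events:
        key = fingerprint(event[id_field])
        is_first = key not in seen
        seen.add(key)
        flagged.append({**event, "is_unique": 1 if is_first else 0})
    return flagged


events = [
    {"id": "a", "status": "open"},
    {"id": "b", "status": "closed"},
    {"id": "a", "status": "closed"},
    {"id": "c", "status": "closed"},
    {"id": "b", "status": "open"},
]
flagged = flag_first_occurrences(events, "id")

# Unfiltered, the sum of the flag is the exact number of unique ids.
total_unique = sum(e["is_unique"] for e in flagged)  # 3

# With a filter, the sum can undercount: id "a" got its 1 on an "open" event.
closed = [e for e in flagged if e["status"] == "closed"]
closed_sum = sum(e["is_unique"] for e in closed)   # 2
closed_exact = len({e["id"] for e in closed})      # 3
print(total_unique, closed_sum, closed_exact)
```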

My intent is to use Ruby to build arrays of specific fields that all fall under a single document id. I would appreciate some insight on this matter if anyone has tried a similar approach.
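A sketch of the grouping I have in mind, shown in Python rather than Ruby and run over an in-memory list instead of real events. The helper `group_fields_by_id` and the `status` field are hypothetical. The idea is one entry per id, with each tracked field collected into an array, so that a filtered unique count becomes a plain count of matching entries.

```python
from collections import defaultdict


def group_fields_by_id(events, id_field, fields):
    """Collapse events into one record per id, collecting each field into a list."""
    grouped = defaultdict(lambda: {f: [] for f in fields})
    for event in events:
        record = grouped[event[id_field]]
        for f in fields:
            # Keep each distinct value once, in first-seen order.
            if f in event and event[f] not in record[f]:
                record[f].append(event[f])
    return dict(grouped)


events = [
    {"id": "a", "status": "open"},
    {"id": "b", "status": "closed"},
    {"id": "a", "status": "closed"},
    {"id": "c", "status": "closed"},
    {"id": "b", "status": "open"},
]
docs = group_fields_by_id(events, "id", ["status"])

# One record per id, so counting records that match a filter is an exact
# unique count for that filter.
closed_ids = [doc_id for doc_id, rec in docs.items() if "closed" in rec["status"]]
print(len(docs), len(closed_ids))
```

Whether this scales depends on how the grouping is done at ingest time; the sketch only shows the target document shape.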

Thank you.

system · August 24, 2023, 2:45pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
