How do we find the number of unique dates on which a field value appears?

kenneth_wang · January 9, 2021, 5:29pm

Hello,

For each log entry, I have "user identifier" and "date" fields. I would like to find the number of unique dates on which each "user identifier" appears.

E.g., for a given user identifier "1234567", if it appears in log entries on 5 different dates, then I would like the histogram to show "1234567": 5

If I have a million user identifiers, then I would have a million counts to compute.

Is this possible? I ran into an error saying that I exceeded the bucket limit of 10000. I foresee that this problem will not be solved just by increasing the bucket limit.

Thank you.

wylie · January 9, 2021, 6:02pm

You can do this in two ways. FYI, the bucket limit is 65k in the most recent versions of the stack.

1. You can use a terms aggregation on the user ID, with a cardinality sub-aggregation on the date field. Because date values are stored as milliseconds since the epoch, the cardinality counts distinct millisecond timestamps, not distinct calendar days. You could use a scripted cardinality instead if you want to round to the day, but that is the slowest calculation.
2. You can use an Elasticsearch transform to pre-aggregate your data.
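
As a rough illustration of the first option with rounding, here is a minimal sketch of a scripted cardinality that truncates each timestamp to a UTC day number before counting. The index name `logs` and the field names `user_id` and `date` are placeholders rather than names from a real mapping, and the sketch assumes `date` is mapped as a date type with a value in every document.

```console
GET logs/_search
{
  "size": 0,
  "aggs": {
    "user": {
      "terms": {
        "field": "user_id",
        "size": 1000
      },
      "aggs": {
        "unique_days": {
          "cardinality": {
            "script": {
              "lang": "painless",
              "source": "doc['date'].value.toInstant().toEpochMilli() / 86400000L"
            }
          }
        }
      }
    }
  }
}
```

The integer division by 86400000 (milliseconds per day) maps every timestamp on the same UTC day to the same number, so the cardinality becomes a count of distinct days. Days are cut at UTC midnight here; a different time zone would need an offset in the script. Cardinality is also an approximate count, although it tends to be close to exact at low counts such as a handful of dates per user.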

kenneth_wang · January 11, 2021, 2:18am

Hi Wylie,

Thanks, the first method worked for me. I noticed that the number of user IDs returned depends on the "size" parameter of the terms aggregation, and increasing that value causes me to exceed the bucket limit. Do you recommend increasing the max bucket limit whenever I need more results?

This is my query:

GET my_app/_search
{
  "size": 0,
  "aggs": {
    "user": {
      "terms": {
        "field": "identifier.keyword",
        "size": 1000
      },
      "aggs": {
        "date": {
          "cardinality": {
            "field": "asctime"
          }
        }
      }
    }
  }
}

wylie · January 11, 2021, 4:12pm

That's how the terms aggregation works: it returns only the most frequently occurring values, up to "size". Your dataset seems to be equally weighted, so it is not a good fit. The transform option will be able to calculate uniqueness for all values.
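
For reference, a pivot transform along these lines might look like the sketch below. The transform ID `unique_dates_per_user` and the destination index `my_app_unique_dates` are made-up names, and the sketch assumes a stack version where the `_transform` endpoint is available. It groups by `identifier.keyword` and writes one document per user ID to the destination index, each holding the cardinality of `asctime`.

```console
PUT _transform/unique_dates_per_user
{
  "source": {
    "index": "my_app"
  },
  "dest": {
    "index": "my_app_unique_dates"
  },
  "pivot": {
    "group_by": {
      "identifier": {
        "terms": {
          "field": "identifier.keyword"
        }
      }
    },
    "aggregations": {
      "unique_dates": {
        "cardinality": {
          "field": "asctime"
        }
      }
    }
  }
}

POST _transform/unique_dates_per_user/_start
```

Because the transform pages through all groups in the background, the result is not capped by the "size" of a terms aggregation. The destination index can then be searched or sorted like any other index. As with the search version, a plain cardinality on a date field counts distinct timestamps, so rounding to the day would still need a script or a pre-rounded field.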

kenneth_wang · January 12, 2021, 2:14am

Thanks Wylie, I'll look into it!

system · February 9, 2021, 2:14am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
