How do we find the number of unique dates on which a field value appears?

Hello,

For each log entry, I have "user identifier" and "date" fields. I would like to find out the number of unique dates on which each "user identifier" appears.

E.g., for a given user identifier "1234567", if it appeared in log entries on 5 different dates, then I would like the histogram to show "1234567": 5

If I have a million user identifiers then I would have a million counts to compute.

Is this possible? I ran into an error saying that I exceeded the bucket limit of 10000. I foresee that this problem will not be solved just by increasing the bucket limit.

Thank you.

You can do this in two ways. FYI, the bucket limit is 65k in the most recent versions of the stack.

  1. You can use a terms aggregation on the user ID, with a cardinality sub-aggregation on the date field. Because dates are stored as milliseconds since the epoch, this counts distinct millisecond timestamps rather than distinct days. You could use a scripted cardinality instead if you want to round to the day, but that is the slowest calculation.

  2. You can use an Elasticsearch transform to pre-aggregate your data.
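
As a rough sketch of the rounding variant in option 1, the request below divides each timestamp by the number of milliseconds in a day, so all entries from the same UTC day collapse into one value. The index and field names (my_app, identifier.keyword, asctime) are borrowed from the query posted later in this thread, and the aggregation name unique_days is made up. It assumes every document has an asctime value. Keep in mind that cardinality is an approximate count.

```console
# Sketch only: names are assumptions taken from the query later in this thread.
# Assumes every document has a value for asctime.
GET my_app/_search
{
  "size": 0,
  "aggs": {
    "user": {
      "terms": {
        "field": "identifier.keyword",
        "size": 1000
      },
      "aggs": {
        "unique_days": {
          "cardinality": {
            "script": {
              "lang": "painless",
              "source": "doc['asctime'].value.toInstant().toEpochMilli() / 86400000L"
            }
          }
        }
      }
    }
  }
}
```

The division is integer division on a long, so the result is a UTC day number. Rounding to a local time zone would need a different script.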

Hi Wylie,

Thanks, the first method worked for me. I noticed that the number of user IDs returned depends on the "size" parameter of the terms aggregation, and that increasing the value causes me to exceed the bucket limit. Do you recommend increasing the max bucket limit whenever I need more results?

This is my query:

GET my_app/_search
{
  "size": 0,
  "aggs": {
    "user": {
      "terms": {
        "field": "identifier.keyword",
        "size": 1000
      },
      "aggs": {
        "date": {
          "cardinality": {
            "field": "asctime"
          }
        }
      }
    }
  }
}

That's how the terms aggregation works: it returns the most frequently occurring values. Your dataset seems to be equally weighted, so it is not a good fit. The transform option will be able to calculate uniqueness for all values.
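
A minimal sketch of such a transform, reusing the index and field names from your query. The transform ID and destination index (both user_unique_dates) are made-up names, and the `_transform` endpoint assumes a reasonably recent version of the stack. The pivot groups by user ID and stores one document per user with the cardinality of the date field.

```console
# Sketch only: the transform ID and destination index name are hypothetical.
PUT _transform/user_unique_dates
{
  "source": { "index": "my_app" },
  "dest": { "index": "user_unique_dates" },
  "pivot": {
    "group_by": {
      "identifier": {
        "terms": { "field": "identifier.keyword" }
      }
    },
    "aggregations": {
      "unique_dates": {
        "cardinality": { "field": "asctime" }
      }
    }
  }
}

# Creating a transform does not run it; start it explicitly.
POST _transform/user_unique_dates/_start
```

As written, this counts distinct millisecond timestamps, the same as the plain cardinality in your query. Once it has run, the destination index can be searched and sorted like any other index.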

Thanks Wylie, I'll look into it!