Count emails that have more than 10 documents (buckets count)

Hi!
I have documents in Elastic like:

{
    "email": "user@example.com",
    "subject": "Email subject",
    "body": "Email body"
}

And I'm trying to count all unique emails that have more than 10 documents.
I can retrieve all such emails with the following aggregation query:

GET emails/_search
{
  "size": 0,
  "aggs": {
    "email_with_more_than_10_docs": {
      "terms": {
        "field": "email",
        "min_doc_count": 10
      }
    }
  }
}

And I get the output:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 74,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "email_with_more_than_10_docs" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "user1@example.com",
          "doc_count" : 40
        },
        {
          "key" : "user2@example.com",
          "doc_count" : 14
        },
        {
          "key" : "user3@example.com",
          "doc_count" : 13
        }
      ]
    }
  }
}

But how can I get the number of unique emails (the number of buckets)? In the example output above, that number is 3.

As long as the aggregation's size parameter is larger than the number of emails that meet the criteria, counting the length of the buckets array on the client side is a viable option.
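For instance, in Python (a minimal sketch over the response body shown above, truncated to the relevant part):

```python
import json

# The bucket count is simply the length of the "buckets" array.
# `response` stands in for the JSON body returned by the search request.
response = json.loads("""
{
  "aggregations": {
    "email_with_more_than_10_docs": {
      "buckets": [
        {"key": "user1@example.com", "doc_count": 40},
        {"key": "user2@example.com", "doc_count": 14},
        {"key": "user3@example.com", "doc_count": 13}
      ]
    }
  }
}
""")

buckets = response["aggregations"]["email_with_more_than_10_docs"]["buckets"]
print(len(buckets))  # -> 3
```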

If you really need the count in the response, you can use a stats_bucket pipeline aggregation. Keep in mind to set "size" larger than the number of terms, otherwise some buckets are left out of the count. The "count" value is what you want.

GET kibana_sample_data_flights/_search?filter_path=aggregations.stats
{
  "size":0,
  "aggs": {
    "dest": {
      "terms": {
        "field": "DestAirportID",
        "size": 10
      }
    },
    "stats":{
      "stats_bucket": {
        "buckets_path": "dest>_count"
      }
    }
  }
}

With "size": 10, only the top 10 buckets feed the pipeline, so "count" tops out at 10:

{
  "aggregations" : {
    "stats" : {
      "count" : 10,
      "min" : 305.0,
      "max" : 691.0,
      "avg" : 416.1,
      "sum" : 4161.0
    }
  }
}

With "size" larger than the number of terms, "count" is the true number of buckets:

GET kibana_sample_data_flights/_search?filter_path=aggregations.stats
{
  "size":0,
  "aggs": {
    "dest": {
      "terms": {
        "field": "DestAirportID",
        "size": 1000
      }
    },
    "stats":{
      "stats_bucket": {
        "buckets_path": "dest>_count"
      }
    }
  }
}

{
  "aggregations" : {
    "stats" : {
      "count" : 156,
      "min" : 1.0,
      "max" : 691.0,
      "avg" : 83.71153846153847,
      "sum" : 13059.0
    }
  }
}

But what if there are a lot of documents in Elastic? Is there another solution that takes that into account?

For pagination of aggregation results, a composite aggregation may help you.
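A sketch of that approach in Python, paging through all email buckets and applying the threshold client-side. The `search(body)` callable is a hypothetical stand-in for a real client request (e.g. the official Python client's `es.search(index="emails", body=body)`); note that the composite aggregation does not support min_doc_count, so the "> 10" filter has to be applied while paging:

```python
# Page through a composite aggregation with after_key and count the
# buckets whose doc_count exceeds the threshold. `search` is a stand-in
# for an Elasticsearch client call; the composite agg has no
# min_doc_count, so the threshold is checked client-side.

def count_emails_over(search, threshold=10, page_size=1000):
    """Count unique emails whose doc_count exceeds `threshold`."""
    count = 0
    after_key = None
    while True:
        composite = {
            "size": page_size,
            "sources": [{"email": {"terms": {"field": "email"}}}],
        }
        if after_key is not None:
            composite["after"] = after_key  # resume after the last bucket
        body = {"size": 0, "aggs": {"emails": {"composite": composite}}}
        agg = search(body)["aggregations"]["emails"]
        count += sum(1 for b in agg["buckets"] if b["doc_count"] > threshold)
        after_key = agg.get("after_key")
        if after_key is None or not agg["buckets"]:
            break
    return count
```

This keeps memory bounded on both sides: each request returns at most `page_size` buckets, regardless of how many unique emails exist.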

Using transform to perform aggregation beforehand might be another option.
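A rough sketch of the transform idea (the transform id and destination index name `emails_per_sender` are hypothetical): a pivot transform groups by email and stores a per-email document count, which can then be queried directly with the _count API.

```
PUT _transform/emails_per_sender
{
  "source": { "index": "emails" },
  "dest": { "index": "emails_per_sender" },
  "pivot": {
    "group_by": {
      "email": { "terms": { "field": "email" } }
    },
    "aggregations": {
      "doc_count": { "value_count": { "field": "email" } }
    }
  }
}

POST _transform/emails_per_sender/_start

GET emails_per_sender/_count
{
  "query": { "range": { "doc_count": { "gt": 10 } } }
}
```

The "count" in the _count response is the number of unique emails with more than 10 documents, with no aggregation size limit to worry about.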

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.