Hi!
I have documents in Elasticsearch like:
{
  "email": "user@example.com",
  "subject": "Email subject",
  "body": "Email body"
}
And I'm trying to count all unique emails that have more than 10 documents.
I can retrieve all such emails with the following aggregation query:
GET emails/_search
{
  "size": 0,
  "aggs": {
    "email_with_more_than_10_docs": {
      "terms": {
        "field": "email",
        "min_doc_count": 10
      }
    }
  }
}
And I get the output:
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 74,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "email_with_more_than_10_docs" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "user1@example.com",
          "doc_count" : 40
        },
        {
          "key" : "user2@example.com",
          "doc_count" : 14
        },
        {
          "key" : "user3@example.com",
          "doc_count" : 13
        }
      ]
    }
  }
}
But how can I get the number of unique emails (the number of buckets)? In the example output above, that number is 3.
Tomo_M
(Tomohiro Mitani)
June 20, 2022, 6:50am
As long as the "size" parameter of the aggregation is larger than the number of emails that meet the criteria, counting the length of the "buckets" array on the client side is a viable option.
If you really need the count in the response, you can use the stats_bucket pipeline aggregation. Keep in mind to set "size" larger than the number of terms; the "count" value in the result is what you want.
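The client-side approach is a one-liner once the response is parsed. A minimal Python sketch, using the sample response from the question as an already-parsed dict:

```python
# Aggregation response from the question, abbreviated to the part we need.
response = {
    "aggregations": {
        "email_with_more_than_10_docs": {
            "buckets": [
                {"key": "user1@example.com", "doc_count": 40},
                {"key": "user2@example.com", "doc_count": 14},
                {"key": "user3@example.com", "doc_count": 13},
            ]
        }
    }
}

# The number of unique emails is simply the number of buckets returned.
buckets = response["aggregations"]["email_with_more_than_10_docs"]["buckets"]
unique_emails = len(buckets)
print(unique_emails)  # 3
```

This only counts correctly if "size" on the terms aggregation was large enough that no qualifying bucket was left out.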
GET kibana_sample_data_flights/_search?filter_path=aggregations.stats
{
  "size": 0,
  "aggs": {
    "dest": {
      "terms": {
        "field": "DestAirportID",
        "size": 10
      }
    },
    "stats": {
      "stats_bucket": {
        "buckets_path": "dest>_count"
      }
    }
  }
}
{
  "aggregations" : {
    "stats" : {
      "count" : 10,
      "min" : 305.0,
      "max" : 691.0,
      "avg" : 416.1,
      "sum" : 4161.0
    }
  }
}
GET kibana_sample_data_flights/_search?filter_path=aggregations.stats
{
  "size": 0,
  "aggs": {
    "dest": {
      "terms": {
        "field": "DestAirportID",
        "size": 1000
      }
    },
    "stats": {
      "stats_bucket": {
        "buckets_path": "dest>_count"
      }
    }
  }
}
{
  "aggregations" : {
    "stats" : {
      "count" : 156,
      "min" : 1.0,
      "max" : 691.0,
      "avg" : 83.71153846153847,
      "sum" : 13059.0
    }
  }
}
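Mapped back onto the emails index from the question, the same pattern would look roughly like this (a sketch, not tested against that mapping; "size": 1000 is an assumed upper bound on the number of qualifying emails). The min_doc_count filter removes the small buckets first, and stats_bucket then counts only the buckets that survive:

```json
GET emails/_search?filter_path=aggregations.unique_email_count
{
  "size": 0,
  "aggs": {
    "email_with_more_than_10_docs": {
      "terms": {
        "field": "email",
        "min_doc_count": 10,
        "size": 1000
      }
    },
    "unique_email_count": {
      "stats_bucket": {
        "buckets_path": "email_with_more_than_10_docs>_count"
      }
    }
  }
}
```

With the sample data above, the "count" field of unique_email_count would be the desired 3.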
But what if there are a lot of documents in Elasticsearch? Is there another solution that takes that into account?
Tomo_M
(Tomohiro Mitani)
July 13, 2022, 8:58am
For paginating aggregation results, a composite aggregation may help you.
Using a transform to perform the aggregation beforehand might be another option.
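A composite aggregation pages through every bucket deterministically, so the count no longer depends on a single large "size". A hedged sketch against the emails index from the question:

```json
GET emails/_search
{
  "size": 0,
  "aggs": {
    "unique_emails": {
      "composite": {
        "size": 1000,
        "sources": [
          { "email": { "terms": { "field": "email" } } }
        ]
      }
    }
  }
}
```

Note that composite aggregations do not support min_doc_count, so the client has to filter each page's buckets by doc_count (here, keep those above 10), add up the matches, and request the next page by passing the returned after_key as the "after" parameter until no buckets come back.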
system
(system)
Closed
August 10, 2022, 8:58am
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.