Filter based on the doc_count with aggregations

Hi,

you can use the bucket_selector pipeline aggregation for this kind of filtering. In your case, the following query:

GET /cars/transactions/_search
{
   "size": 0,
   "aggs": {
      "popular_colors": {
         "terms": {
            "field": "color"
         },
         "aggs": {
            "my_filter": {
               "bucket_selector": {
                  "buckets_path": {
                     "the_doc_count": "_count"
                  },
                  "script": "the_doc_count == 2"
               }
            }
         }
      }
   }
}

Should return only the buckets with "doc_count" : 2, since bucket_selector keeps the buckets for which the script evaluates to true and drops the rest. However, be aware that pipeline aggregations work on the outputs produced by other aggregations, so the overall amount of work needed to calculate the initial doc_counts stays the same. Since the script needs to be executed for each input bucket, the operation might be slow for high-cardinality fields (as in thousands upon thousands of terms), but it should work well for relatively low-cardinality fields (like colors, as in this case).
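
To make the cost argument a bit more concrete, here is a rough Python sketch of the two phases. It is only a mental model and not how Elasticsearch is implemented: the helper names terms_buckets and bucket_selector and the sample documents are made up for illustration. The point is that every term gets counted first, and the selector only prunes the finished bucket list afterwards.

```python
from collections import Counter


def terms_buckets(docs, field):
    """Rough stand-in for a terms aggregation: count every document per term."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return [{"key": key, "doc_count": n} for key, n in counts.items()]


def bucket_selector(buckets, script):
    """Rough stand-in for bucket_selector: keep buckets where the script is true."""
    # The script runs once per input bucket, so the cost grows with cardinality.
    return [b for b in buckets if script(b["doc_count"])]


# Hypothetical sample data, not taken from the original index.
docs = [
    {"color": "red"}, {"color": "red"}, {"color": "red"},
    {"color": "blue"}, {"color": "blue"},
    {"color": "green"}, {"color": "green"},
    {"color": "yellow"},
]

# Phase 1: the full counting work happens regardless of the filter.
all_buckets = terms_buckets(docs, "color")

# Phase 2: the selector only discards buckets from the finished result.
selected = bucket_selector(all_buckets, lambda the_doc_count: the_doc_count == 2)

print(selected)
```

With this sample data, four buckets are built in the first phase and only the ones for blue and green survive the second, which mirrors why the filter does not save any of the counting work.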
