Filter by min doc count

goncalobsantos · July 17, 2019, 11:13am

I am trying to write a query that filters the results of an aggregation by min_doc_count. I've tried different strategies but the output is not what I need.
If I understand the logic correctly, if the aggregation counts less than the value assigned to min_doc_count, the output will show an empty bucket. But is there a way of writing the query where the output only shows the buckets where the count is more than or equal to the value assigned to min_doc_count?

Here's a query where I set min_doc_count to the value 10:

GET /my_index/_search
{"size": 0, 
 "aggs" : {
  "group_by_instanceid" :{
   "terms" : { "field" : "instanceid.keyword"},
      "aggs": {
       "group_by_hostname": {
         "terms": {"field": "hostname.keyword"},
            "aggs" : {
             "group_by_alertkey" :{
              "terms" : { "field" : "alertkey.keyword", "min_doc_count": 10}}
    
     }
    }
   }
  }
 }
}

And here is part of the output:

{
  "took": 1798,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 5529854,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_by_instanceid": {
      "doc_count_error_upper_bound": 29711,
      "sum_other_doc_count": 4440543,
      "buckets": [
        {
          "key": "71",
          "doc_count": 79174,
          "group_by_hostname": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 3,
            "buckets": [
              {
                "key": "a1",
                "doc_count": 40743,
                "group_by_alertkey": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    {
                      "key": "b1",
                      "doc_count": 40743
                    }
                  ]
                }
              },
              {
                "key": "a2",
                "doc_count": 38417,
                "group_by_alertkey": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    {
                      "key": "b2",
                      "doc_count": 38417
                    }
                  ]
                }
              },
              {
                "key": "a3",
                "doc_count": 2,
                "group_by_alertkey": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": []
                }
              },
              {
                "key": "a4",
                "doc_count": 2,
                "group_by_alertkey": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [}

The output shows certain counts where doc_count is less than 10 followed by an empty bucket (e.g. hostnames a3 and a4) and I would like the output not to show the empty buckets (i.e. the outsput should only give me information like the following, instanceid 71 has 79174 docs, hostname a1 40743 docs with alertkey b1 and hostname a2 has 38417 documents with alertkey b2 ). It's perhaps worth adding that I get a similar output when I try to achieve this goal using bucket selector:

GET /my_index/_search
{"size": 0, 
 "aggs" : {
  "group_by_instanceid" :{
   "terms" : { "field" : "instanceid.keyword"},
    "aggs" : {
     "group_by_alertkey" :{
      "terms" : { "field" : "alertkey.keyword"},
        "aggs": {
          "group_by_": {
            "terms": {
              "field": "hostname.keyword"}, 
                "aggs" : {
                 "hostname_count" : {
                  "value_count": {"field" : "hostname.keyword"}},
	                  "hostname_filter":{
		                 "bucket_selector": {
                      "buckets_path": {"hname":"hostname_count"},
                      "script": "params.hname > 10"}}
                        }
             }
            }
           }
          } 
        }
      }
    }

Horia_Sibinescu · July 19, 2019, 7:34pm

Hey,

Try adding something like this to your query:

"order": {
"_count": "asc"
}

This should order everything from the lowest number of documents, in your case 10.

system · August 16, 2019, 7:34pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
ES filter composite aggregation result by doc_count Kibana	4	821	September 6, 2022
Group by Count equals value Elasticsearch	2	217	April 28, 2022
Questions about aggregation min_doc_count = 0 Elasticsearch	3	1790	July 6, 2017
Filter based on the doc_count with aggregations (2) Elasticsearch	2	4901	October 5, 2018
Min_doc_count on lower/lowest level nested aggregation Elasticsearch	1	534	July 6, 2017

Filter by min doc count

Related topics