I am trying to write a query that filters the results of an aggregation by min_doc_count. I've tried different strategies but the output is not what I need.
If I understand the logic correctly, if the aggregation counts less than the value assigned to min_doc_count, the output will show an empty bucket. But is there a way of writing the query where the output only shows the buckets where the count is more than or equal to the value assigned to min_doc_count?
Here's a query where I set min_doc_count to the value 10:
GET /my_index/_search
{"size": 0,
"aggs" : {
"group_by_instanceid" :{
"terms" : { "field" : "instanceid.keyword"},
"aggs": {
"group_by_hostname": {
"terms": {"field": "hostname.keyword"},
"aggs" : {
"group_by_alertkey" :{
"terms" : { "field" : "alertkey.keyword", "min_doc_count": 10}}
}
}
}
}
}
}
And here is part of the output:
{
"took": 1798,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 5529854,
"max_score": 0,
"hits": []
},
"aggregations": {
"group_by_instanceid": {
"doc_count_error_upper_bound": 29711,
"sum_other_doc_count": 4440543,
"buckets": [
{
"key": "71",
"doc_count": 79174,
"group_by_hostname": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 3,
"buckets": [
{
"key": "a1",
"doc_count": 40743,
"group_by_alertkey": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "b1",
"doc_count": 40743
}
]
}
},
{
"key": "a2",
"doc_count": 38417,
"group_by_alertkey": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "b2",
"doc_count": 38417
}
]
}
},
{
"key": "a3",
"doc_count": 2,
"group_by_alertkey": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": []
}
},
{
"key": "a4",
"doc_count": 2,
"group_by_alertkey": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [}
The output shows certain counts where doc_count is less than 10 followed by an empty bucket (e.g. hostnames a3 and a4) and I would like the output not to show the empty buckets (i.e. the outsput should only give me information like the following, instanceid 71 has 79174 docs, hostname a1 40743 docs with alertkey b1 and hostname a2 has 38417 documents with alertkey b2 ). It's perhaps worth adding that I get a similar output when I try to achieve this goal using bucket selector:
GET /my_index/_search
{"size": 0,
"aggs" : {
"group_by_instanceid" :{
"terms" : { "field" : "instanceid.keyword"},
"aggs" : {
"group_by_alertkey" :{
"terms" : { "field" : "alertkey.keyword"},
"aggs": {
"group_by_": {
"terms": {
"field": "hostname.keyword"},
"aggs" : {
"hostname_count" : {
"value_count": {"field" : "hostname.keyword"}},
"hostname_filter":{
"bucket_selector": {
"buckets_path": {"hname":"hostname_count"},
"script": "params.hname > 10"}}
}
}
}
}
}
}
}
}