We noticed some inconsistencies when using filter aggregations in combination with include/exclude on a terms sub-aggregation.
A bucket for an included term with a doc_count of 0 is only returned if the filter aggregation's own doc_count is > 0.
Below is the mapping for an index with a type 'book' that has the fields title, author and narrator, followed by two documents. Both documents have the same author.
PUT filter
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "book": {
      "properties": {
        "title": {
          "type": "keyword"
        },
        "author": {
          "type": "keyword"
        },
        "narrator": {
          "type": "keyword"
        }
      }
    }
  }
}
Documents:
PUT filter/book/1
{
  "title" : "Mango",
  "author" : "Winton",
  "narrator" : "Moritz"
}
PUT filter/book/2
{
  "title" : "Banana",
  "author" : "Winton",
  "narrator" : "Max"
}
The following request runs a filter aggregation with a terms sub-aggregation on the field "title" that includes only "Mango" and sets "min_doc_count": 0, so that buckets without matching documents are still returned.
The filter matches book 2 only (title "Banana", narrator not "Moritz").
The sub-aggregation on the field title is therefore computed over book 2 only.
POST filter/book/_search
{
  "size": 0,
  "aggregations": {
    "titleIncluded": {
      "filter": {
        "bool": {
          "must": [
            {
              "terms": {
                "title": [
                  "Banana"
                ]
              }
            },
            {
              "bool": {
                "must_not": [
                  {
                    "terms": {
                      "narrator": [
                        "Moritz"
                      ]
                    }
                  }
                ]
              }
            }
          ]
        }
      },
      "aggregations": {
        "titleSubAggregation": {
          "terms": {
            "field": "title",
            "min_doc_count": 0,
            "include": [
              "Mango"
            ]
          }
        }
      }
    }
  }
}
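To double-check which document this filter keeps and what the sub-aggregation should return, here is a small Python sketch. It is only a toy model of the semantics we expect, not how Elasticsearch implements filter or terms aggregations; DOCS, first_filter and terms_agg are hypothetical names. It assumes that with "min_doc_count": 0 the candidate keys are all indexed values of the field, narrowed down by include.

```python
# Toy model of the first request (hypothetical helper names); this is NOT
# Elasticsearch's implementation, only the semantics we expect.

# The two indexed documents, keyed by _id.
DOCS = {
    "1": {"title": "Mango", "author": "Winton", "narrator": "Moritz"},
    "2": {"title": "Banana", "author": "Winton", "narrator": "Max"},
}


def first_filter(doc):
    # bool.must: title in ["Banana"] AND NOT (narrator in ["Moritz"])
    return doc["title"] in ["Banana"] and doc["narrator"] not in ["Moritz"]


def terms_agg(field, matched_ids, include, min_doc_count):
    # Assumption: candidate keys are every indexed value of the field
    # (not only values of matched docs), restricted by include.
    keys = sorted({doc[field] for doc in DOCS.values()} & set(include))
    buckets = []
    for key in keys:
        # Count only documents that passed the filter.
        count = sum(1 for doc_id in matched_ids if DOCS[doc_id][field] == key)
        if count >= min_doc_count:
            buckets.append({"key": key, "doc_count": count})
    return buckets


matched = [doc_id for doc_id, doc in DOCS.items() if first_filter(doc)]
print(matched)  # ['2']
print(terms_agg("title", matched, include=["Mango"], min_doc_count=0))
# [{'key': 'Mango', 'doc_count': 0}]
```

This matches the result Elasticsearch returns for this request.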
The sub-aggregation returns a bucket for "Mango" with a doc_count of 0.
Result:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "titleIncluded": {
      "doc_count": 1,
      "titleSubAggregation": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "Mango",
            "doc_count": 0
          }
        ]
      }
    }
  }
}
The only change below is the filter, which now matches neither of the two documents.
POST filter/book/_search
{
  "size": 0,
  "aggregations": {
    "titleIncluded": {
      "filter": {
        "bool": {
          "must": [
            {
              "terms": {
                "narrator": [
                  "Max"
                ]
              }
            },
            {
              "bool": {
                "must_not": [
                  {
                    "terms": {
                      "title": [
                        "Banana"
                      ]
                    }
                  }
                ]
              }
            }
          ]
        }
      },
      "aggregations": {
        "titleSubAggregation": {
          "terms": {
            "field": "title",
            "min_doc_count": 0,
            "include": [
              "Mango"
            ]
          }
        }
      }
    }
  }
}
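A toy model in Python (a hypothetical sketch, not Elasticsearch's implementation; DOCS and second_filter are illustrative names) shows that this filter rejects both books: book 1 fails the narrator term, and book 2 is removed by the must_not on title. Assuming that with "min_doc_count": 0 the candidate keys come from all indexed titles and are narrowed by include, an empty match set should still yield a "Mango" bucket with a doc_count of 0.

```python
# Toy model of the second request (hypothetical helper names); this is NOT
# Elasticsearch's implementation, only the semantics we expect.

# The two indexed documents, keyed by _id.
DOCS = {
    "1": {"title": "Mango", "author": "Winton", "narrator": "Moritz"},
    "2": {"title": "Banana", "author": "Winton", "narrator": "Max"},
}


def second_filter(doc):
    # bool.must: narrator in ["Max"] AND NOT (title in ["Banana"])
    return doc["narrator"] in ["Max"] and doc["title"] not in ["Banana"]


matched = [doc_id for doc_id, doc in DOCS.items() if second_filter(doc)]
print(matched)  # []

# Assumption: with min_doc_count 0, candidate keys are all indexed titles,
# narrowed by include, so a zero-count "Mango" bucket survives an empty match set.
indexed_titles = {doc["title"] for doc in DOCS.values()}
expected = [
    {"key": key, "doc_count": sum(DOCS[i]["title"] == key for i in matched)}
    for key in sorted(indexed_titles & {"Mango"})
]
print(expected)  # [{'key': 'Mango', 'doc_count': 0}]
```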
As you can see in the result, there is no bucket at all.
Result:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "titleIncluded": {
      "doc_count": 0,
      "titleSubAggregation": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": []
      }
    }
  }
}
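It seems that in the second case the sub-aggregation is not performed at all because the filter matches 0 documents, even though with "min_doc_count": 0 we would expect the same "Mango" bucket with a doc_count of 0 as in the first case.
Is this intended, or is it a bug? To me it looks like inconsistent behaviour when using include inside filter aggregations.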