I'm wondering why you don't use the sum_other_doc_count value from the very first request instead of sending a special must_not query.
I'm asking because I don't get the "other" bucket (with aggs "size=1") when the data looks like this:
PUT test/doc/1
{
  "filterNames": [
    "filter1"
  ]
}
PUT test/doc/2
{
  "filterNames": [
    "filter1",
    "filter2"
  ]
}
I would expect the "other" bucket to contain one doc (the one with "filter2", which is correctly shown by sum_other_doc_count), but because of the second "other-filter" query the "filter2" document gets filtered out and no "other" bucket is displayed.
Do you think this is the expected behaviour?
Thanks!
P.S. I found an explanation of how the "other" bucket works, but it is still not clear to me why it has been implemented like that.
Let me briefly answer your questions in a bit more detail.
Why can't we use sum_other_doc_count?
This would only work for "Count", not for any other metric aggregation the user might want to use. Since our Other Bucket should also work with all the other metric aggregations, we need the two-query method to actually calculate the same metrics over all "other" documents.
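To illustrate the point, here is a minimal Python sketch that simulates the two steps on in-memory data. It is not the actual Kibana or Elasticsearch code; the sample documents, the field names "tag" and "bytes", and the helper functions are all hypothetical. It shows that the first request only yields a count for the remaining documents, while an average needs the documents themselves.

```python
from collections import Counter

# Hypothetical sample data; "tag" and "bytes" are made-up field names.
docs = [
    {"tag": "a", "bytes": 100},
    {"tag": "a", "bytes": 300},
    {"tag": "b", "bytes": 50},
    {"tag": "c", "bytes": 150},
]


def first_request(docs, field, size):
    """Simulate a terms aggregation: top `size` terms plus the leftover count."""
    counts = Counter(doc[field] for doc in docs)
    # Order by doc count descending, then by term ascending.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    top_terms = [term for term, _ in ordered[:size]]
    sum_other_doc_count = sum(count for _, count in ordered[size:])
    return top_terms, sum_other_doc_count


def second_request(docs, field, excluded_terms, metric_field):
    """Simulate the must_not query: average a metric over the 'other' docs."""
    other = [doc for doc in docs if doc[field] not in excluded_terms]
    return sum(doc[metric_field] for doc in other) / len(other)


top_terms, sum_other_doc_count = first_request(docs, "tag", size=1)
other_avg = second_request(docs, "tag", top_terms, "bytes")

print(top_terms)            # the terms that get their own bucket
print(sum_other_doc_count)  # a count only; no average can be derived from it
print(other_avg)            # average "bytes" over the remaining documents
```

The count from the first step says how many documents are left, but nothing about their "bytes" values, which is why a second pass over the "other" documents is needed for anything beyond "Count".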
What documents will be found
To your question whether we would expect your doc 2 to be filtered out of the Other Bucket: currently yes, that is expected behavior. To be honest, the way array values work in Elasticsearch might not always be what you expect, depending on your use case, though sometimes it is exactly what you want. So we follow the "default" ES behavior and filter out documents that contain any of the excluded values, also because I currently don't see any other proper solution (one that would work with every metric).
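The effect on the two documents from the question can be reproduced with a small Python simulation. This is only a sketch of the behavior described in this thread, not the real implementation, and the helper names are hypothetical. A document with an array value is counted once per distinct value in the terms aggregation, but the must_not filter drops it as soon as any of its values matches an excluded term.

```python
from collections import Counter

# The two documents from the question, keyed by document id.
docs = {
    1: {"filterNames": ["filter1"]},
    2: {"filterNames": ["filter1", "filter2"]},
}


def terms_agg(docs, field, size):
    """Count each document once per distinct value it holds in `field`."""
    counts = Counter()
    for doc in docs.values():
        for value in set(doc[field]):
            counts[value] += 1
    # Order by doc count descending, then by term ascending.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    top = ordered[:size]
    sum_other_doc_count = sum(count for _, count in ordered[size:])
    return top, sum_other_doc_count


def must_not_terms(docs, field, excluded):
    """Keep only documents where none of the values is an excluded term."""
    excluded = set(excluded)
    return {
        doc_id: doc
        for doc_id, doc in docs.items()
        if not excluded & set(doc[field])
    }


top, sum_other_doc_count = terms_agg(docs, "filterNames", size=1)
other_docs = must_not_terms(docs, "filterNames", [term for term, _ in top])

print(top)                  # [('filter1', 2)]
print(sum_other_doc_count)  # 1
print(other_docs)           # {}
```

Doc 2 contributes to the "filter2" count, so sum_other_doc_count is 1, yet it also contains "filter1" and is therefore excluded by the second query, leaving the "other" bucket empty.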