I have a nested field in an ES index. I used the cardinality aggregation to get the total number of term buckets, and then chose num_partitions and size values to fetch those buckets with a partitioned terms aggregation.
E.g. the total number of term buckets = 761
Used num_partitions = 4, size = 200
But the 4 requests returned the following numbers of buckets: 200, 194, 176, 165, which sum to 735 < 761.
Now when using num_partitions = 3, size = 300
The 3 requests returned the following numbers of buckets: 246, 261, 254, which sum to 761.
The second case did not miss any buckets, but I would still have expected the counts to be 300, 300, 161.
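The arithmetic above can be checked with a short sketch. The counts are copied from the two runs; the helper names are hypothetical, and `expected_sequential_pages` only encodes the assumption that partitions would behave like plain offset pagination, which is the expectation being questioned here.

```python
# Total reported by the cardinality aggregation.
TOTAL_BUCKETS = 761

# (num_partitions, size) -> bucket counts observed in each request.
runs = {
    (4, 200): [200, 194, 176, 165],
    (3, 300): [246, 261, 254],
}


def expected_sequential_pages(total, size):
    """Page sizes IF partitions filled up like offset pagination (an assumption)."""
    full_pages, remainder = divmod(total, size)
    return [size] * full_pages + ([remainder] if remainder else [])


def summarize(total, size, counts):
    """Compare the observed counts against the total and the assumed paging."""
    returned = sum(counts)
    return {
        "returned": returned,
        "missing": total - returned,
        "expected_if_paginated": expected_sequential_pages(total, size),
    }


for (num_partitions, size), counts in runs.items():
    print(num_partitions, size, summarize(TOTAL_BUCKETS, size, counts))
```

For the first run this reports 735 returned and 26 missing; for the second, 761 returned and 0 missing.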
So, my questions are:
- Why did the first choice of num_partitions and size miss 26 buckets?
- Why is the "size" value not honoured in either of the above scenarios?
NOTE -
- ES version = 5.6.1. Upgrading to ES 6 is not possible in the near future, so I cannot consider using the composite aggregation.
- I have referred to this old question, and one of the answers quotes the ES documentation: "The terms aggregation is meant to return the top terms and does not allow pagination." But I still do not understand why it does not return the buckets as 300, 300, 161 in case 2 above. Without partitions, it has always honoured the "size" value.
In case it is needed, here is the aggregation part of the query:
"aggs": {
  "nested": {
    "path": "software"
  },
  "aggregations": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "match_phrase_prefix": {
                "sw.publisher": {
                  "query": "O",
                  "slop": 100,
                  "max_expansions": 50,
                  "boost": 1.0
                }
              }
            }
          ],
          "adjust_pure_negative": true,
          "boost": 1.0
        }
      },
      "aggregations": {
        "host_sw": {
          "terms": {
            "field": "sw.hostId",
            "size": 200,
            "min_doc_count": 1,
            "shard_min_doc_count": 0,
            "show_term_doc_count_error": false,
            "order": [
              {
                "_count": "desc"
              },
              {
                "_term": "asc"
              }
            ],
            "include": {
              "partition": 3,
              "num_partitions": 4
            }
          }
        },
        "sw_host_id.total_count": {
          "cardinality": {
            "field": "sw.hostId",
            "precision_threshold": 20000
          }
        }
      }
    }
  }
}
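
The requests in each run differ only in the "partition" index. A minimal sketch of how such request bodies could be generated, one per partition, is shown here. It only builds the JSON and does not contact a cluster. The function name `build_partitioned_aggs`, the aggregation name `sw_nested`, and the top-level `"size": 0` are illustrative assumptions and not part of the original query; the filter and terms options are trimmed to the essentials.

```python
import json


def build_partitioned_aggs(partition, num_partitions, size):
    """Build the aggregation part of the request for one partition."""
    return {
        "sw_nested": {  # hypothetical aggregation name
            "nested": {"path": "software"},
            "aggregations": {
                "filtered": {
                    "filter": {
                        "match_phrase_prefix": {
                            "sw.publisher": {"query": "O", "slop": 100}
                        }
                    },
                    "aggregations": {
                        "host_sw": {
                            "terms": {
                                "field": "sw.hostId",
                                "size": size,
                                "order": [{"_count": "desc"}, {"_term": "asc"}],
                                # Only this block changes between requests.
                                "include": {
                                    "partition": partition,
                                    "num_partitions": num_partitions,
                                },
                            }
                        }
                    },
                }
            },
        }
    }


NUM_PARTITIONS = 4
SIZE = 200

# One request body per partition; partition indices start at 0.
# "size": 0 (assumed) suppresses search hits so only aggregations come back.
bodies = [
    {"size": 0, "aggs": build_partitioned_aggs(p, NUM_PARTITIONS, SIZE)}
    for p in range(NUM_PARTITIONS)
]

print(json.dumps(bodies[-1], indent=2))
```

The last body printed corresponds to "partition": 3 with "num_partitions": 4, as in the query above.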