We are running ES 6.8 and move some documents to another ES Cluster with a more recent version. We also transform the documents during that process.
We need to group documents by a field, so we do this via a terms aggregation.
Because there are often more than the search.max_buckets settings (we can not set it higher) we use the partition option.
I understand that this is no pagination but for what we want to achieve it is fine that the partitions have different number of results.
In order to determine how many partitions we need we first run a cardinality aggregation on that field, like suggested in the docs.
We will then fetch our data with a size of 100, i.e. max 100 buckets for the aggregation per search.
What we see from the results is that sum_other_doc_count
is sometimes greater than 0. And we wonder what that means in context of using partitioning.
Our queries
Determine number of partitions
GET the-index/_search
{
"aggs": {
"count": {
"cardinality": {
"field": "groupField"
}
}
},
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"term": {
"someField": "someValue"
}
}
]
}
}
}
},
"size": 0
}
We now get a number back and divide that bei 100 (our aggregation size setting) so we roughly know how many partitions we need.
Let's say cardinality
is 157300 we would set num_partitions
to 1573
.
Fetch buckets
GET the-index/_search
{
"aggs": {
"byGroupField": {
"terms": {
"execution_hint": "map",
"field": "groupField",
"size": 100,
"include": {
"partition": 123,
"num_partitions": 1573
}
}
}
},
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"term": {
"someField": "someValue"
}
}
]
}
}
}
},
"size": 0
}
What we observed now is sum_other_doc_count
being greater than 0 for some partitions. Even when we set num_partitions
way higher than necessary (in respect to the size setting) we will get sum_other_doc_count
greater 0 for many partitions.
What we expected was that sum_other_doc_count
is greater 0 only for the last partition in case our size
for the aggregation is too low.
Can somebody point me to some docs or post an explanation why we get sum_other_doc_count
greater 0 for many partitions?
Thank you very much