During a terms aggregation with partitions, we get sum_other_doc_count > 0 in some partitions

Correct. Hash-modulo routing is a simple and efficient way to organise potentially billions of values into (roughly) equal-sized groups. Because the groups can vary a little above and below your target size, a partition can end up holding more terms than your 'size' setting returns, and the documents for the terms that don't fit are counted in sum_other_doc_count. Just set the retrieval 'size' setting to something >100 to allow for this overspill. I'd have thought twice the target partition size (e.g. 100 x 2 = 200) should be more than enough to compensate for hashing variations in partition size.
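To see why partition sizes wobble around the target, here is a minimal sketch that routes synthetic terms into partitions by hash-modulo and reports the resulting group sizes. It assumes MD5 as a stand-in hash; Elasticsearch uses its own internal hash, so only the shape of the distribution is meaningful. The `include.partition` / `include.num_partitions` keys are the terms aggregation's partitioning options; the field name `product_id` and the helper names are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Any, Dict, Iterable, List


def partition_of(term: str, num_partitions: int) -> int:
    # Stand-in hash (assumption): MD5 is used only to illustrate
    # hash-modulo routing; it is not Elasticsearch's actual hash.
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def partition_sizes(terms: Iterable[str], num_partitions: int) -> List[int]:
    # Count how many distinct terms land in each partition.
    counts = Counter(partition_of(t, num_partitions) for t in terms)
    return [counts.get(p, 0) for p in range(num_partitions)]


def partitioned_terms_body(field: str, partition: int,
                           num_partitions: int, target_size: int) -> Dict[str, Any]:
    # Request one partition, asking for 2x the target size so that
    # partitions which hash slightly large are not truncated.
    return {
        "size": 0,
        "aggs": {
            "by_term": {
                "terms": {
                    "field": field,  # hypothetical field name
                    "include": {"partition": partition,
                                "num_partitions": num_partitions},
                    "size": target_size * 2,
                }
            }
        },
    }


if __name__ == "__main__":
    terms = [f"term-{i}" for i in range(10_000)]
    sizes = partition_sizes(terms, 100)  # target of ~100 terms per partition
    print("min", min(sizes), "max", max(sizes))
    print(partitioned_terms_body("product_id", 0, 100, 100))
```

With 10,000 terms spread over 100 partitions the mean is exactly 100, but individual partitions land somewhat above and below it, which is the overspill a 'size' of 100 would cut off.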

I notice your example only sorts the terms by their value, not by anything more complex like a derived sum of sales. In simpler cases like this, the composite aggregation would be a better way to page through results.
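A rough sketch of paging with the composite aggregation: each request passes the previous response's `after_key` back as `after`, and paging stops when a page comes back empty. The `search` callable is a hypothetical stand-in for whatever client call you use to send a search body and get the parsed JSON response; `fake_search` and the field name `product_id` exist only so the sketch runs without a cluster.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def composite_body(field: str, page_size: int,
                   after: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    # One terms source; buckets come back ordered by term value.
    composite: Dict[str, Any] = {
        "size": page_size,
        "sources": [{"term": {"terms": {"field": field}}}],
    }
    if after is not None:
        composite["after"] = after  # resume after the last key seen
    return {"size": 0, "aggs": {"by_term": {"composite": composite}}}


def iter_terms(search: SearchFn, field: str,
               page_size: int = 100) -> Iterator[Dict[str, Any]]:
    # Yield every bucket, one page at a time, following after_key.
    after = None
    while True:
        agg = search(composite_body(field, page_size, after))["aggregations"]["by_term"]
        buckets = agg["buckets"]
        if not buckets:
            return
        yield from buckets
        after = agg.get("after_key")
        if after is None:
            return


def fake_search(terms: List[str]) -> SearchFn:
    # Hypothetical in-memory stand-in for a cluster, for illustration only.
    ordered = sorted(terms)

    def search(body: Dict[str, Any]) -> Dict[str, Any]:
        comp = body["aggs"]["by_term"]["composite"]
        start = comp.get("after", {}).get("term")
        remaining = [t for t in ordered if start is None or t > start]
        page = remaining[: comp["size"]]
        agg: Dict[str, Any] = {"buckets": [{"key": {"term": t}, "doc_count": 1}
                                           for t in page]}
        if page:
            agg["after_key"] = {"term": page[-1]}
        return {"aggregations": {"by_term": agg}}

    return search


if __name__ == "__main__":
    terms = [f"t{i:03d}" for i in range(250)]
    keys = [b["key"]["term"] for b in iter_terms(fake_search(terms), "product_id")]
    print(len(keys), keys[:3])
```

Unlike hash partitions, each composite page has a fixed size and nothing spills over between pages, but the buckets can only be ordered by their source values, not by a sub-aggregation metric such as a sum of sales.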
