During terms aggregation with partition we get sum_other_doc_count > 0 in between partitions

Mark_Harwood1 · December 13, 2022, 3:29pm

Correct. Hash-modulo routing is a simple and efficient way to organise potentially billions of values into roughly equal-sized groups. Because the groups can land a little above or below your target size, just set the retrieval 'size' setting to something above 100 to allow for this overspill. Twice the target partition size (e.g. 100 x 2 = 200) should, I'd have thought, be more than enough to compensate for hashing variations in partition size.
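To illustrate why partitions overspill, here is a rough sketch that buckets 10,000 made-up terms into 100 partitions by hash modulo and reports the smallest and largest partition. MD5 is only a stand-in here; it is not the hash Elasticsearch uses internally, and the term names, the field name and the helper functions are all hypothetical. The request body at the end uses the terms aggregation's `include.partition` / `num_partitions` options with `size` doubled to 200, as suggested above.

```python
import hashlib
from collections import Counter


def partition_of(term: str, num_partitions: int) -> int:
    """Assign a term to a partition by hash modulo.

    MD5 is a stand-in for illustration only (assumption); it is not
    the hash function Elasticsearch uses internally.
    """
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_sizes(terms, num_partitions):
    """Count how many terms land in each partition."""
    counts = Counter(partition_of(t, num_partitions) for t in terms)
    return [counts.get(p, 0) for p in range(num_partitions)]


def partitioned_terms_body(field, partition, num_partitions, size):
    """Build a search body for one partition of a terms aggregation."""
    return {
        "size": 0,  # no hits needed, only the aggregation
        "aggs": {
            "values": {
                "terms": {
                    "field": field,
                    "include": {
                        "partition": partition,
                        "num_partitions": num_partitions,
                    },
                    "size": size,
                }
            }
        },
    }


# Hypothetical data: 10,000 unique terms, target of ~100 terms per partition.
terms = [f"term-{i}" for i in range(10_000)]
sizes = partition_sizes(terms, num_partitions=100)
print("smallest partition:", min(sizes), "largest partition:", max(sizes))

# Ask for up to 200 terms so a partition slightly over 100 still fits.
body = partitioned_terms_body("my_field", partition=0, num_partitions=100, size=200)
```

Any partition holding more terms than the requested `size` leaves some buckets unreturned, which is what shows up as a non-zero sum_other_doc_count.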

I notice your example only sorts the terms by their value, not by anything more complex like a derived sum of sales. In simple cases like this, the composite aggregation would be a better way to page through results.
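A minimal sketch of paging with the composite aggregation might look like the following. It only builds request bodies and reads the `after_key` from each response; the `search` callable is an assumption standing in for however you send a request body to your index and get the parsed JSON response back, and the names `paged` and `value` are arbitrary labels chosen for this example.

```python
from typing import Callable, Iterator, Optional


def composite_body(field: str, page_size: int, after: Optional[dict] = None) -> dict:
    """Build a search body that pages over the values of one field."""
    composite = {
        "size": page_size,
        "sources": [{"value": {"terms": {"field": field}}}],
    }
    if after is not None:
        # Resume after the last key returned by the previous page.
        composite["after"] = after
    return {"size": 0, "aggs": {"paged": {"composite": composite}}}


def iter_buckets(
    search: Callable[[dict], dict], field: str, page_size: int = 100
) -> Iterator[dict]:
    """Yield every bucket, one page at a time.

    `search` is a hypothetical callable: it takes a request body and
    returns the parsed search response as a dict.
    """
    after = None
    while True:
        response = search(composite_body(field, page_size, after))
        agg = response["aggregations"]["paged"]
        buckets = agg["buckets"]
        if not buckets:
            return  # an empty page means there is nothing left
        yield from buckets
        after = agg.get("after_key")
        if after is None:
            return
```

Each page holds at most `page_size` buckets in key order, so there is no overspill to allow for.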
