Composite Aggregation bucket sorting by max score

Hopefully I'm overlooking something easy. I recently replaced a TermsAggregation with a Composite aggregation, and I can't get the bucket sorting to work.
With the terms agg, there was an order option directly on the agg builder that let me order by a max score script:

TermsAggregationBuilder groupByAggBuilder = AggregationBuilders
        .terms(agg.getName())
        .field(agg.getFieldName())
        .size(aggSize)
        .order(BucketOrder.aggregation(SORT_TOP_HIT, false)) // sort buckets, descending, by the max score calculated below
        .shardSize(SHARD_SIZE);
...

aggregationBuilder.subAggregation(AggregationBuilders.max(SORT_TOP_HIT).script(new Script(SCORE))); // max score per bucket, used to sort the buckets

However, since Composite aggregations are built differently, I can't find a way to sort the buckets by the max score of the results inside each bucket. The code below returns the buckets and their max scores, but unsorted. What am I missing?

List<CompositeValuesSourceBuilder<?>> sources = new ArrayList<>();
TermsValuesSourceBuilder groupByTerm = new TermsValuesSourceBuilder(BucketName)
        .field(groupByFieldName);
sources.add(groupByTerm);
CompositeAggregationBuilder compositeAggregationBuilder =
        new CompositeAggregationBuilder(agg.getName(), sources); // no sort order here
...
aggregationBuilder.subAggregation(AggregationBuilders.max(SORT_TOP_HIT).script(new Script(SCORE))); // max score per bucket, which I want to sort the buckets by

Composite agg lets you page through large lists of terms in a distributed system by having every shard use a common sort order that requires no detailed collaboration between shards. That order is the natural order of the keys used for grouping, and client requests can pass a single “after” value to advance all shards to the next page. Any other sort order (like score) runs the risk of shards disagreeing on what the next-best key to aggregate might be.
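To make the point about a shared key order concrete, here is a minimal sketch in plain Java. It is a toy model, not Elasticsearch code: the class `KeyOrderedPagingSketch`, its method names, and the idea of representing a shard as a sorted map of term to local max score are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of key-ordered paging across shards; not Elasticsearch code. */
final class KeyOrderedPagingSketch {

    /** One shard's page: its first `size` terms strictly after the shared "after" key. */
    static TreeMap<String, Double> shardPage(TreeMap<String, Double> shard, String after, int size) {
        // Each shard finds its starting point from the "after" key alone,
        // without knowing anything about the other shards.
        Map<String, Double> remaining = (after == null) ? shard : shard.tailMap(after, false);
        TreeMap<String, Double> page = new TreeMap<>();
        for (Map.Entry<String, Double> e : remaining.entrySet()) {
            if (page.size() == size) break;
            page.put(e.getKey(), e.getValue());
        }
        return page;
    }

    /** Merge the per-shard pages and keep the first `size` terms in key order. */
    static LinkedHashMap<String, Double> mergedPage(List<TreeMap<String, Double>> shards,
                                                    String after, int size) {
        TreeMap<String, Double> merged = new TreeMap<>();
        for (TreeMap<String, Double> shard : shards) {
            // A term seen on several shards keeps the highest of its local max scores.
            shardPage(shard, after, size).forEach((term, score) -> merged.merge(term, score, Math::max));
        }
        LinkedHashMap<String, Double> page = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : merged.entrySet()) {
            if (page.size() == size) break;
            page.put(e.getKey(), e.getValue());
        }
        return page;
    }
}
```

Calling `mergedPage(shards, null, 2)` gives the first page, and passing the last key of that page as `after` gives the next one. Every term on the merged page is also within the first `size` terms of each shard that holds it, so its max score is complete. If the shards ordered by score instead, each would pick a different "next" set of terms and no single `after` value could describe where they all should resume.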

OK, thanks. That makes sense, but it also leaves me in a bind. I was hoping to replace the existing Terms agg with the Composite in order to allow pagination on the bucketed results. If the Composite bucketed results can't be ranked by score, that's a major drawback.

If the goal is to return bucketed results by score, is there any effective way of paginating other than re-issuing the request, increasing the bucket size each time and throwing away the results from the previous pages? (The problem, of course, being ever larger response sizes as the user paginates.)

Yes, terms partitioning can be used to ensure each shard is evaluating the same subset of terms in any one request, in order to avoid blowing up memory.
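As a rough illustration of the partitioning idea, the sketch below assigns each term to one of N partitions by hashing it, then ranks only the terms of the requested partition by max score. It is a toy model in plain Java, not Elasticsearch code: the class `TermsPartitionSketch`, its methods, and the use of `String.hashCode` as the hash are assumptions (Elasticsearch uses its own hash function). In the Java client, the real feature is, as far as I know, configured by passing `new IncludeExclude(partition, numPartitions)` to `TermsAggregationBuilder.includeExclude(...)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Toy model of terms partitioning; not Elasticsearch code. */
final class TermsPartitionSketch {

    /** Stand-in hash (assumption): every shard maps a given term to the same partition. */
    static int partitionOf(String term, int numPartitions) {
        return Math.floorMod(term.hashCode(), numPartitions);
    }

    /** Top `size` terms of one partition, by max score descending, ties broken by term. */
    static List<String> topTerms(Map<String, Double> maxScoreByTerm,
                                 int partition, int numPartitions, int size) {
        List<String> terms = new ArrayList<>();
        for (String term : maxScoreByTerm.keySet()) {
            // Only terms that hash into the requested partition are considered at all.
            if (partitionOf(term, numPartitions) == partition) {
                terms.add(term);
            }
        }
        terms.sort((a, b) -> {
            int byScore = Double.compare(maxScoreByTerm.get(b), maxScoreByTerm.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return terms.subList(0, Math.min(size, terms.size()));
    }
}
```

Each request covers one partition, so a client would loop over partitions `0` to `numPartitions - 1`. The ranking produced by each call only orders the terms inside that partition; nothing in a single request says how its best term compares with the best term of another partition.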

Thank you Mark. In such a case do you know of a recommended strategy for how to determine an overall score across partitions?

You only have control of scores within a single partition.
I have a wizard that walks through the various grouping options and for each suggests use cases where the technique works well.

