Sorting on sum metric sub aggregation


(msd) #1

I have a terms aggregation and an sum metric sub aggregation. I am sorting on the sum metric sub aggregation by using "order" and specifying the sum metric sub aggregation name:

"order" : { "SumMetricSubAggregationName" : "desc" }

I am trying to understand how sorting on the sub aggregation works and how accurate it is when I have millions of documents in multiple shards.

For terms aggregation, when the "size" parameter is specified, it selects the top N terms in each shard and then these are combined to return the final top N terms and their count.

When we sort by metrics sub aggregation, I am trying to understand how does it select terms in each shard. will it collect top N terms ordered by sub aggregation count from each shard? Or it just collects the top N terms in each shard irrespective of the sub aggregation count and then sort them by sub aggregation count?


(Mark Harwood) #2

See shard_size settings for how accuracy can be tuned. It gives you the control to determine how many terms to consider from each shard in interim results rather than the size of the final result. It helps reduce the potential for error at the expense of larger memory usage. You can ask for details on the error margin of a search.

Note that for the sorts of aggregations it looks like you are trying to do it might make sense to consider partitioning your analysis of the data using this new terms agg feature in 5.2,


(msd) #3

Thanks Mark for the response.
I am trying to understand when we sort by metrics sub-aggregation, and use the shard_size (or size parameter) how the terms are selected in each shard.

For example in the following aggregation:

GET /testweightedaggregate/Orders/_search
{
"size": 0,
"aggs": {
"ItemAgg": {
"terms": {
"field": "ItemId",
"size" : "2",
"shard_size": 100
},
"aggs": {
"ItemQuantityAgg": {
"sum": {
"field": "Quantity"
}
}
}
}
}
}

In each shard, will it select the top 100 "ItemId" bucket (irrespective of their sum sub-aggregate on "Quantity") and then sort it by sum sub-aggregate on "Quantity" ? Or in each shard it will actually select the "ItemId" buckets that have top 100 sum sub-aggregation value (even if the "ItemId" count is not in top 100 i.e. the "ItemId" count is less but they have big "Quantity" value) ?


(Mark Harwood) #4

The latter - but obviously shard-local selections are only using shard-local values for quantities. The merged results may understate true global quantities if a shard has not included results for a particular item (e.g. could happen if an item ranks highly on most shards but very poorly on one)


(system) #5

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.