Top average list of terms

Mark_Harwood · February 22, 2019, 1:45pm

This area is complex because distributed systems make things complex.
If you can work with a single shard/index that makes analytics more accurate but brings its own problems in terms of scale.
Assuming you stick with multiple shards/indices you'll likely benefit from the shard_size setting covered in the docs. That can increase accuracy which you can check by looking at the doc_count_error_upper_bound value in the results. When it's zero your results should be accurate. The size parameter has a similar effect but with the disadvantage that it increases the number of final results returned - many of which you may be uninterested in.

If you hit the "too many buckets" issue you've pushed size/shard-size too hard and are trying to squeeze too much into your responses. You then need to back up and try break your analysis into multiple requests. That could be done using queries as you suggest or using partitioning in the terms aggregation.

Here's a sketch of a decision process that might help.

Topic		Replies	Views
Aggregation result size Elasticsearch	3	587	July 6, 2017
Terms agregation Elasticsearch	1	338	May 15, 2020
Elasticsearch term aggregations : use case for a single field Elasticsearch	3	440	August 12, 2020
Unusual aggregations size behaviour Elasticsearch	5	825	July 20, 2017
How does elasticsearch return buckets when aggs by sum is used? Elasticsearch	3	325	September 6, 2021

Top average list of terms

Related topics