Does the terms aggregation also return approximate values for sum/avg sub-aggregations?

I understand that the terms aggregation returns approximate results for the doc count.
But what about metric aggregations nested under a terms aggregation, for example sum or avg?
Will these be approximate too?

When the doc count is approximate (because of high-cardinality fields on distributed data), any child aggregations (sum, avg, etc.) will be looking at an incomplete set of docs, so their values will be inaccurate.
To overcome this, you can run an initial query to find the top terms, then run a second query whose terms agg uses the include parameter to list those values in an array, with child aggs computing sum, avg, etc. The values will be correct for those terms.
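
The two-pass approach described above could be sketched roughly as follows. This is only an illustration that builds the request bodies as plain Python dicts. The helper names, the aggregation names (top_terms, selected_terms, total, average) and the field name my_numeric_field are made up for the example, and it assumes my_field is a keyword field so that include can match exact values.

```python
def top_terms_query(field, size=10):
    """First pass: ask only for the top terms, with no child aggregations."""
    return {
        "size": 0,
        "aggs": {"top_terms": {"terms": {"field": field, "size": size}}},
    }


def terms_from_response(response, agg_name="top_terms"):
    """Pull the bucket keys out of a terms aggregation response."""
    return [b["key"] for b in response["aggregations"][agg_name]["buckets"]]


def metrics_for_terms_query(field, terms, metric_field):
    """Second pass: restrict the terms agg to known values via `include`
    and compute sum/avg as child aggregations for just those terms."""
    terms = list(terms)
    if not terms:
        raise ValueError("need at least one term for the second pass")
    return {
        "size": 0,
        "aggs": {
            "selected_terms": {
                "terms": {
                    "field": field,
                    "include": terms,    # exact values found in pass one
                    "size": len(terms),  # one bucket per listed term
                },
                "aggs": {
                    "total": {"sum": {"field": metric_field}},
                    "average": {"avg": {"field": metric_field}},
                },
            }
        },
    }


# Hand-written stand-in for a first-pass response (not real output).
first_pass = {
    "aggregations": {
        "top_terms": {
            "buckets": [
                {"key": "a", "doc_count": 5},
                {"key": "b", "doc_count": 3},
            ]
        }
    }
}

second_body = metrics_for_terms_query(
    "my_field", terms_from_response(first_pass), "my_numeric_field"
)
```

The dict in second_body is what you would send as the body of the second _search request, with whatever client you already use.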

The docs also say that using "shard_size" will improve the accuracy, but they don't give an example of this parameter.
Do you know how to use shard_size? Could you give an example, please? Thanks.

Here's an example:

GET myindex/_search
{
  "size": 0,
  "aggs": {
    "my_agg": {
      "terms": {
        "field": "my_field",
        "size": 10,
        "shard_size": 100
      }
    }
  }
}

Pay attention to the doc_count_error_upper_bound in the results: when it's zero, you know the doc counts are accurate.
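
One way to act on that check is to raise shard_size until the reported error bound reaches zero. The sketch below is an assumption-laden illustration rather than a recommended procedure: search stands for any callable you supply that sends a request body and returns the parsed response, and the helper names and the doubling strategy are made up for the example.

```python
def terms_body(field, size, shard_size, agg_name="my_agg"):
    """Build a terms aggregation request body with an explicit shard_size."""
    return {
        "size": 0,
        "aggs": {
            agg_name: {
                "terms": {"field": field, "size": size, "shard_size": shard_size}
            }
        },
    }


def error_bound(response, agg_name="my_agg"):
    """Read doc_count_error_upper_bound from a terms aggregation response."""
    return response["aggregations"][agg_name]["doc_count_error_upper_bound"]


def smallest_exact_shard_size(search, field, size, max_shard_size=10000):
    """Double shard_size until the error bound is zero.

    `search` is a hypothetical caller-supplied function: body -> response dict.
    Returns the first shard_size tried that gave a zero bound, or None if the
    limit was reached first.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    shard_size = size
    while shard_size <= max_shard_size:
        response = search(terms_body(field, size, shard_size))
        if error_bound(response) == 0:
            return shard_size
        shard_size *= 2
    return None
```

Larger shard_size values make each shard do more work, so in practice you would cap the search with max_shard_size and fall back to the two-query approach if the bound never reaches zero.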
