While reading the documentation for the Terms Aggregation, I came across the fact that its results are not always accurate, but that we can increase the size to get more accurate results.
I know:

- How query-then-fetch works.
- How top terms are calculated at each shard (shard_size) and then merged at the coordinating node (size).
- What doc_count_error_upper_bound means, and how it can indicate that the top results may contain an error and that we need to increase the size.
But is there a mathematical approach, or any other way, to determine the correct size to ask for once we get inaccurate results the first time?
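For concreteness, this is the kind of request I mean (a minimal sketch assuming the 8.x elasticsearch-py client; the index name "products" and field "brand" are made-up placeholders). Setting show_term_doc_count_error: true makes Elasticsearch report a per-bucket error bound in addition to the aggregation-level one:

```python
# Hedged sketch of a terms aggregation with explicit size / shard_size,
# assuming the 8.x elasticsearch-py client and hypothetical index/field names.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="products",
    size=0,  # we only care about the aggregation, not the hits
    aggs={
        "top_brands": {
            "terms": {
                "field": "brand",
                "size": 10,        # number of buckets returned to the client
                "shard_size": 25,  # number of buckets fetched from each shard
                # report a per-bucket worst-case error as well
                "show_term_doc_count_error": True,
            }
        }
    },
)

agg = resp["aggregations"]["top_brands"]
# Aggregation-level worst case: sum over shards of the last returned count.
print(agg["doc_count_error_upper_bound"])
for bucket in agg["buckets"]:
    # Per-bucket worst case: sum of the last returned counts from the
    # shards that did not return this term at all.
    print(bucket["key"], bucket["doc_count"], bucket["doc_count_error_upper_bound"])
```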
Suppose you have 3 primary shards with size=10 and shard_size=25, and the 10th term 'A' in the result was contained in the results of only 2 of the primary shards.
If the count of the 25th term in the shard not returning 'A' is N, the true count of 'A' in that shard could be any value between 0 and N, so the worst-case error contributed by that shard is N. In general, a term's doc_count_error_upper_bound is the sum of these thresholds (the doc count of the last term each shard returned) over all shards that did not return a bucket for that term.
The response contains no information about where 'A' actually ranks within those shards, so there is no way to calculate the exact size or shard_size needed for an accurate result. The only way to guarantee zero error is to use a shard_size at least as large as the cardinality of the field.
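A small simulation of the merge step makes the bound concrete (the per-shard term counts below are invented for illustration, with tiny size/shard_size values so the effect is visible):

```python
# Hedged sketch: simulate the coordinator merging per-shard top terms.
# All shard contents are made up; SIZE/SHARD_SIZE are deliberately tiny.
SIZE, SHARD_SIZE = 2, 3

shards = [
    {"A": 50, "B": 40, "C": 30, "D": 25},
    {"A": 45, "C": 44, "D": 43, "B": 5},
    {"B": 60, "D": 55, "C": 50, "A": 49},
]

# Each shard returns only its top SHARD_SIZE terms.
shard_tops = [
    dict(sorted(s.items(), key=lambda kv: -kv[1])[:SHARD_SIZE]) for s in shards
]

# The coordinator sums the partial counts it received.
merged = {}
for top in shard_tops:
    for term, count in top.items():
        merged[term] = merged.get(term, 0) + count

# Per-term worst case: every shard that did NOT return the term could
# still hold up to the count of the last term it did return.
for term, count in sorted(merged.items(), key=lambda kv: -kv[1])[:SIZE]:
    error = sum(min(top.values()) for top in shard_tops if term not in top)
    print(term, count, error)
```

Running this prints C with count 124 and error 0 (all shards returned it) and B with count 100 and error 43: shard 2 did not return B, whose last returned term had count 43. B's true total is 105, so the reported count undercounts by 5, safely within the bound of 43, but nothing in the response tells the coordinator whether the real error is 0, 5, or 43.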