I'm new to Elasticsearch and recently came across doc_count_error_upper_bound in my results. When I ran a terms aggregation on a field, ordered by descending count and limited to size 10, the result reported a doc_count_error_upper_bound of 15 (see the response below). That prompted me to verify the counts, and they turned out to be approximate rather than exact. From the Elasticsearch docs I learned that ES splits an index across shards, and that the accuracy of a terms aggregation depends on the requested size; increasing the size is an easy way to reduce the error upper bound. My questions are:

- Is there a way to reduce the number of shards an index uses, and if so, is reducing the shard count the right approach for accuracy?
- How does performance degrade when the number of shards is reduced?

I've attached the query and the result, followed by sketches of what I'm considering.
Query:
{ "size": 0, "aggs": { "2": { "terms": { "field": "...", "order": { "_count": "desc" }, "size": 10 } } }, "_source": {}, "sort": [ ], "query": { "constant_score": { "filter": { "bool": { "must": [ ... ], "should": [], "must_not": [ ... ] } } } } }
Response:
{ "took" : 7, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 7692, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "2" : { "doc_count_error_upper_bound" : 15, "sum_other_doc_count" : 6596, "buckets" : [ ... ] } } }