Inaccurate sum aggregation results

shushu · April 22, 2018, 10:45am

Hi all,
I am bulk-indexing a large number of documents, about 50M in total.
Because indexing was too slow (about 200K docs/hour) on a cluster with 5 data nodes, I enlarged the cluster with 2 more data nodes. I watched the data spread to the new nodes, and all shards were reallocated properly.

When validating the data, I found that although all of the docs are in the cluster, I do not get the correct sum results: the numbers are too low.

I am running a sum aggregation, filtered by a given set of terms.
I get different values from different aggregations: when I aggregate by terms and sum the value, I get a doc_count of 3 for a given term. When I filter on this term only, I get a doc_count of 4. This means that for some reason one document is left out of the aggregation.
When I get the doc_count of 3, I also get doc_count_error_upper_bound=-1.
In _shards I always get a high number of "skipped". I think it just means those shards supposedly do not have relevant docs, but I am not 100% sure about that.
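
To make the comparison concrete, here is a rough sketch of the two request bodies I am comparing, written as Python dicts, plus a small helper that puts the two responses side by side. The field names (`category`, `amount`) and the `by_term` / `total` aggregation names are placeholders, not my real mapping, and the `size` / `shard_size` values are only illustrative. The helper only reads standard response fields (`aggregations`, `hits.total`, `_shards`); it does not talk to the cluster itself.

```python
# Sketch only: field and aggregation names below are hypothetical placeholders.
TERM_FIELD = "category"   # assumed keyword field used for filtering/bucketing
VALUE_FIELD = "amount"    # assumed numeric field being summed


def terms_sum_query(terms, size=100, shard_size=1000):
    """Request 1: filter by several terms, bucket by term, sum per bucket."""
    return {
        "size": 0,
        "query": {"terms": {TERM_FIELD: list(terms)}},
        "aggs": {
            "by_term": {
                "terms": {
                    "field": TERM_FIELD,
                    "size": size,
                    "shard_size": shard_size,
                    # ask for the per-bucket doc count error as well
                    "show_term_doc_count_error": True,
                },
                "aggs": {"total": {"sum": {"field": VALUE_FIELD}}},
            }
        },
    }


def single_term_sum_query(term):
    """Request 2: filter on one term only and sum over all matching docs."""
    return {
        "size": 0,
        "query": {"term": {TERM_FIELD: term}},
        "aggs": {"total": {"sum": {"field": VALUE_FIELD}}},
    }


def compare(term, terms_response, single_response):
    """Put the bucket for `term` next to the single-term result."""
    buckets = terms_response["aggregations"]["by_term"]["buckets"]
    bucket = next((b for b in buckets if b["key"] == term), None)
    bucket_count = bucket["doc_count"] if bucket else 0
    bucket_sum = bucket["total"]["value"] if bucket else 0.0

    total = single_response["hits"]["total"]
    # 6.x returns a plain integer; 7.x and later return {"value": ..., "relation": ...}
    filtered_count = total["value"] if isinstance(total, dict) else total

    return {
        "bucket_doc_count": bucket_count,
        "filtered_doc_count": filtered_count,
        "missing_docs": filtered_count - bucket_count,
        "bucket_sum": bucket_sum,
        "filtered_sum": single_response["aggregations"]["total"]["value"],
        "shards_skipped": terms_response["_shards"].get("skipped", 0),
    }
```

With the counts described above (3 in the bucket, 4 when filtering on the term alone), `compare` would report `missing_docs` as 1.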

I have been working with Elasticsearch for a long time and have never encountered this problem.
I am on 6.2.

Please advise,
Shushu

Mark_Harwood · April 27, 2018, 9:37am

Can you supply examples of the 2 JSON queries where you see a discrepancy between them?

