Hi all,
I am indexing a large number of documents in batch mode, ~50M docs in total.
Since indexing was too slow (about 200K docs/hour) on a cluster with 5 data nodes, I added 2 more data nodes. I watched the data "spread" onto the new nodes, and all shards were re-allocated properly.
When validating the data, I found that although all of the docs are in the cluster, my sum results are wrong - the numbers are lower than expected.
- I am doing a sum aggregation, filtered by a given set of terms.
- I get different values from different aggregations - when I aggregate by terms and sum the value, I get a doc_count of 3 for a given term. When I filter on that term alone, I get a doc_count of 4. This means that for some reason 1 document is left out of the terms aggregation.
- When I get the doc_count of 3, I also get doc_count_error_upper_bound=-1.
- In the _shards section of the response I always get a high number of "skipped". I think it just means those shards supposedly do not hold any relevant docs, but I am not 100% sure about that.
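To make the comparison above concrete, here is a minimal sketch of the two kinds of request bodies being compared, written as Python dicts. It is an illustration only, not the exact queries from the cluster: the field names `category` and `amount`, the aggregation names `by_term` and `total`, and the helper function names are all assumptions made for the example.

```python
import json

# Hypothetical field names; substitute the real ones from the index mapping.
TERM_FIELD = "category"
VALUE_FIELD = "amount"


def terms_sum_body(terms):
    """Restrict to the given terms, bucket by term, and sum the value per bucket."""
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "query": {"bool": {"filter": [{"terms": {TERM_FIELD: terms}}]}},
        "aggs": {
            "by_term": {
                "terms": {"field": TERM_FIELD},
                "aggs": {"total": {"sum": {"field": VALUE_FIELD}}},
            }
        },
    }


def single_term_sum_body(term):
    """Filter on a single term and sum the value over all matching documents."""
    return {
        "size": 0,
        "query": {"bool": {"filter": [{"term": {TERM_FIELD: term}}]}},
        "aggs": {"total": {"sum": {"field": VALUE_FIELD}}},
    }


def bucket_doc_count(response, term):
    """Return the doc_count of one bucket in a by_term response, or None if absent."""
    for bucket in response["aggregations"]["by_term"]["buckets"]:
        if bucket["key"] == term:
            return bucket["doc_count"]
    return None


if __name__ == "__main__":
    # Print both bodies so they can be pasted into a _search request.
    print(json.dumps(terms_sum_body(["a", "b"]), indent=2))
    print(json.dumps(single_term_sum_body("a"), indent=2))
```

With bodies like these, the mismatch described above would show up as `bucket_doc_count(...)` for a term being lower than the total hit count returned by the single-term request for that same term.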
I have been working with Elasticsearch for a long time, and have never encountered this problem.
I am on 6.2.
Please advise,
Shushu