Hi Christian,
I see this issue pop up with as few as 4 shards (4 because I have a 4-core machine).
While I appreciate that raising the limit has the potential to overwhelm large indices, the current thresholds aren't always working, as seen here. As a user, it is not acceptable for something to fail and not tell me anything.
Right now, as a user, I have:
- No warning that the returned values can be wrong
  - ES would know that it found non-identical buckets across the N shards
  - But it does not tell the user that the values may be unreliable
- No ability to set a larger bucket count
As a user, silently misleading/bad data is more dangerous than no data. What is the point if I am told, convincingly, that the minimum of a certain field across N buckets is 0.24 when it can actually be 0.24 or 1000 (basically just about any number)? What did I learn from this query that I couldn't have just guessed? Nothing. Worse, I might now make decisions based on the observed value of 0.24, which is wrong, and have no idea that I am basing my decisions on bad data until it's probably too late. For me it's an annoyance (my data is not too large), but for someone else it could be catastrophically expensive.
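To make the failure mode concrete, here is a toy simulation (plain Python, not Elasticsearch code; the shard layout, values, and per-shard limit `N` are all made up for illustration): each "shard" keeps only its top-N buckets by document count before the coordinator merges them, so a term's buckets can be dropped on some shards and the merged statistic ends up reflecting only part of the data.

```python
from collections import Counter

N = 2  # hypothetical per-shard bucket limit, standing in for the threshold discussed above

# Each shard maps term -> list of field values for that term's documents.
shards = [
    {"c": [1000, 1000], "a": [5]},             # "c" is frequent here; its local min is 1000
    {"a": [5, 6], "b": [7, 7], "c": [0.24]},   # "c" is rare here and misses this shard's top 2
]

def shard_top_buckets(shard, n):
    """Return only the shard's n most frequent terms, each with its local min."""
    counts = Counter({term: len(vals) for term, vals in shard.items()})
    return {term: min(shard[term]) for term, _ in counts.most_common(n)}

# Coordinator-side merge: keep the smallest local min reported for each term.
merged = {}
for shard in shards:
    for term, local_min in shard_top_buckets(shard, N).items():
        merged[term] = min(merged.get(term, float("inf")), local_min)

true_min_c = min(v for shard in shards for v in shard.get("c", []))
print("merged buckets:", merged)
print("reported min for c:", merged["c"])   # 1000 -- from the one shard where "c" survived
print("true min for c:", true_min_c)        # 0.24 -- dropped during the per-shard cut
```

The merged result confidently reports a minimum of 1000 for term `c`, while the true minimum across all shards is 0.24 — and nothing in the output signals that the second shard's bucket was discarded.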
What about:
- A way to make this bucket count settable by the user at per-index granularity (with the default value staying whatever it is now)?
- And a warning when ES detects this case?