Using multiple shards causes incorrect results

Hi Christian,

I see this issue with as few as 4 shards (4 because I have a 4-core machine).
While I appreciate that a larger bucket count could overwhelm large indices, the current thresholds aren't always sufficient, as this case shows. As a user, I don't find it acceptable when something fails without telling me.

Right now, as a user, I have:

  • No warning that the returned values can be wrong
    • ES would know that it found non-identical buckets across the N shards
    • But it does not tell the user that the values may be unreliable
  • No ability to set a larger bucket count
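
To illustrate how this goes wrong, here is a simplified model (my own sketch, not Elasticsearch's code) of a coordinating node merging per-shard top-N buckets. It uses document counts rather than a min sub-aggregation to keep things short. The function names `shard_top_n` and `merge` and the parameter `shard_size` are hypothetical stand-ins for "the bucket count each shard returns". The `suspect` set is one possible way to implement the warning proposed below: a term is flagged when some shard that truncated its list did not return it, so that shard's contribution may be missing.

```python
from collections import Counter


def shard_top_n(shard_counts: dict[str, int], shard_size: int) -> dict[str, int]:
    """Return only the shard's top `shard_size` terms by count (the truncation step)."""
    return dict(Counter(shard_counts).most_common(shard_size))


def merge(shards: list[dict[str, int]], shard_size: int, size: int):
    """Simulate merging truncated per-shard results on a coordinating node.

    Hypothetical model. Returns (top `size` merged buckets, terms whose
    merged counts may be missing contributions from some shard).
    """
    partials = [shard_top_n(s, shard_size) for s in shards]
    merged = Counter()
    for p in partials:
        merged.update(p)  # adds counts only for terms each shard actually returned

    suspect = set()
    for shard, p in zip(shards, partials):
        if len(shard) > shard_size:  # this shard dropped some of its terms
            # Any merged term this shard did not return may have documents here.
            suspect.update(t for t in merged if t not in p)

    return merged.most_common(size), suspect


# Two shards; the true totals are y=11, x=8.
shards = [{"x": 7, "y": 5}, {"y": 6, "x": 1}]
top, suspect = merge(shards, shard_size=1, size=2)
print(top, sorted(suspect))  # [('x', 7), ('y', 6)] ['x', 'y']  (wrong order, wrong counts)
```

With `shard_size=2` every shard returns all of its terms, the merge yields `[('y', 11), ('x', 8)]`, and nothing is flagged. Note that this check is conservative: it can also flag a term that simply has no documents on a given shard, which seems preferable to staying silent.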

As a user, silently misleading or bad data is more dangerous than no data. What is the point of being told convincingly that the minimum of a certain field across N buckets is 0.24 when it could be 0.24 or 1000 (basically any number)? What did I learn from this query that I couldn't have just guessed? Nothing. But in the worst case, I might make decisions based on the observed value of 0.24, which is wrong, and I would have no idea until it's probably too late that I am relying on bad data. For me it's an annoyance (my data is not very large), but for someone else it could be catastrophically expensive.

What about:

A way for the user to set this bucket count per index (with the default left at its current value)?

And a warning when ES detects this case?