Hi Christian,
I see this issue pop up with as few as 4 shards (4 because I have a 4-core machine).
While I appreciate that raising the limit has the potential to overwhelm large indices, the current thresholds aren't always working, as seen here. As a user, it is not acceptable for something to fail and not tell me anything.
Right now, as a user, I have:
- No warning that the returned values can be wrong
  - ES would know that it found non-identical buckets across the N shards
  - But it does not tell the user that the values may be unreliable
- No ability to set a larger bucket count
As a user, silently misleading/bad data is more dangerous than no data. What is the point if I am told, convincingly, that the minimum of a certain field across N buckets is 0.24 when it can actually be 0.24 or 1000 (basically just about any number)? What did I learn from this query that I couldn't have just guessed? Nothing. Worse, I might now make decisions based on the observed value of 0.24, which is wrong, and have no idea that I am basing my decisions on bad data until it's probably too late. For me it's an annoyance (my data is not too large), but for someone else it could be catastrophically expensive.
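To make the failure mode concrete, here is a toy simulation (plain Python, not Elasticsearch code; the shard layout, values, and per-shard limit `N` are all made up for illustration): each "shard" keeps only its top-N buckets by document count before the coordinator merges them, so a term's buckets can be dropped on some shards and the merged statistic ends up reflecting only part of the data.

```python
from collections import Counter

N = 2  # hypothetical per-shard bucket limit, standing in for the threshold discussed above

# Each shard maps term -> list of field values for that term's documents.
shards = [
    {"c": [1000, 1000], "a": [5]},             # "c" is frequent here; its local min is 1000
    {"a": [5, 6], "b": [7, 7], "c": [0.24]},   # "c" is rare here and misses this shard's top 2
]

def shard_top_buckets(shard, n):
    """Return only the shard's n most frequent terms, each with its local min."""
    counts = Counter({term: len(vals) for term, vals in shard.items()})
    return {term: min(shard[term]) for term, _ in counts.most_common(n)}

# Coordinator-side merge: keep the smallest local min reported for each term.
merged = {}
for shard in shards:
    for term, local_min in shard_top_buckets(shard, N).items():
        merged[term] = min(merged.get(term, float("inf")), local_min)

true_min_c = min(v for shard in shards for v in shard.get("c", []))
print("merged buckets:", merged)
print("reported min for c:", merged["c"])   # 1000 -- from the one shard where "c" survived
print("true min for c:", true_min_c)        # 0.24 -- dropped during the per-shard cut
```

The merged result confidently reports a minimum of 1000 for term `c`, while the true minimum across all shards is 0.24 — and nothing in the output signals that the second shard's bucket was discarded.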
What about:
- A way to make this bucket count settable by the user at per-index granularity (with the default value staying whatever it is now)?
- And a warning when ES detects this case?