We have a situation on our Elasticsearch cluster where a single request can quickly bring down all cluster nodes (with an OOM exception).
We updated the cluster configuration to put the following circuit breakers in place:
We also tried setting a size limit on the fielddata cache (40%), but we still get the OOM exception on this request.
Some details about our cluster topology:
- 5 data nodes
- max heap : 8gb
- 286 indices
- 3072 shards
- 2,857,660,643 docs
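
As a rough sketch, the per-node load implied by these figures can be estimated as follows. It assumes shards and documents are spread evenly across the data nodes and that the 8 GB heap applies to each node; neither is confirmed above, and the variable names are purely illustrative.

```python
# Figures copied from the topology list above.
data_nodes = 5
heap_gb_per_node = 8          # assumption: the 8 GB max heap is per node
indices = 286
shards = 3072
docs = 2_857_660_643

# Assumption: shards and documents are evenly distributed across data nodes.
shards_per_node = shards / data_nodes
shards_per_heap_gb = shards_per_node / heap_gb_per_node
shards_per_index = shards / indices
docs_per_shard = docs / shards

print(f"shards per node:       {shards_per_node:.1f}")
print(f"shards per GB of heap: {shards_per_heap_gb:.1f}")
print(f"shards per index:      {shards_per_index:.1f}")
print(f"docs per shard:        {docs_per_shard:,.0f}")
```

These are averages only; the actual distribution per node and per index may differ considerably.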
Elasticsearch node log with the exception:
Detailed query:
We would like to know how to make sure this kind of request cannot crash our entire cluster, and how to go further with the root cause analysis.
Many thanks for your help!