Hello all of you bright people,
We’re currently running a smallish 300 GB cluster in production on 5 nodes
with around 30 million docs. Everything works flawlessly except when a node
really goes down (I mean a network failure, a hardware failure, or kill -9).
When we lose a node, the cluster becomes more or less completely
unresponsive for a few minutes, for both indexing and querying. This
is, of course, less than ideal as we have load 24/7.
I would really appreciate some help with understanding best practice
settings to have a robust cluster.
The first goal for us is for the cluster not to become unresponsive in the
event of a node crash. After reading everything I could find on the web, I
still can't tell whether ES is designed to be unresponsive for
ping_retries * ping_timeout seconds or whether the cluster will continue to
serve query requests during this time. Could anyone help me shed light on
this?
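
To make the window I am asking about concrete, here is a rough Python sketch of how I am estimating it. This is only my own back-of-the-envelope model, not something taken from the ES documentation: it assumes the relevant settings are the zen fault detection ones (discovery.zen.fd.ping_interval, discovery.zen.fd.ping_timeout, discovery.zen.fd.ping_retries) and that the values 1s, 30s and 3 apply. The function name is made up for illustration.

```python
def worst_case_detection_seconds(ping_timeout_s, ping_retries, ping_interval_s=1.0):
    """Rough upper bound on how long a dead node could go undetected.

    Assumption (mine, not confirmed): a node is declared dead only after
    `ping_retries` consecutive pings have each waited out `ping_timeout_s`,
    and the node may die just after a successful ping, which adds up to one
    `ping_interval_s` before the first failing ping is even sent.
    """
    return ping_interval_s + ping_retries * ping_timeout_s


if __name__ == "__main__":
    # Assumed values: ping_interval=1s, ping_timeout=30s, ping_retries=3
    print(worst_case_detection_seconds(30.0, 3))
```

If that model is right, the "few minutes" of unresponsiveness we see would be in the same ballpark as this window, which is why I am asking whether queries are supposed to block for its whole duration.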
Secondly, in the event of an even worse failure where the cluster goes into
red state, would it be possible to allow the cluster to still serve
I would be ever so grateful for anyone willing to help me understand how
this works or what we would need to change to make our ES installation more
I’ve included our config here: