I have a 5-node cluster running ES 2.4.0 that supports both our development and staging environments. The nodes are restarted often, mostly because of development changes. The cluster has one client node, two data-only nodes and two master-eligible nodes.
My problem is that whenever I restart just the ES service on a node, the cluster immediately freaks out and starts reallocating all the shards that were assigned to that node, even if the service was down for less than a minute. It then takes at least ten minutes for all the shards to recover and for the cluster to return to 'Green' status.
I've been through the documentation and thought I saw a setting for delaying this, but I can't find it anymore. Does anyone know what I need to change so that the cluster does not reallocate a node's shards unless that node has been down for longer than some threshold (e.g. 5 minutes)?
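A minimal sketch of what such a change might look like, assuming the delay in question is the dynamic index setting `index.unassigned.node_left.delayed_timeout`, applied to every existing index through the `_all/_settings` endpoint. The URL `http://localhost:9200` and the helper names (`delayed_allocation_body`, `build_settings_request`, `apply_delay`) are hypothetical; only the Python standard library is used.

```python
import json
import urllib.request

# Hypothetical: the HTTP endpoint of the client node.
ES_URL = "http://localhost:9200"

# Index-level setting that delays reallocating shards from a node that left.
DELAY_SETTING = "index.unassigned.node_left.delayed_timeout"


def delayed_allocation_body(timeout="5m"):
    """Return the settings body that sets the node-left delay."""
    return {"settings": {DELAY_SETTING: timeout}}


def build_settings_request(base_url=ES_URL, timeout="5m"):
    """Build (but do not send) a PUT _all/_settings request."""
    data = json.dumps(delayed_allocation_body(timeout)).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_all/_settings",
        data=data,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def apply_delay(base_url=ES_URL, timeout="5m"):
    """Send the request to a live cluster and return the response body."""
    req = build_settings_request(base_url, timeout)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Calling `apply_delay(timeout="5m")` against a reachable cluster should, if the setting behaves as documented, keep the shards of a briefly restarted node from being reallocated for up to five minutes. Because `_all/_settings` only touches indices that already exist, indices created later would need the same setting (for example via an index template).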