Howdy,
I have an Elasticsearch 1.4.4 cluster that is currently under-provisioned. When a node drops out of the cluster, the remaining nodes start reallocating the failed node's shards among themselves, which drives up heap and disk usage on those nodes. I have gateway.recover_after_nodes set, so I thought the cluster would wait until the node had rejoined and then reinitialize the shards on that node (the way it does when you disable shard allocation and restart a node), but that doesn't seem to be the case.
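For context, by "disable shard allocation" I mean something along the lines of the sketch below. It assumes the cluster.routing.allocation.enable setting applied through PUT /_cluster/settings, and a node reachable at http://localhost:9200 (that address and the helper names are just placeholders for illustration, not anything from my actual setup).

```python
import json
import urllib.request

# Assumption: a node of the cluster answers HTTP on this address.
ES_URL = "http://localhost:9200"

# Values I believe cluster.routing.allocation.enable accepts on 1.x.
VALID_MODES = ("all", "primaries", "new_primaries", "none")


def allocation_settings_body(mode):
    """Build the JSON body for PUT /_cluster/settings (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError("unknown allocation mode: %r" % (mode,))
    # "transient" settings are dropped on a full cluster restart.
    return json.dumps({"transient": {"cluster.routing.allocation.enable": mode}})


def set_allocation(mode, base_url=ES_URL):
    """Send the setting to the cluster and return the parsed response."""
    request = urllib.request.Request(
        base_url + "/_cluster/settings",
        data=allocation_settings_body(mode).encode("utf-8"),
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For a planned restart that would be set_allocation("none") before stopping the node and set_allocation("all") once it has rejoined. That only helps when I know the restart is coming, which is not the case when a node drops out on its own.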
Is there a setting that would make the cluster wait and then reinitialize the shards on the node that dropped out once it has rejoined?
I have these set:
gateway.recover_after_nodes: 8
gateway.expected_nodes: 8
I am mostly just curious whether I'm using these settings wrong or whether there is a bug in this version. I am working on upgrading the cluster to version 5.x, but that won't happen immediately, so this would be a stopgap in the meantime.
Thanks,
Walt