I have a new 6-node ES 0.20.1 cluster. The cluster was up and running with
a small amount of data. There were a total of 302 shards (151 primaries,
each with 1 replica) distributed amongst 4 indices.
We shut down the cluster so that we could set the ES_HEAP_SIZE. After that
we started the cluster. Ever since then it has been in a red state with 30
unassigned shards. We tried tweaking the gateway settings (
restarting but that didn't help. It has been a full day now with no change.
My thoughts are to do the following:
- Grep the logs on all of the machines to collect all of the IndexShardMissingException errors.
- Shut down the cluster.
- See if I can find good copies of the missing shards and, if so, copy
those shard files to the appropriate index directory where the log error
messages reported a problem.
- Start the cluster and see if it can recover.
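The first step of that plan (collecting the errors) could be sketched roughly as below. This is only an illustration: the helper name collect_missing_shards is made up, and the regular expression assumes the log message contains "IndexShardMissingException" followed by "[index][shard]", which may not match the exact wording in these logs.

```python
import re
from collections import defaultdict

# Assumption: the exception text looks like
#   "...IndexShardMissingException: [index_name][shard_id] missing"
# or "...IndexShardMissingException[[index_name][shard_id] missing]".
# Adjust the pattern if the real log lines differ.
PATTERN = re.compile(r"IndexShardMissingException\W+\[([^\]]+)\]\[(\d+)\]")


def collect_missing_shards(lines_by_node):
    """Map (index, shard) -> set of node names whose logs reported it missing.

    lines_by_node is a dict of node name -> iterable of log lines, for
    example built by reading each machine's log file.
    """
    missing = defaultdict(set)
    for node, lines in lines_by_node.items():
        for line in lines:
            match = PATTERN.search(line)
            if match:
                key = (match.group(1), int(match.group(2)))
                missing[key].add(node)
    return dict(missing)


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from the real cluster.
    sample = {
        "node1": ["org.elasticsearch.index.IndexShardMissingException: [logs][3] missing"],
        "node2": ["IndexShardMissingException[[logs][3] missing]", "unrelated line"],
    }
    for (index, shard), nodes in sorted(collect_missing_shards(sample).items()):
        print(index, shard, sorted(nodes))
```

The resulting list of (index, shard) pairs would tell me which shard directories to look for on the other nodes before copying anything.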
Is this a viable plan? Fortunately this is a test cluster so it is not the
end of the world if I have to wipe it and start over. But I want to
understand proper error recovery so I know what to do if this happens in
Also, are there any additional procedural steps I should follow in the
event that I have to restart a cluster so as to avoid this type of issue in
Many thanks for any information!