Elasticsearch version: 1.5.2
Data size: 60GB
Shards: 30 primary, 30 replica
The following settings are left at their default values:
- gateway.recover_after_nodes
- gateway.recover_after_time
- gateway.expected_nodes
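These three settings decide when the elected master starts recovering shards after a full cluster restart. Recovery starts as soon as gateway.expected_nodes nodes have joined; otherwise it waits until gateway.recover_after_nodes nodes are present and gateway.recover_after_time has elapsed. A minimal sketch, assuming recovery should wait for all 12 nodes; the helper name gateway_settings and the values 10 and 10m are illustrative assumptions, not tested recommendations:

```shell
#!/bin/sh
# Hypothetical helper: print gateway settings for elasticsearch.yml.
# Assumption: recovery should start immediately once every node has
# joined, or else once at least $2 nodes are present and $3 has elapsed.
gateway_settings() {
  total_nodes=$1   # all nodes expected in the cluster (12 in this post)
  min_nodes=$2     # illustrative lower bound before recovery may start
  wait_time=$3     # illustrative timeout, e.g. 10m
  printf 'gateway.expected_nodes: %s\n' "$total_nodes"
  printf 'gateway.recover_after_nodes: %s\n' "$min_nodes"
  printf 'gateway.recover_after_time: %s\n' "$wait_time"
}

# Example (not run here), before restarting each node:
# gateway_settings 12 10 10m >> config/elasticsearch.yml
```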
Description: We have 12 nodes: 3 master nodes and 9 data nodes. To uninstall some plugins, we had to do a full cluster restart. We took the following steps:
- Uninstall the plugins and shut down the nodes one by one (we did not disable shard allocation). At the end of this step, all nodes are shut down.
- Start the master nodes one by one.
- Start the data nodes one by one.
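The first step notes that shard allocation was not disabled. In 1.x this is controlled by the cluster.routing.allocation.enable setting. A minimal sketch, assuming the cluster answers on localhost:9200 as in the reroute command; allocation_body and set_allocation are hypothetical helper names. It writes a persistent setting because transient settings do not survive a full cluster restart:

```shell
#!/bin/sh
# Sketch: toggle shard allocation around a full cluster restart.
# Assumption: ES_URL points at any node of the cluster (default below).
ES_URL=${ES_URL:-http://localhost:9200}

# Print the settings body for one allocation mode:
# none | primaries | new_primaries | all
allocation_body() {
  printf '{"persistent":{"cluster.routing.allocation.enable":"%s"}}' "$1"
}

# Send that body to the cluster settings API.
set_allocation() {
  curl -s -XPUT "$ES_URL/_cluster/settings" -d "$(allocation_body "$1")"
}

# Before stopping the first node:  set_allocation none
# After every node has rejoined:   set_allocation all
```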
However, after about an hour, some primary and replica shards were still unallocated.
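Before rerouting anything, it can help to list exactly which shard copies are unassigned. A minimal sketch, assuming the default column order of the 1.x _cat/shards output (index, shard, prirep, state, ...); unassigned_shards is a hypothetical helper that filters that output from stdin:

```shell
#!/bin/sh
# Sketch: print "index shard primary|replica" for every UNASSIGNED row
# of _cat/shards output read from stdin.
unassigned_shards() {
  awk '$4 == "UNASSIGNED" { print $1, $2, ($3 == "p" ? "primary" : "replica") }'
}

# Example (not run here):
# curl -s 'localhost:9200/_cat/shards' | unassigned_shards
```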
The error logs are as follows:
So we had to reroute the unallocated shards using the following API. However, after rerouting, the data in these shards was lost.
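# $shard holds the number of the shard being rerouted; it is spliced outside the single quotes so the shell expands it
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{ "commands" : [ { "allocate" : { "index" : "t37", "shard" : '"$shard"', "node" : "datanode15", "allow_primary" : true } } ] }'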
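The reroute command above sets allow_primary to true. The 1.x reroute documentation warns that this flag might result in data loss: if the chosen node holds no copy of the shard, the primary is in effect started empty. A minimal sketch of building the same request body with allow_primary defaulting to false, so that forcing a primary has to be an explicit choice; reroute_body is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: build the body for one 1.x "allocate" reroute command.
# Assumption: allow_primary stays false unless "true" is passed as $4,
# so an accidental empty primary is less likely.
reroute_body() {
  index=$1
  shard=$2
  node=$3
  allow_primary=${4:-false}
  printf '{"commands":[{"allocate":{"index":"%s","shard":%d,"node":"%s","allow_primary":%s}}]}' \
    "$index" "$shard" "$node" "$allow_primary"
}

# Example (not run here):
# curl -XPOST 'localhost:9200/_cluster/reroute' -d "$(reroute_body t37 3 datanode15)"
```

With the default of false, the API is expected to reject allocating a primary shard, which leaves the decision to accept possible data loss with whoever passes true.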
Therefore, my questions are as follows:
- Why do primary shards remain unallocated even after such a long time?
- How do we perform a full cluster restart correctly and safely?
- If some primary shards do end up unallocated, how can we reroute them without losing data?
Thank you.