I have an Elasticsearch 2.4 cluster of three nodes. One of the nodes crashed yesterday, but the data is not being recovered onto the failed node (which is online again and has rejoined the cluster). On the two nodes that didn't crash, garbage collection info messages appear in the syslog and the CPU is constantly at 100%, even though no processes are using ES right now. Any idea how to force the distribution of shards between the nodes?
Fixed it myself. The replication failure was caused by an index that appeared twice, on two nodes, both with the same shard number. After removing the index, replication started.
A nice way to get this information and to force allocation of the unassigned shards: first get the node names, with something like this:
curl --silent -XGET 'http://172.31.24.32:9200/_nodes/stats?pretty=true' | grep -B10 172.31.24. | egrep "(ip|\:\ \{)"
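If grepping through the node stats JSON feels fragile, the `_cat/nodes` API can return just the IP and node name as plain columns. The helper below is only a sketch: the function name `node_name_for_ip` is made up for illustration, and it assumes the `ip` column is requested first so that node names containing spaces survive intact.

```shell
# Hypothetical helper: read `_cat/nodes?h=ip,name` output on stdin and
# print the name of the node whose IP matches the first argument.
node_name_for_ip() {
  awk -v ip="$1" '
    $1 == ip {
      sub(/^[^ ]+ +/, "")   # drop the ip column and the spaces after it
      sub(/ +$/, "")        # drop column padding at the end of the line
      print
    }'
}
```

It could be used like this, with the same host as above:

curl --silent 'http://172.31.24.32:9200/_cat/nodes?h=ip,name' | node_name_for_ip 172.31.24.32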
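Then set NODE to one of those names and allocate every unassigned shard to it:
# Target node for all unassigned shards
NODE="thenodename"
curl --silent -XGET http://172.31.24.32:9200/_cat/shards \
  | grep UNASSIGNED \
  | awk '{print $1, $2}' \
  | while read -r index shard; do
      # Ask the cluster to allocate this shard on $NODE; discard the response
      curl --silent -XPOST '172.31.24.32:9200/_cluster/reroute' -d "{ \"commands\" : [ { \"allocate\" : { \"index\" : \"${index}\", \"shard\" : ${shard}, \"node\" : \"${NODE}\", \"allow_primary\" : true } } ] }" >/dev/null 2>&1
      echo
      # Show cluster status and the remaining unassigned shard count
      curl --silent 'http://172.31.24.32:9200/_cluster/health?pretty=true' \
        | egrep '("unassigned_shards|status)'
    done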
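Before running a loop like this against a live cluster, it may help to print the reroute requests first and read them over. The sketch below does only that. The function names (`unassigned_shards`, `allocate_body`, `plan_reroute`) and the `ES_URL` variable are invented for illustration, and it assumes the 2.x `_cat/shards` layout where the state is the fourth column. Note that it defaults `allow_primary` to false, unlike the script above: as far as I know, forcing a primary in 2.x allocates a new empty shard, so any data that only existed in the lost copy is gone.

```shell
#!/usr/bin/env bash
# Sketch only: ES_URL is a placeholder, override it for your cluster.
ES_URL="${ES_URL:-http://172.31.24.32:9200}"

# Read `_cat/shards` output on stdin and print "index shard" for every
# row whose state column (4th field) is UNASSIGNED.
unassigned_shards() {
  awk '$4 == "UNASSIGNED" { print $1, $2 }'
}

# Build the JSON body for one "allocate" reroute command.
# Args: index, shard number, node name, allow_primary (default: false).
allocate_body() {
  printf '{"commands":[{"allocate":{"index":"%s","shard":%s,"node":"%s","allow_primary":%s}}]}' \
    "$1" "$2" "$3" "${4:-false}"
}

# Read "index shard" pairs on stdin and print, without executing, the
# curl command that would allocate each shard on the given node.
plan_reroute() {
  local node="$1" index shard
  while read -r index shard; do
    printf "curl --silent -XPOST '%s/_cluster/reroute' -d '%s'\n" \
      "$ES_URL" "$(allocate_body "$index" "$shard" "$node")"
  done
}
```

A dry run would then look like this; nothing is sent to the reroute API until you copy and run the printed commands yourself:

curl --silent "$ES_URL/_cat/shards" | unassigned_shards | plan_reroute thenodename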
Thanks to Antonino Abbate (@ninoabbate)