Hi,
Unfortunately, my cluster ran into a problem when the power was switched off.
The cluster now has RED status: a few shards will not start.
I've tried using reroute:
POST /_cluster/reroute?retry_failed=true
but it's not working.
If I run this query:
GET /_cluster/allocation/explain
{
  "index": "my-index-xxx",
  "shard": 0,
  "primary": true
}
the cluster returns:
{
  "index" : "my-index-xxx",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2020-05-09T10:19:09.285Z",
    "failed_allocation_attempts" : 5,
    "details" : "failed shard on node [4oMUIeiASHeq8ZYNIx5hUg]: shard failure, reason [failed to recover from translog], failure EngineException[failed to recover from translog]; nested: TranslogCorruptedException[translog from source [/var/lib/elasticsearch/nodes/0/indices/NiCEzOZjS9ib2oG2AC3QXg/0/translog/translog-134.tlog] is corrupted, translog truncated]; nested: EOFException[read past EOF. pos [1718122] length: [4] end: [1718122]]; ",
    "last_allocation_status" : "no"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes that hold an in-sync shard copy",
  "node_allocation_decisions" : [
    {
      "node_id" : "4oMUIeiASHeq8ZYNIx5hUg",
      "node_name" : "data-006",
      "transport_address" : "192.168.88.40:9300",
      "node_attributes" : {
        "machine_id" : "M001",
        "xpack.installed" : "true"
      },
      "node_decision" : "no",
      "store" : {
        "in_sync" : true,
        "allocation_id" : "f7brAMpUSgqLn1ERtP5teg"
      },
      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "NO",
          "explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2020-05-09T10:19:09.285Z], failed_attempts[5], failed_nodes[[4oMUIeiASHeq8ZYNIx5hUg]], delayed=false, details[failed shard on node [4oMUIeiASHeq8ZYNIx5hUg]: shard failure, reason [failed to recover from translog], failure EngineException[failed to recover from translog]; nested: TranslogCorruptedException[translog from source [/var/lib/elasticsearch/nodes/0/indices/NiCEzOZjS9ib2oG2AC3QXg/0/translog/translog-134.tlog] is corrupted, translog truncated]; nested: EOFException[read past EOF. pos [1718122] length: [4] end: [1718122]]; ], allocation_status[deciders_no]]]"
        }
      ]
    },
...
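Since the explain output is long, here is a minimal sketch of how the relevant fields could be pulled out of it, assuming the response has already been loaded into a Python dict. The function name summarize_allocation_explain is hypothetical (it is not part of any Elasticsearch client), and the sample dict is a trimmed copy of the response above, not a new result.

```python
from typing import Any


def summarize_allocation_explain(response: dict) -> dict:
    """Collect the fields that say why a shard is unassigned.

    Hypothetical helper: it only reads keys that appear in the
    allocation explain response shown in this post.
    """
    info = response.get("unassigned_info", {})
    blockers = []
    # Walk every node decision and keep the deciders that answered NO.
    for node in response.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            if decider.get("decision") == "NO":
                blockers.append((node.get("node_name"), decider.get("decider")))
    return {
        "state": response.get("current_state"),
        "reason": info.get("reason"),
        "failed_attempts": info.get("failed_allocation_attempts"),
        "blockers": blockers,
    }


# Trimmed copy of the response above (long strings omitted).
sample: dict[str, Any] = {
    "index": "my-index-xxx",
    "shard": 0,
    "primary": True,
    "current_state": "unassigned",
    "unassigned_info": {
        "reason": "ALLOCATION_FAILED",
        "failed_allocation_attempts": 5,
    },
    "node_allocation_decisions": [
        {
            "node_name": "data-006",
            "node_decision": "no",
            "deciders": [{"decider": "max_retry", "decision": "NO"}],
        }
    ],
}

summary = summarize_allocation_explain(sample)
print(summary)
```

For this shard the summary shows a single blocker, the max_retry decider on data-006, after 5 failed attempts; the underlying cause is still the corrupted translog reported in "details".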
Any ideas on how to recover the shard?