Hi Elastic Team,
There's a single unassigned shard in the cluster due to an ALLOCATION_FAILED error, which I can't figure out how to resolve.
I've already retried the allocation several times using curl -XPOST 'localhost:9200/_cluster/reroute?retry_failed=true&pretty'
and also restarted the node that holds the problematic shard. Here is the relevant output of GET /_cluster/allocation/explain?pretty
:
"unassigned_info" : {
  "reason" : "ALLOCATION_FAILED",
  "at" : "2018-07-30T11:28:21.654Z",
  "failed_allocation_attempts" : 5,
  "details" : "failed shard on node [_wtZ2Gq7TvWoKsII2yCN6Q]: shard failure, reason [failed to recover from translog], failure EngineException[failed to recover from translog]; nested: TranslogCorruptedException[operation size must be at least 4 but was: 0]; ",
  "last_allocation_status" : "no"
},
"can_allocate" : "no",
"allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes that hold an in-sync shard copy",
...
other output
...
{
  "node_id" : "_wtZ2Gq7TvWoKsII2yCN6Q",
  "node_name" : "the_problematic_node",
  "transport_address" : "171.134.100.215:9301",
  "node_attributes" : {
    "ml.machine_memory" : "270831382528",
    "ml.max_open_jobs" : "20",
    "ml.enabled" : "true",
    "node.type" : "hot"
  },
  "node_decision" : "no",
  "store" : {
    "in_sync" : true,
    "allocation_id" : "2ujCGOK-Swesr0sigjdeGw"
  },
  "deciders" : [
    {
      "decider" : "max_retry",
      "decision" : "NO",
      "explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2018-07-30T11:28:21.654Z], failed_attempts[5], delayed=false, details[failed shard on node [_wtZ2Gq7TvWoKsII2yCN6Q]: shard failure, reason [failed to recover from translog], failure EngineException[failed to recover from translog]; nested: TranslogCorruptedException[operation size must be at least 4 but was: 0]; ], allocation_status[deciders_no]]]"
    }
  ]
},
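In case it helps anyone reading the full explain output, here is a minimal Python sketch that condenses it to the unassigned reason and the deciders that answered NO. It is only an illustration: the function name summarize_explain and the trimmed sample dict are made up for this post, and I am assuming the per-node entries sit under "node_allocation_decisions" in the part of the response I elided above.

```python
import json


def summarize_explain(explain: dict) -> dict:
    """Collect the unassigned reason and every NO decider, per node."""
    info = explain.get("unassigned_info", {})
    blockers = []
    # Assumption: per-node decisions live under "node_allocation_decisions".
    for node in explain.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            if decider.get("decision") == "NO":
                blockers.append((node.get("node_name"), decider.get("decider")))
    return {
        "reason": info.get("reason"),
        "failed_attempts": info.get("failed_allocation_attempts"),
        "can_allocate": explain.get("can_allocate"),
        "blockers": blockers,
    }


# Trimmed sample built from the output pasted above (long strings omitted).
sample = {
    "unassigned_info": {
        "reason": "ALLOCATION_FAILED",
        "failed_allocation_attempts": 5,
    },
    "can_allocate": "no",
    "node_allocation_decisions": [
        {
            "node_name": "the_problematic_node",
            "node_decision": "no",
            "deciders": [{"decider": "max_retry", "decision": "NO"}],
        }
    ],
}

if __name__ == "__main__":
    print(json.dumps(summarize_explain(sample), indent=2))
```

In practice the dict would come from json.load() on the saved response of GET /_cluster/allocation/explain rather than from a hand-written sample.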
Is there a way to recover this shard?
Many thanks,