Hi,
We had a server issue on one node of our Elasticsearch cluster (8.12.0). While we were resolving it, another node went down and consequently some indices turned red. By the time the original failure was recovered, we had lost one data node.
At the end of the recovery process we noticed that 3 indices remained yellow, each with one shard whose replica was unassigned.
The error shown by the reroute API was "failed to recover from translog".
After that I shut down the node holding the primary of the affected shard and ran the shard recovery tool. It reported no errors, so I truncated the translog, restarted Elasticsearch, and then called the reroute API to allocate the primary again (with allocate_stale_primary and accept_data_loss set to true). The primary got allocated, but the replicas are still failing.
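For reference, here is roughly what I ran. I am assuming the tool invocation was elasticsearch-shard remove-corrupted-data, and the host, port and target node name below are placeholders rather than an exact transcript:

# On the stopped node: drop the corrupted translog data for the affected shard
bin/elasticsearch-shard remove-corrupted-data --index files-2024.04.01 --shard-id 1

# After restarting the node: force-allocate the stale primary, accepting data loss
curl -X POST "localhost:9200/_cluster/reroute?pretty" -H 'Content-Type: application/json' -d'
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "files-2024.04.01",
        "shard": 1,
        "node": "esnode1",
        "accept_data_loss": true
      }
    }
  ]
}'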
I then queried the cluster allocation explain API for the unassigned replica, roughly like this (host and port are placeholders):
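curl -X GET "localhost:9200/_cluster/allocation/explain?pretty" -H 'Content-Type: application/json' -d'
{
  "index": "files-2024.04.01",
  "shard": 1,
  "primary": false
}'

It returned the following output: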
{
"index" : "files-2024.04.01",
"shard" : 1,
"primary" : false,
"current_state" : "unassigned",
"unassigned_info" : {
"reason" : "ALLOCATION_FAILED",
"at" : "2024-04-01T12:30:30.284Z",
"failed_allocation_attempts" : 5,
"details" : "failed shard on node [_t3iPcG0TAmZ1gIe2735Uw]: shard failure, reason [failed to recover from translog], failure [files-2024.04.01/FfKD0qnEQ3W-qPBT6sWR-g][[files-2024.04.01][1]] org.elasticsearch.index.engine.EngineException: failed to recover from translog\n\tat org.elasticsearch.index.engine.InternalEngine.lambda$recoverFromTranslogInternal$6(InternalEngine.java:598)\n\tat org.elasticsearch.action.ActionListener.run(ActionListener.java:386)\n\tat org.elasticsearch.index.engine.InternalEngine.recoverFromTranslogInternal(InternalEngine.java:591)\n\tat org.elasticsearch.index.engine.InternalEngine.lambda$recoverFromTranslog$3(InternalEngine.java:567)\n\tat org.elasticsearch.action.ActionListener.run(ActionListener.java:386)\n\tat org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:561)\n\tat org.elasticsearch.index.engine.Engine.recoverFromTranslog(Engine.java:1996)\n\tat org.elasticsearch.index.shard.IndexShard.recoverLocallyUpToGlobalCheckpoint(IndexShard.java:1808)\n\tat org.elasticsearch.indices.recovery.PeerRecoveryTargetService.lambda$doRecovery$4(PeerRecoveryTargetService.java:392)\n\tat org.elasticsearch.action.ActionListenerImplementations$MappedActionListener.lambda$map$0(ActionListenerImplementations.java:108)\n\tat org.elasticsearch.action.ActionListenerImplementations$MappedActionListener.onResponse(ActionListenerImplementations.java:89)\n\tat org.elasticsearch.action.ActionListenerImplementations$DelegatingResponseActionListener.onResponse(ActionListenerImplementations.java:182)\n\tat org.elasticsearch.index.CompositeIndexEventListener.callListeners(CompositeIndexEventListener.java:278)\n\tat org.elasticsearch.index.CompositeIndexEventListener.iterateBeforeIndexShardRecovery(CompositeIndexEventListener.java:287)\n\tat org.elasticsearch.index.CompositeIndexEventListener.beforeIndexShardRecovery(CompositeIndexEventListener.java:314)\n\tat org.elasticsearch.index.shard.IndexShard.preRecovery(IndexShard.java:1701)\n\tat org.elasticsearch.action.ActionListener.run(ActionListener.java:386)\n\tat org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:378)\n\tat org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:713)\n\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.lang.Thread.run(Thread.java:1583)\nCaused by: org.elasticsearch.index.translog.TranslogCorruptedException: translog from source [/data/elasticsearch/indices/FfKD0qnEQ3W-qPBT6sWR-g/1/translog/translog-906.tlog] is corrupted, translog truncated\n\tat org.elasticsearch.index.translog.TranslogSnapshot.readBytes(TranslogSnapshot.java:114)\n\tat org.elasticsearch.index.translog.BaseTranslogReader.readSize(BaseTranslogReader.java:68)\n\tat org.elasticsearch.index.translog.TranslogSnapshot.readOperation(TranslogSnapshot.java:69)\n\tat org.elasticsearch.index.translog.TranslogSnapshot.next(TranslogSnapshot.java:59)\n\tat org.elasticsearch.index.translog.MultiSnapshot.next(MultiSnapshot.java:60)\n\tat org.elasticsearch.index.translog.Translog$SeqNoFilterSnapshot.next(Translog.java:1042)\n\tat 
org.elasticsearch.index.shard.IndexShard.runTranslogRecovery(IndexShard.java:1926)\n\tat org.elasticsearch.index.shard.IndexShard.lambda$recoverLocallyUpToGlobalCheckpoint$12(IndexShard.java:1798)\n\tat org.elasticsearch.index.engine.InternalEngine.lambda$recoverFromTranslogInternal$6(InternalEngine.java:596)\n\t... 23 more\nCaused by: java.io.EOFException: read past EOF. pos [5074311] length: [4] end: [5074311]\n\tat org.elasticsearch.common.io.Channels.readFromFileChannelWithEofException(Channels.java:96)\n\tat org.elasticsearch.index.translog.TranslogSnapshot.readBytes(TranslogSnapshot.java:112)\n\t... 31 more\n",
"last_allocation_status" : "no_attempt"
},
"can_allocate" : "no",
"allocate_explanation" : "Elasticsearch isn't allowed to allocate this shard to any of the nodes in the cluster. Choose a node to which you expect this shard to be allocated, find this node in the node-by-node explanation, and address the reasons which prevent Elasticsearch from allocating this shard there.",
"node_allocation_decisions" : [
{
"node_id" : "7VyrRwY7RsaNsG9Ku_4mHA",
"node_name" : "esnode2",
"transport_address" : "10.44.0.47:9201",
"node_attributes" : {
"xpack.installed" : "true",
"transform.config_version" : "10.0.0"
},
"roles" : [
"data",
"ingest",
"remote_cluster_client",
"transform"
],
"node_decision" : "no",
"deciders" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [POST /_cluster/reroute?retry_failed&metric=none] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2024-04-01T12:30:30.284Z], failed_attempts[5], failed_nodes[[_t3iPcG0TAmZ1gIe2735Uw]], delayed=false, last_node[_t3iPcG0TAmZ1gIe2735Uw], details[failed shard on node [_t3iPcG0TAmZ1gIe2735Uw]: shard failure, reason [failed to recover from translog], failure [files-2024.04.01/FfKD0qnEQ3W-qPBT6sWR-g][[files-2024.04.01][1]] org.elasticsearch.index.engine.EngineException: failed to recover from translog\n\tat org.elasticsearch.index.engine.InternalEngine.lambda$recoverFromTranslogInternal$6(InternalEngine.java:598)\n\tat org.elasticsearch.action.ActionListener.run(ActionListener.java:386)\n\tat org.elasticsearch.index.engine.InternalEngine.recoverFromTranslogInternal(InternalEngine.java:591)\n\tat org.elasticsearch.index.engine.InternalEngine.lambda$recoverFromTranslog$3(InternalEngine.java:567)\n\tat org.elasticsearch.action.ActionListener.run(ActionListener.java:386)\n\tat org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:561)\n\tat org.elasticsearch.index.engine.Engine.recoverFromTranslog(Engine.java:1996)\n\tat org.elasticsearch.index.shard.IndexShard.recoverLocallyUpToGlobalCheckpoint(IndexShard.java:1808)\n\tat org.elasticsearch.indices.recovery.PeerRecoveryTargetService.lambda$doRecovery$4(PeerRecoveryTargetService.java:392)\n\tat org.elasticsearch.action.ActionListenerImplementations$MappedActionListener.lambda$map$0(ActionListenerImplementations.java:108)\n\tat org.elasticsearch.action.ActionListenerImplementations$MappedActionListener.onResponse(ActionListenerImplementations.java:89)\n\tat org.elasticsearch.action.ActionListenerImplementations$DelegatingResponseActionListener.onResponse(ActionListenerImplementations.java:182)\n\tat org.elasticsearch.index.CompositeIndexEventListener.callListeners(CompositeIndexEventListener.java:278)\n\tat org.elasticsearch.index.CompositeIndexEventListener.iterateBeforeIndexShardRecovery(CompositeIndexEventListener.java:287)\n\tat org.elasticsearch.index.CompositeIndexEventListener.beforeIndexShardRecovery(CompositeIndexEventListener.java:314)\n\tat org.elasticsearch.index.shard.IndexShard.preRecovery(IndexShard.java:1701)\n\tat org.elasticsearch.action.ActionListener.run(ActionListener.java:386)\n\tat org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:378)\n\tat org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:713)\n\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.lang.Thread.run(Thread.java:1583)\nCaused by: org.elasticsearch.index.translog.TranslogCorruptedException: translog from source [/data/elasticsearch/indices/FfKD0qnEQ3W-qPBT6sWR-g/1/translog/translog-906.tlog] is corrupted, translog truncated\n\tat org.elasticsearch.index.translog.TranslogSnapshot.readBytes(TranslogSnapshot.java:114)\n\tat org.elasticsearch.index.translog.BaseTranslogReader.readSize(BaseTranslogReader.java:68)\n\tat 
org.elasticsearch.index.translog.TranslogSnapshot.readOperation(TranslogSnapshot.java:69)\n\tat org.elasticsearch.index.translog.TranslogSnapshot.next(TranslogSnapshot.java:59)\n\tat org.elasticsearch.index.translog.MultiSnapshot.next(MultiSnapshot.java:60)\n\tat org.elasticsearch.index.translog.Translog$SeqNoFilterSnapshot.next(Translog.java:1042)\n\tat org.elasticsearch.index.shard.IndexShard.runTranslogRecovery(IndexShard.java:1926)\n\tat org.elasticsearch.index.shard.IndexShard.lambda$recoverLocallyUpToGlobalCheckpoint$12(IndexShard.java:1798)\n\tat org.elasticsearch.index.engine.InternalEngine.lambda$recoverFromTranslogInternal$6(InternalEngine.java:596)\n\t... 23 more\nCaused by: java.io.EOFException: read past EOF. pos [5074311] length: [4] end: [5074311]\n\tat org.elasticsearch.common.io.Channels.readFromFileChannelWithEofException(Channels.java:96)\n\tat org.elasticsearch.index.translog.TranslogSnapshot.readBytes(TranslogSnapshot.java:112)\n\t... 31 more\n], allocation_status[no_attempt]]]"
}
]
},
{
"node_id" : "CrQHz5xASl-3aS5iqaRqew",
"node_name" : "esnode1",
"transport_address" : "10.44.0.46:9201",
"node_attributes" : {
"xpack.installed" : "true",
"transform.config_version" : "10.0.0"
},
"roles" : [
"data",
"ingest",
"remote_cluster_client",
"transform"
],
"node_decision" : "no",
"deciders" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [POST /_cluster/reroute?retry_failed&metric=none] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2024-04-01T12:30:30.284Z], failed_attempts[5], failed_nodes[[_t3iPcG0TAmZ1gIe2735Uw]], delayed=false, last_node[_t3iPcG0TAmZ1gIe2735Uw], details[failed shard on node [_t3iPcG0TAmZ1gIe2735Uw]: shard failure, reason [failed to recover from translog], failure [files-2024.04.01/FfKD0qnEQ3W-qPBT6sWR-g][[files-2024.04.01][1]] org.elasticsearch.index.engine.EngineException: failed to recover from translog\n\tat org.elasticsearch.index.engine.InternalEngine.lambda$recoverFromTranslogInternal$6(InternalEngine.java:598)\n\tat org.elasticsearch.action.ActionListener.run(ActionListener.java:386)\n\tat org.elasticsearch.index.engine.InternalEngine.recoverFromTranslogInternal(InternalEngine.java:591)\n\tat org.elasticsearch.index.engine.InternalEngine.lambda$recoverFromTranslog$3(InternalEngine.java:567)\n\tat org.elasticsearch.action.ActionListener.run(ActionListener.java:386)\n\tat org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:561)\n\tat org.elasticsearch.index.engine.Engine.recoverFromTranslog(Engine.java:1996)\n\tat org.elasticsearch.index.shard.IndexShard.recoverLocallyUpToGlobalCheckpoint(IndexShard.java:1808)\n\tat org.elasticsearch.indices.recovery.PeerRecoveryTargetService.lambda$doRecovery$4(PeerRecoveryTargetService.java:392)\n\tat org.elasticsearch.action.ActionListenerImplementations$MappedActionListener.lambda$map$0(ActionListenerImplementations.java:108)\n\tat org.elasticsearch.action.ActionListenerImplementations$MappedActionListener.onResponse(ActionListenerImplementations.java:89)\n\tat org.elasticsearch.action.ActionListenerImplementations$DelegatingResponseActionListener.onResponse(ActionListenerImplementations.java:182)\n\tat org.elasticsearch.index.CompositeIndexEventListener.callListeners(CompositeIndexEventListener.java:278)\n\tat org.elasticsearch.index.CompositeIndexEventListener.iterateBeforeIndexShardRecovery(CompositeIndexEventListener.java:287)\n\tat org.elasticsearch.index.CompositeIndexEventListener.beforeIndexShardRecovery(CompositeIndexEventListener.java:314)\n\tat org.elasticsearch.index.shard.IndexShard.preRecovery(IndexShard.java:1701)\n\tat org.elasticsearch.action.ActionListener.run(ActionListener.java:386)\n\tat org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:378)\n\tat org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:713)\n\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.lang.Thread.run(Thread.java:1583)\nCaused by: org.elasticsearch.index.translog.TranslogCorruptedException: translog from source [/data/elasticsearch/indices/FfKD0qnEQ3W-qPBT6sWR-g/1/translog/translog-906.tlog] is corrupted, translog truncated\n\tat org.elasticsearch.index.translog.TranslogSnapshot.readBytes(TranslogSnapshot.java:114)\n\tat org.elasticsearch.index.translog.BaseTranslogReader.readSize(BaseTranslogReader.java:68)\n\tat 
org.elasticsearch.index.translog.TranslogSnapshot.readOperation(TranslogSnapshot.java:69)\n\tat org.elasticsearch.index.translog.TranslogSnapshot.next(TranslogSnapshot.java:59)\n\tat org.elasticsearch.index.translog.MultiSnapshot.next(MultiSnapshot.java:60)\n\tat org.elasticsearch.index.translog.Translog$SeqNoFilterSnapshot.next(Translog.java:1042)\n\tat org.elasticsearch.index.shard.IndexShard.runTranslogRecovery(IndexShard.java:1926)\n\tat org.elasticsearch.index.shard.IndexShard.lambda$recoverLocallyUpToGlobalCheckpoint$12(IndexShard.java:1798)\n\tat org.elasticsearch.index.engine.InternalEngine.lambda$recoverFromTranslogInternal$6(InternalEngine.java:596)\n\t... 23 more\nCaused by: java.io.EOFException: read past EOF. pos [5074311] length: [4] end: [5074311]\n\tat org.elasticsearch.common.io.Channels.readFromFileChannelWithEofException(Channels.java:96)\n\tat org.elasticsearch.index.translog.TranslogSnapshot.readBytes(TranslogSnapshot.java:112)\n\t... 31 more\n], allocation_status[no_attempt]]]"
},
{
"decider" : "same_shard",
"decision" : "NO",
"explanation" : "a copy of this shard is already allocated to this node [[files-2024.04.01][1], node[CrQHz5xASl-3aS5iqaRqew], [P], s[STARTED], a[id=n6evfKELQAWLgvOuV6SySA], failed_attempts[0]]"
}
]
},
{
"node_id" : "_t3iPcG0TAmZ1gIe2735Uw",
"node_name" : "esnode3",
"transport_address" : "10.44.0.48:9201",
"node_attributes" : {
"xpack.installed" : "true",
"transform.config_version" : "10.0.0"
},
"roles" : [
"data",
"ingest",
"remote_cluster_client",
"transform"
],
"node_decision" : "no",
"deciders" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [POST /_cluster/reroute?retry_failed&metric=none] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2024-04-01T12:30:30.284Z], failed_attempts[5], failed_nodes[[_t3iPcG0TAmZ1gIe2735Uw]], delayed=false, last_node[_t3iPcG0TAmZ1gIe2735Uw], details[failed shard on node [_t3iPcG0TAmZ1gIe2735Uw]: shard failure, reason [failed to recover from translog], failure [files-2024.04.01/FfKD0qnEQ3W-qPBT6sWR-g][[files-2024.04.01][1]] org.elasticsearch.index.engine.EngineException: failed to recover from translog\n\tat org.elasticsearch.index.engine.InternalEngine.lambda$recoverFromTranslogInternal$6(InternalEngine.java:598)\n\tat org.elasticsearch.action.ActionListener.run(ActionListener.java:386)\n\tat org.elasticsearch.index.engine.InternalEngine.recoverFromTranslogInternal(InternalEngine.java:591)\n\tat org.elasticsearch.index.engine.InternalEngine.lambda$recoverFromTranslog$3(InternalEngine.java:567)\n\tat org.elasticsearch.action.ActionListener.run(ActionListener.java:386)\n\tat org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:561)\n\tat org.elasticsearch.index.engine.Engine.recoverFromTranslog(Engine.java:1996)\n\tat org.elasticsearch.index.shard.IndexShard.recoverLocallyUpToGlobalCheckpoint(IndexShard.java:1808)\n\tat org.elasticsearch.indices.recovery.PeerRecoveryTargetService.lambda$doRecovery$4(PeerRecoveryTargetService.java:392)\n\tat org.elasticsearch.action.ActionListenerImplementations$MappedActionListener.lambda$map$0(ActionListenerImplementations.java:108)\n\tat org.elasticsearch.action.ActionListenerImplementations$MappedActionListener.onResponse(ActionListenerImplementations.java:89)\n\tat org.elasticsearch.action.ActionListenerImplementations$DelegatingResponseActionListener.onResponse(ActionListenerImplementations.java:182)\n\tat org.elasticsearch.index.CompositeIndexEventListener.callListeners(CompositeIndexEventListener.java:278)\n\tat org.elasticsearch.index.CompositeIndexEventListener.iterateBeforeIndexShardRecovery(CompositeIndexEventListener.java:287)\n\tat org.elasticsearch.index.CompositeIndexEventListener.beforeIndexShardRecovery(CompositeIndexEventListener.java:314)\n\tat org.elasticsearch.index.shard.IndexShard.preRecovery(IndexShard.java:1701)\n\tat org.elasticsearch.action.ActionListener.run(ActionListener.java:386)\n\tat org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:378)\n\tat org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:713)\n\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.lang.Thread.run(Thread.java:1583)\nCaused by: org.elasticsearch.index.translog.TranslogCorruptedException: translog from source [/data/elasticsearch/indices/FfKD0qnEQ3W-qPBT6sWR-g/1/translog/translog-906.tlog] is corrupted, translog truncated\n\tat org.elasticsearch.index.translog.TranslogSnapshot.readBytes(TranslogSnapshot.java:114)\n\tat org.elasticsearch.index.translog.BaseTranslogReader.readSize(BaseTranslogReader.java:68)\n\tat 
org.elasticsearch.index.translog.TranslogSnapshot.readOperation(TranslogSnapshot.java:69)\n\tat org.elasticsearch.index.translog.TranslogSnapshot.next(TranslogSnapshot.java:59)\n\tat org.elasticsearch.index.translog.MultiSnapshot.next(MultiSnapshot.java:60)\n\tat org.elasticsearch.index.translog.Translog$SeqNoFilterSnapshot.next(Translog.java:1042)\n\tat org.elasticsearch.index.shard.IndexShard.runTranslogRecovery(IndexShard.java:1926)\n\tat org.elasticsearch.index.shard.IndexShard.lambda$recoverLocallyUpToGlobalCheckpoint$12(IndexShard.java:1798)\n\tat org.elasticsearch.index.engine.InternalEngine.lambda$recoverFromTranslogInternal$6(InternalEngine.java:596)\n\t... 23 more\nCaused by: java.io.EOFException: read past EOF. pos [5074311] length: [4] end: [5074311]\n\tat org.elasticsearch.common.io.Channels.readFromFileChannelWithEofException(Channels.java:96)\n\tat org.elasticsearch.index.translog.TranslogSnapshot.readBytes(TranslogSnapshot.java:112)\n\t... 31 more\n], allocation_status[no_attempt]]]"
}
]
},
{
"node_id" : "gIa422qtRymC2yCgUX0BRg",
"node_name" : "esnode4",
"transport_address" : "10.44.0.49:9201",
"node_attributes" : {
"transform.config_version" : "10.0.0",
"xpack.installed" : "true"
},
"roles" : [
"data",
"ingest",
"remote_cluster_client",
"transform"
],
"node_decision" : "no",
"deciders" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [POST /_cluster/reroute?retry_failed&metric=none] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2024-04-01T12:30:30.284Z], failed_attempts[5], failed_nodes[[_t3iPcG0TAmZ1gIe2735Uw]], delayed=false, last_node[_t3iPcG0TAmZ1gIe2735Uw], details[failed shard on node [_t3iPcG0TAmZ1gIe2735Uw]: shard failure, reason [failed to recover from translog], failure [files-2024.04.01/FfKD0qnEQ3W-qPBT6sWR-g][[files-2024.04.01][1]] org.elasticsearch.index.engine.EngineException: failed to recover from translog\n\tat org.elasticsearch.index.engine.InternalEngine.lambda$recoverFromTranslogInternal$6(InternalEngine.java:598)\n\tat org.elasticsearch.action.ActionListener.run(ActionListener.java:386)\n\tat org.elasticsearch.index.engine.InternalEngine.recoverFromTranslogInternal(InternalEngine.java:591)\n\tat org.elasticsearch.index.engine.InternalEngine.lambda$recoverFromTranslog$3(InternalEngine.java:567)\n\tat org.elasticsearch.action.ActionListener.run(ActionListener.java:386)\n\tat org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:561)\n\tat org.elasticsearch.index.engine.Engine.recoverFromTranslog(Engine.java:1996)\n\tat org.elasticsearch.index.shard.IndexShard.recoverLocallyUpToGlobalCheckpoint(IndexShard.java:1808)\n\tat org.elasticsearch.indices.recovery.PeerRecoveryTargetService.lambda$doRecovery$4(PeerRecoveryTargetService.java:392)\n\tat org.elasticsearch.action.ActionListenerImplementations$MappedActionListener.lambda$map$0(ActionListenerImplementations.java:108)\n\tat org.elasticsearch.action.ActionListenerImplementations$MappedActionListener.onResponse(ActionListenerImplementations.java:89)\n\tat org.elasticsearch.action.ActionListenerImplementations$DelegatingResponseActionListener.onResponse(ActionListenerImplementations.java:182)\n\tat org.elasticsearch.index.CompositeIndexEventListener.callListeners(CompositeIndexEventListener.java:278)\n\tat org.elasticsearch.index.CompositeIndexEventListener.iterateBeforeIndexShardRecovery(CompositeIndexEventListener.java:287)\n\tat org.elasticsearch.index.CompositeIndexEventListener.beforeIndexShardRecovery(CompositeIndexEventListener.java:314)\n\tat org.elasticsearch.index.shard.IndexShard.preRecovery(IndexShard.java:1701)\n\tat org.elasticsearch.action.ActionListener.run(ActionListener.java:386)\n\tat org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:378)\n\tat org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:713)\n\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.lang.Thread.run(Thread.java:1583)\nCaused by: org.elasticsearch.index.translog.TranslogCorruptedException: translog from source [/data/elasticsearch/indices/FfKD0qnEQ3W-qPBT6sWR-g/1/translog/translog-906.tlog] is corrupted, translog truncated\n\tat org.elasticsearch.index.translog.TranslogSnapshot.readBytes(TranslogSnapshot.java:114)\n\tat org.elasticsearch.index.translog.BaseTranslogReader.readSize(BaseTranslogReader.java:68)\n\tat 
org.elasticsearch.index.translog.TranslogSnapshot.readOperation(TranslogSnapshot.java:69)\n\tat org.elasticsearch.index.translog.TranslogSnapshot.next(TranslogSnapshot.java:59)\n\tat org.elasticsearch.index.translog.MultiSnapshot.next(MultiSnapshot.java:60)\n\tat org.elasticsearch.index.translog.Translog$SeqNoFilterSnapshot.next(Translog.java:1042)\n\tat org.elasticsearch.index.shard.IndexShard.runTranslogRecovery(IndexShard.java:1926)\n\tat org.elasticsearch.index.shard.IndexShard.lambda$recoverLocallyUpToGlobalCheckpoint$12(IndexShard.java:1798)\n\tat org.elasticsearch.index.engine.InternalEngine.lambda$recoverFromTranslogInternal$6(InternalEngine.java:596)\n\t... 23 more\nCaused by: java.io.EOFException: read past EOF. pos [5074311] length: [4] end: [5074311]\n\tat org.elasticsearch.common.io.Channels.readFromFileChannelWithEofException(Channels.java:96)\n\tat org.elasticsearch.index.translog.TranslogSnapshot.readBytes(TranslogSnapshot.java:112)\n\t... 31 more\n], allocation_status[no_attempt]]]"
}
]
}
]
}
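From the deciders it looks like the main blocker now is the max_retry limit, and the error message itself suggests retrying with something like this (host and port are placeholders):

curl -X POST "localhost:9200/_cluster/reroute?retry_failed=true&metric=none&pretty"

but I am not sure a plain retry will help while the replica keeps failing on the corrupted translog.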
I am unable to understand why the replica cannot be allocated. Is there something I can do to fix this?
Thanks