Index stuck in yellow, unable to assign replica due to translog corruption

Hi,

We had some server issues on one node of our Elasticsearch cluster (8.12.0). While resolving that issue, another node went down and consequently some indices went red. After the original failure was recovered, we had still lost one data node.

At the end of the recovery process we noticed that 3 indices remained yellow, each with one shard having an unassigned replica.

The error shown by the reroute API was "failed to recover from translog".

After that I shut down the node holding the primary of the shard and ran the shard-recovery tool. It reported no errors, so I truncated the translog anyway, restarted Elasticsearch, and called the reroute API to allocate the primary again (with allocate_stale_primary and accept_data_loss set to true). The primary got allocated, but the replicas are still failing.
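The reroute call was along these lines (the node name is from memory, so treat it as illustrative):

```
POST /_cluster/reroute
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "files-2024.04.01",
        "shard": 1,
        "node": "esnode1",
        "accept_data_loss": true
      }
    }
  ]
}
```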

The allocation explain API returns the output below.
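The request was roughly:

```
GET /_cluster/allocation/explain
{
  "index": "files-2024.04.01",
  "shard": 1,
  "primary": false
}
```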

{
  "index" : "files-2024.04.01",
  "shard" : 1,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2024-04-01T12:30:30.284Z",
    "failed_allocation_attempts" : 5,
    "details" : "failed shard on node [_t3iPcG0TAmZ1gIe2735Uw]: shard failure, reason [failed to recover from translog], failure [files-2024.04.01/FfKD0qnEQ3W-qPBT6sWR-g][[files-2024.04.01][1]] org.elasticsearch.index.engine.EngineException: failed to recover from translog\n\tat org.elasticsearch.index.engine.InternalEngine.lambda$recoverFromTranslogInternal$6(InternalEngine.java:598)\n\tat org.elasticsearch.action.ActionListener.run(ActionListener.java:386)\n\tat org.elasticsearch.index.engine.InternalEngine.recoverFromTranslogInternal(InternalEngine.java:591)\n\tat org.elasticsearch.index.engine.InternalEngine.lambda$recoverFromTranslog$3(InternalEngine.java:567)\n\tat org.elasticsearch.action.ActionListener.run(ActionListener.java:386)\n\tat org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:561)\n\tat org.elasticsearch.index.engine.Engine.recoverFromTranslog(Engine.java:1996)\n\tat org.elasticsearch.index.shard.IndexShard.recoverLocallyUpToGlobalCheckpoint(IndexShard.java:1808)\n\tat org.elasticsearch.indices.recovery.PeerRecoveryTargetService.lambda$doRecovery$4(PeerRecoveryTargetService.java:392)\n\tat org.elasticsearch.action.ActionListenerImplementations$MappedActionListener.lambda$map$0(ActionListenerImplementations.java:108)\n\tat org.elasticsearch.action.ActionListenerImplementations$MappedActionListener.onResponse(ActionListenerImplementations.java:89)\n\tat org.elasticsearch.action.ActionListenerImplementations$DelegatingResponseActionListener.onResponse(ActionListenerImplementations.java:182)\n\tat org.elasticsearch.index.CompositeIndexEventListener.callListeners(CompositeIndexEventListener.java:278)\n\tat org.elasticsearch.index.CompositeIndexEventListener.iterateBeforeIndexShardRecovery(CompositeIndexEventListener.java:287)\n\tat org.elasticsearch.index.CompositeIndexEventListener.beforeIndexShardRecovery(CompositeIndexEventListener.java:314)\n\tat 
org.elasticsearch.index.shard.IndexShard.preRecovery(IndexShard.java:1701)\n\tat org.elasticsearch.action.ActionListener.run(ActionListener.java:386)\n\tat org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:378)\n\tat org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:713)\n\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.lang.Thread.run(Thread.java:1583)\nCaused by: org.elasticsearch.index.translog.TranslogCorruptedException: translog from source [/data/elasticsearch/indices/FfKD0qnEQ3W-qPBT6sWR-g/1/translog/translog-906.tlog] is corrupted, translog truncated\n\tat org.elasticsearch.index.translog.TranslogSnapshot.readBytes(TranslogSnapshot.java:114)\n\tat org.elasticsearch.index.translog.BaseTranslogReader.readSize(BaseTranslogReader.java:68)\n\tat org.elasticsearch.index.translog.TranslogSnapshot.readOperation(TranslogSnapshot.java:69)\n\tat org.elasticsearch.index.translog.TranslogSnapshot.next(TranslogSnapshot.java:59)\n\tat org.elasticsearch.index.translog.MultiSnapshot.next(MultiSnapshot.java:60)\n\tat org.elasticsearch.index.translog.Translog$SeqNoFilterSnapshot.next(Translog.java:1042)\n\tat org.elasticsearch.index.shard.IndexShard.runTranslogRecovery(IndexShard.java:1926)\n\tat org.elasticsearch.index.shard.IndexShard.lambda$recoverLocallyUpToGlobalCheckpoint$12(IndexShard.java:1798)\n\tat org.elasticsearch.index.engine.InternalEngine.lambda$recoverFromTranslogInternal$6(InternalEngine.java:596)\n\t... 23 more\nCaused by: java.io.EOFException: read past EOF. 
pos [5074311] length: [4] end: [5074311]\n\tat org.elasticsearch.common.io.Channels.readFromFileChannelWithEofException(Channels.java:96)\n\tat org.elasticsearch.index.translog.TranslogSnapshot.readBytes(TranslogSnapshot.java:112)\n\t... 31 more\n",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "Elasticsearch isn't allowed to allocate this shard to any of the nodes in the cluster. Choose a node to which you expect this shard to be allocated, find this node in the node-by-node explanation, and address the reasons which prevent Elasticsearch from allocating this shard there.",
  "node_allocation_decisions" : [
    {
      "node_id" : "7VyrRwY7RsaNsG9Ku_4mHA",
      "node_name" : "esnode2",
      "transport_address" : "10.44.0.47:9201",
      "node_attributes" : {
        "xpack.installed" : "true",
        "transform.config_version" : "10.0.0"
      },
      "roles" : [
        "data",
        "ingest",
        "remote_cluster_client",
        "transform"
      ],
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "NO",
          "explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [POST /_cluster/reroute?retry_failed&metric=none] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2024-04-01T12:30:30.284Z], failed_attempts[5], failed_nodes[[_t3iPcG0TAmZ1gIe2735Uw]], delayed=false, last_node[_t3iPcG0TAmZ1gIe2735Uw], details[failed shard on node [_t3iPcG0TAmZ1gIe2735Uw]: shard failure, reason [failed to recover from translog], failure [files-2024.04.01/FfKD0qnEQ3W-qPBT6sWR-g][[files-2024.04.01][1]] org.elasticsearch.index.engine.EngineException: failed to recover from translog\n\tat org.elasticsearch.index.engine.InternalEngine.lambda$recoverFromTranslogInternal$6(InternalEngine.java:598)\n\tat org.elasticsearch.action.ActionListener.run(ActionListener.java:386)\n\tat org.elasticsearch.index.engine.InternalEngine.recoverFromTranslogInternal(InternalEngine.java:591)\n\tat org.elasticsearch.index.engine.InternalEngine.lambda$recoverFromTranslog$3(InternalEngine.java:567)\n\tat org.elasticsearch.action.ActionListener.run(ActionListener.java:386)\n\tat org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:561)\n\tat org.elasticsearch.index.engine.Engine.recoverFromTranslog(Engine.java:1996)\n\tat org.elasticsearch.index.shard.IndexShard.recoverLocallyUpToGlobalCheckpoint(IndexShard.java:1808)\n\tat org.elasticsearch.indices.recovery.PeerRecoveryTargetService.lambda$doRecovery$4(PeerRecoveryTargetService.java:392)\n\tat org.elasticsearch.action.ActionListenerImplementations$MappedActionListener.lambda$map$0(ActionListenerImplementations.java:108)\n\tat org.elasticsearch.action.ActionListenerImplementations$MappedActionListener.onResponse(ActionListenerImplementations.java:89)\n\tat org.elasticsearch.action.ActionListenerImplementations$DelegatingResponseActionListener.onResponse(ActionListenerImplementations.java:182)\n\tat 
org.elasticsearch.index.CompositeIndexEventListener.callListeners(CompositeIndexEventListener.java:278)\n\tat org.elasticsearch.index.CompositeIndexEventListener.iterateBeforeIndexShardRecovery(CompositeIndexEventListener.java:287)\n\tat org.elasticsearch.index.CompositeIndexEventListener.beforeIndexShardRecovery(CompositeIndexEventListener.java:314)\n\tat org.elasticsearch.index.shard.IndexShard.preRecovery(IndexShard.java:1701)\n\tat org.elasticsearch.action.ActionListener.run(ActionListener.java:386)\n\tat org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:378)\n\tat org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:713)\n\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.lang.Thread.run(Thread.java:1583)\nCaused by: org.elasticsearch.index.translog.TranslogCorruptedException: translog from source [/data/elasticsearch/indices/FfKD0qnEQ3W-qPBT6sWR-g/1/translog/translog-906.tlog] is corrupted, translog truncated\n\tat org.elasticsearch.index.translog.TranslogSnapshot.readBytes(TranslogSnapshot.java:114)\n\tat org.elasticsearch.index.translog.BaseTranslogReader.readSize(BaseTranslogReader.java:68)\n\tat org.elasticsearch.index.translog.TranslogSnapshot.readOperation(TranslogSnapshot.java:69)\n\tat org.elasticsearch.index.translog.TranslogSnapshot.next(TranslogSnapshot.java:59)\n\tat org.elasticsearch.index.translog.MultiSnapshot.next(MultiSnapshot.java:60)\n\tat org.elasticsearch.index.translog.Translog$SeqNoFilterSnapshot.next(Translog.java:1042)\n\tat 
org.elasticsearch.index.shard.IndexShard.runTranslogRecovery(IndexShard.java:1926)\n\tat org.elasticsearch.index.shard.IndexShard.lambda$recoverLocallyUpToGlobalCheckpoint$12(IndexShard.java:1798)\n\tat org.elasticsearch.index.engine.InternalEngine.lambda$recoverFromTranslogInternal$6(InternalEngine.java:596)\n\t... 23 more\nCaused by: java.io.EOFException: read past EOF. pos [5074311] length: [4] end: [5074311]\n\tat org.elasticsearch.common.io.Channels.readFromFileChannelWithEofException(Channels.java:96)\n\tat org.elasticsearch.index.translog.TranslogSnapshot.readBytes(TranslogSnapshot.java:112)\n\t... 31 more\n], allocation_status[no_attempt]]]"
        }
      ]
    },
    {
      "node_id" : "CrQHz5xASl-3aS5iqaRqew",
      "node_name" : "esnode1",
      "transport_address" : "10.44.0.46:9201",
      "node_attributes" : {
        "xpack.installed" : "true",
        "transform.config_version" : "10.0.0"
      },
      "roles" : [
        "data",
        "ingest",
        "remote_cluster_client",
        "transform"
      ],
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "NO",
          "explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [POST /_cluster/reroute?retry_failed&metric=none] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2024-04-01T12:30:30.284Z], failed_attempts[5], failed_nodes[[_t3iPcG0TAmZ1gIe2735Uw]], delayed=false, last_node[_t3iPcG0TAmZ1gIe2735Uw], details[failed shard on node [_t3iPcG0TAmZ1gIe2735Uw]: shard failure, reason [failed to recover from translog], failure [files-2024.04.01/FfKD0qnEQ3W-qPBT6sWR-g][[files-2024.04.01][1]] org.elasticsearch.index.engine.EngineException: failed to recover from translog\n\tat org.elasticsearch.index.engine.InternalEngine.lambda$recoverFromTranslogInternal$6(InternalEngine.java:598)\n\tat org.elasticsearch.action.ActionListener.run(ActionListener.java:386)\n\tat org.elasticsearch.index.engine.InternalEngine.recoverFromTranslogInternal(InternalEngine.java:591)\n\tat org.elasticsearch.index.engine.InternalEngine.lambda$recoverFromTranslog$3(InternalEngine.java:567)\n\tat org.elasticsearch.action.ActionListener.run(ActionListener.java:386)\n\tat org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:561)\n\tat org.elasticsearch.index.engine.Engine.recoverFromTranslog(Engine.java:1996)\n\tat org.elasticsearch.index.shard.IndexShard.recoverLocallyUpToGlobalCheckpoint(IndexShard.java:1808)\n\tat org.elasticsearch.indices.recovery.PeerRecoveryTargetService.lambda$doRecovery$4(PeerRecoveryTargetService.java:392)\n\tat org.elasticsearch.action.ActionListenerImplementations$MappedActionListener.lambda$map$0(ActionListenerImplementations.java:108)\n\tat org.elasticsearch.action.ActionListenerImplementations$MappedActionListener.onResponse(ActionListenerImplementations.java:89)\n\tat org.elasticsearch.action.ActionListenerImplementations$DelegatingResponseActionListener.onResponse(ActionListenerImplementations.java:182)\n\tat 
org.elasticsearch.index.CompositeIndexEventListener.callListeners(CompositeIndexEventListener.java:278)\n\tat org.elasticsearch.index.CompositeIndexEventListener.iterateBeforeIndexShardRecovery(CompositeIndexEventListener.java:287)\n\tat org.elasticsearch.index.CompositeIndexEventListener.beforeIndexShardRecovery(CompositeIndexEventListener.java:314)\n\tat org.elasticsearch.index.shard.IndexShard.preRecovery(IndexShard.java:1701)\n\tat org.elasticsearch.action.ActionListener.run(ActionListener.java:386)\n\tat org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:378)\n\tat org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:713)\n\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.lang.Thread.run(Thread.java:1583)\nCaused by: org.elasticsearch.index.translog.TranslogCorruptedException: translog from source [/data/elasticsearch/indices/FfKD0qnEQ3W-qPBT6sWR-g/1/translog/translog-906.tlog] is corrupted, translog truncated\n\tat org.elasticsearch.index.translog.TranslogSnapshot.readBytes(TranslogSnapshot.java:114)\n\tat org.elasticsearch.index.translog.BaseTranslogReader.readSize(BaseTranslogReader.java:68)\n\tat org.elasticsearch.index.translog.TranslogSnapshot.readOperation(TranslogSnapshot.java:69)\n\tat org.elasticsearch.index.translog.TranslogSnapshot.next(TranslogSnapshot.java:59)\n\tat org.elasticsearch.index.translog.MultiSnapshot.next(MultiSnapshot.java:60)\n\tat org.elasticsearch.index.translog.Translog$SeqNoFilterSnapshot.next(Translog.java:1042)\n\tat 
org.elasticsearch.index.shard.IndexShard.runTranslogRecovery(IndexShard.java:1926)\n\tat org.elasticsearch.index.shard.IndexShard.lambda$recoverLocallyUpToGlobalCheckpoint$12(IndexShard.java:1798)\n\tat org.elasticsearch.index.engine.InternalEngine.lambda$recoverFromTranslogInternal$6(InternalEngine.java:596)\n\t... 23 more\nCaused by: java.io.EOFException: read past EOF. pos [5074311] length: [4] end: [5074311]\n\tat org.elasticsearch.common.io.Channels.readFromFileChannelWithEofException(Channels.java:96)\n\tat org.elasticsearch.index.translog.TranslogSnapshot.readBytes(TranslogSnapshot.java:112)\n\t... 31 more\n], allocation_status[no_attempt]]]"
        },
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "a copy of this shard is already allocated to this node [[files-2024.04.01][1], node[CrQHz5xASl-3aS5iqaRqew], [P], s[STARTED], a[id=n6evfKELQAWLgvOuV6SySA], failed_attempts[0]]"
        }
      ]
    },
    {
      "node_id" : "_t3iPcG0TAmZ1gIe2735Uw",
      "node_name" : "esnode3",
      "transport_address" : "10.44.0.48:9201",
      "node_attributes" : {
        "xpack.installed" : "true",
        "transform.config_version" : "10.0.0"
      },
      "roles" : [
        "data",
        "ingest",
        "remote_cluster_client",
        "transform"
      ],
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "NO",
          "explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [POST /_cluster/reroute?retry_failed&metric=none] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2024-04-01T12:30:30.284Z], failed_attempts[5], failed_nodes[[_t3iPcG0TAmZ1gIe2735Uw]], delayed=false, last_node[_t3iPcG0TAmZ1gIe2735Uw], details[failed shard on node [_t3iPcG0TAmZ1gIe2735Uw]: shard failure, reason [failed to recover from translog], failure [files-2024.04.01/FfKD0qnEQ3W-qPBT6sWR-g][[files-2024.04.01][1]] org.elasticsearch.index.engine.EngineException: failed to recover from translog\n\tat org.elasticsearch.index.engine.InternalEngine.lambda$recoverFromTranslogInternal$6(InternalEngine.java:598)\n\tat org.elasticsearch.action.ActionListener.run(ActionListener.java:386)\n\tat org.elasticsearch.index.engine.InternalEngine.recoverFromTranslogInternal(InternalEngine.java:591)\n\tat org.elasticsearch.index.engine.InternalEngine.lambda$recoverFromTranslog$3(InternalEngine.java:567)\n\tat org.elasticsearch.action.ActionListener.run(ActionListener.java:386)\n\tat org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:561)\n\tat org.elasticsearch.index.engine.Engine.recoverFromTranslog(Engine.java:1996)\n\tat org.elasticsearch.index.shard.IndexShard.recoverLocallyUpToGlobalCheckpoint(IndexShard.java:1808)\n\tat org.elasticsearch.indices.recovery.PeerRecoveryTargetService.lambda$doRecovery$4(PeerRecoveryTargetService.java:392)\n\tat org.elasticsearch.action.ActionListenerImplementations$MappedActionListener.lambda$map$0(ActionListenerImplementations.java:108)\n\tat org.elasticsearch.action.ActionListenerImplementations$MappedActionListener.onResponse(ActionListenerImplementations.java:89)\n\tat org.elasticsearch.action.ActionListenerImplementations$DelegatingResponseActionListener.onResponse(ActionListenerImplementations.java:182)\n\tat 
org.elasticsearch.index.CompositeIndexEventListener.callListeners(CompositeIndexEventListener.java:278)\n\tat org.elasticsearch.index.CompositeIndexEventListener.iterateBeforeIndexShardRecovery(CompositeIndexEventListener.java:287)\n\tat org.elasticsearch.index.CompositeIndexEventListener.beforeIndexShardRecovery(CompositeIndexEventListener.java:314)\n\tat org.elasticsearch.index.shard.IndexShard.preRecovery(IndexShard.java:1701)\n\tat org.elasticsearch.action.ActionListener.run(ActionListener.java:386)\n\tat org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:378)\n\tat org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:713)\n\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.lang.Thread.run(Thread.java:1583)\nCaused by: org.elasticsearch.index.translog.TranslogCorruptedException: translog from source [/data/elasticsearch/indices/FfKD0qnEQ3W-qPBT6sWR-g/1/translog/translog-906.tlog] is corrupted, translog truncated\n\tat org.elasticsearch.index.translog.TranslogSnapshot.readBytes(TranslogSnapshot.java:114)\n\tat org.elasticsearch.index.translog.BaseTranslogReader.readSize(BaseTranslogReader.java:68)\n\tat org.elasticsearch.index.translog.TranslogSnapshot.readOperation(TranslogSnapshot.java:69)\n\tat org.elasticsearch.index.translog.TranslogSnapshot.next(TranslogSnapshot.java:59)\n\tat org.elasticsearch.index.translog.MultiSnapshot.next(MultiSnapshot.java:60)\n\tat org.elasticsearch.index.translog.Translog$SeqNoFilterSnapshot.next(Translog.java:1042)\n\tat 
org.elasticsearch.index.shard.IndexShard.runTranslogRecovery(IndexShard.java:1926)\n\tat org.elasticsearch.index.shard.IndexShard.lambda$recoverLocallyUpToGlobalCheckpoint$12(IndexShard.java:1798)\n\tat org.elasticsearch.index.engine.InternalEngine.lambda$recoverFromTranslogInternal$6(InternalEngine.java:596)\n\t... 23 more\nCaused by: java.io.EOFException: read past EOF. pos [5074311] length: [4] end: [5074311]\n\tat org.elasticsearch.common.io.Channels.readFromFileChannelWithEofException(Channels.java:96)\n\tat org.elasticsearch.index.translog.TranslogSnapshot.readBytes(TranslogSnapshot.java:112)\n\t... 31 more\n], allocation_status[no_attempt]]]"
        }
      ]
    },
    {
      "node_id" : "gIa422qtRymC2yCgUX0BRg",
      "node_name" : "esnode4",
      "transport_address" : "10.44.0.49:9201",
      "node_attributes" : {
        "transform.config_version" : "10.0.0",
        "xpack.installed" : "true"
      },
      "roles" : [
        "data",
        "ingest",
        "remote_cluster_client",
        "transform"
      ],
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "NO",
          "explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [POST /_cluster/reroute?retry_failed&metric=none] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2024-04-01T12:30:30.284Z], failed_attempts[5], failed_nodes[[_t3iPcG0TAmZ1gIe2735Uw]], delayed=false, last_node[_t3iPcG0TAmZ1gIe2735Uw], details[failed shard on node [_t3iPcG0TAmZ1gIe2735Uw]: shard failure, reason [failed to recover from translog], failure [files-2024.04.01/FfKD0qnEQ3W-qPBT6sWR-g][[files-2024.04.01][1]] org.elasticsearch.index.engine.EngineException: failed to recover from translog\n\tat org.elasticsearch.index.engine.InternalEngine.lambda$recoverFromTranslogInternal$6(InternalEngine.java:598)\n\tat org.elasticsearch.action.ActionListener.run(ActionListener.java:386)\n\tat org.elasticsearch.index.engine.InternalEngine.recoverFromTranslogInternal(InternalEngine.java:591)\n\tat org.elasticsearch.index.engine.InternalEngine.lambda$recoverFromTranslog$3(InternalEngine.java:567)\n\tat org.elasticsearch.action.ActionListener.run(ActionListener.java:386)\n\tat org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:561)\n\tat org.elasticsearch.index.engine.Engine.recoverFromTranslog(Engine.java:1996)\n\tat org.elasticsearch.index.shard.IndexShard.recoverLocallyUpToGlobalCheckpoint(IndexShard.java:1808)\n\tat org.elasticsearch.indices.recovery.PeerRecoveryTargetService.lambda$doRecovery$4(PeerRecoveryTargetService.java:392)\n\tat org.elasticsearch.action.ActionListenerImplementations$MappedActionListener.lambda$map$0(ActionListenerImplementations.java:108)\n\tat org.elasticsearch.action.ActionListenerImplementations$MappedActionListener.onResponse(ActionListenerImplementations.java:89)\n\tat org.elasticsearch.action.ActionListenerImplementations$DelegatingResponseActionListener.onResponse(ActionListenerImplementations.java:182)\n\tat 
org.elasticsearch.index.CompositeIndexEventListener.callListeners(CompositeIndexEventListener.java:278)\n\tat org.elasticsearch.index.CompositeIndexEventListener.iterateBeforeIndexShardRecovery(CompositeIndexEventListener.java:287)\n\tat org.elasticsearch.index.CompositeIndexEventListener.beforeIndexShardRecovery(CompositeIndexEventListener.java:314)\n\tat org.elasticsearch.index.shard.IndexShard.preRecovery(IndexShard.java:1701)\n\tat org.elasticsearch.action.ActionListener.run(ActionListener.java:386)\n\tat org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:378)\n\tat org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:713)\n\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.lang.Thread.run(Thread.java:1583)\nCaused by: org.elasticsearch.index.translog.TranslogCorruptedException: translog from source [/data/elasticsearch/indices/FfKD0qnEQ3W-qPBT6sWR-g/1/translog/translog-906.tlog] is corrupted, translog truncated\n\tat org.elasticsearch.index.translog.TranslogSnapshot.readBytes(TranslogSnapshot.java:114)\n\tat org.elasticsearch.index.translog.BaseTranslogReader.readSize(BaseTranslogReader.java:68)\n\tat org.elasticsearch.index.translog.TranslogSnapshot.readOperation(TranslogSnapshot.java:69)\n\tat org.elasticsearch.index.translog.TranslogSnapshot.next(TranslogSnapshot.java:59)\n\tat org.elasticsearch.index.translog.MultiSnapshot.next(MultiSnapshot.java:60)\n\tat org.elasticsearch.index.translog.Translog$SeqNoFilterSnapshot.next(Translog.java:1042)\n\tat 
org.elasticsearch.index.shard.IndexShard.runTranslogRecovery(IndexShard.java:1926)\n\tat org.elasticsearch.index.shard.IndexShard.lambda$recoverLocallyUpToGlobalCheckpoint$12(IndexShard.java:1798)\n\tat org.elasticsearch.index.engine.InternalEngine.lambda$recoverFromTranslogInternal$6(InternalEngine.java:596)\n\t... 23 more\nCaused by: java.io.EOFException: read past EOF. pos [5074311] length: [4] end: [5074311]\n\tat org.elasticsearch.common.io.Channels.readFromFileChannelWithEofException(Channels.java:96)\n\tat org.elasticsearch.index.translog.TranslogSnapshot.readBytes(TranslogSnapshot.java:112)\n\t... 31 more\n], allocation_status[no_attempt]]]"
        }
      ]
    }
  ]
}

I am unable to understand why the replica cannot be allocated. Is there something I can do to fix this?

Thanks

I think both of these questions are answered in the allocation explain output.

Thanks for looking at this @DavidTurner

I understand that the replica is not allocated because the translog appears corrupted.

I have tried the reroute API with retry_failed, and it fails again.
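That is, the exact call suggested in the decider message:

```
POST /_cluster/reroute?retry_failed&metric=none
```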

I have also tried to repair the shard corruption by taking the node that has the primary offline and running the shard repair tool, which reported "no corruption", then manually truncating the translog with that same tool. After starting the service again I allocated the primary with the allocate_stale_primary reroute command, accepting data loss.
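For reference, the tool invocation was along these lines (run while the node is offline; the exact flags may differ by version, so check `elasticsearch-shard remove-corrupted-data --help`):

```
bin/elasticsearch-shard remove-corrupted-data \
  --index files-2024.04.01 \
  --shard-id 1 \
  --truncate-clean-translog
```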

Despite all this, the replicas do not get allocated, no matter how many times I call the reroute API with retry_failed.

My question is: why does Elasticsearch not use the "good" primary to rebuild the replica?

Is it possible that during earlier retries every node (there are only 3 others) received a "bad" replica copy, so Elasticsearch is now unable to regenerate the replicas unless those bad copies are removed first? If so, how do I manually delete the bad replica copies from each node so that reroute can start afresh?

I see, hmm. I think you might have to set number_of_replicas: 0 to clear up the corrupted replicas and then set it back to the correct number.
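Something like this, i.e. drop the replicas and then restore the original count (adjust the index name and count to your setup):

```
PUT /files-2024.04.01/_settings
{
  "index": {
    "number_of_replicas": 0
  }
}

PUT /files-2024.04.01/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}
```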

Also note that translog corruption indicates that your storage is buggy or otherwise not safe to use - one common cause is that your storage is incorrectly reordering write() and fsync() calls. I strongly recommend fixing this ASAP.

I think there's a bug here, we should be able to repair a corrupted replica without manual intervention as long as the primary is healthy. I opened #106961.


Thanks @DavidTurner ! I had forgotten about the replica settings!

This morning I retried the reroute, and one of the indices managed to build its replica successfully without any change to the replica settings.

For the other index I had to change the replica setting to 0 and then back to 1 to get it to work.

There was a third index (.monitoring-logstash-2024.04.01) for which changing the replica settings apparently had no effect. I closed the index, and then its replica could be built. After re-opening, it is working fine now.
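The close/re-open sequence was just:

```
POST /.monitoring-logstash-2024.04.01/_close
POST /.monitoring-logstash-2024.04.01/_open
```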