Corrupt primary shard: how to recover from the replica shard?

I have an index with 3 primary shards and 1 replica. Primary shard 1 has become corrupt for some reason. The error mentions a failed merge and org.apache.lucene.index.CorruptIndexException, which I'm still looking into. Am I right in thinking that Elasticsearch should be able to recover from the replica shard in this case, assuming the replica itself isn't corrupt? Or is there a way to promote the replica shard to primary?

My understanding is that the replica shard should be promoted to primary automatically if the primary shard can't be allocated. In my case, the replica is not being promoted. As a test, I ran a _cluster/reroute:

```
POST /_cluster/reroute
{
  "commands": [
    {
      "cancel": {
        "index": "my-index-000001",
        "shard": 2,
        "node": "my-elastic-node",
        "allow_primary": true
      }
    }
  ]
}
```

I ended up deleting the index because the _cluster/reroute didn't work. Would it have been better to use allocate_empty_primary, and would that have promoted the replica?
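For reference, the allocate_empty_primary version of the reroute request would look roughly like this. It is only a sketch: the index, shard and node values are the placeholders from the cancel command above. According to the cluster reroute documentation, this command allocates a new, empty primary and leads to complete loss of all data indexed into that shard, which is why it requires accept_data_loss to be set to true. In other words, it does not promote the replica; the shard starts over empty.

```
# Sketch only: index/shard/node are the placeholders from the cancel command above.
# Allocates a NEW, EMPTY primary for shard 2; everything indexed into it is lost.
POST /_cluster/reroute
{
  "commands": [
    {
      "allocate_empty_primary": {
        "index": "my-index-000001",
        "shard": 2,
        "node": "my-elastic-node",
        "accept_data_loss": true
      }
    }
  ]
}
```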

Also, I didn't consider allocate_stale_primary, because my understanding is that it's meant for when a node has left the cluster and a shard copy has become stale as a result. Or am I misunderstanding that?
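For comparison, allocate_stale_primary takes the same parameters. The reroute documentation describes it as allocating a primary shard to a node that holds a stale copy; it may lose data (if a node with the good copy rejoins later, that copy is deleted or overwritten), so it also requires accept_data_loss. The node name below is hypothetical and would have to be a node that actually holds an on-disk copy of the shard.

```
# Sketch only: "node-with-shard-copy" is a hypothetical node name; it must be a
# node that really holds a copy of shard 2. Forcing a stale copy can lose data.
POST /_cluster/reroute
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "my-index-000001",
        "shard": 2,
        "node": "node-with-shard-copy",
        "accept_data_loss": true
      }
    }
  ]
}
```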

Sadly, I don't have snapshots :frowning_face:

That's correct. If it were possible to promote the replica, Elasticsearch would do so without needing your involvement.

The allocation explain API is the best way to answer your remaining questions. It should at least give a hint about why the replica couldn't be promoted. (It's better in 8.x, but still pretty useful in 7.x.)
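A minimal allocation explain request might look like the following, assuming the index name and shard number from the reroute command earlier in the thread (both are placeholders). Setting primary to false asks about the replica copy; setting it to true asks about the primary. Sending the request with no body instead explains an arbitrary unassigned shard, which can be handy if you don't know which one to ask about.

```
# Placeholders: index name and shard number come from the earlier reroute command.
# "primary": false asks why the replica copy is not being allocated or promoted.
GET /_cluster/allocation/explain
{
  "index": "my-index-000001",
  "shard": 2,
  "primary": false
}
```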


I most likely looked over it too quickly. If it happens again, I will go through it more thoroughly. Thank you!!
