Shards stuck in Initializing mode


(Jerome David) #1

We've had a crash and had to restart our server.
Our cluster has 5 nodes and we use 1 replica.
Since then, every time we try to open an index, the replica for shard 3 (always 3!) gets stuck in initializing mode.
It even gets stuck when we create a new index (with zero documents).
After that, ES tries to move it from one node to another, but it never succeeds.

We managed to fix it in a few cases by doing the following:

  1. Close the index
  2. ssh to the node holding the replica for shard 3 (the one that is stuck) and remove it from disk.
  3. copy the primary for shard 3 from one of the other nodes to the node currently holding the replica for shard 3.

This got us back to a working index, but every time a shard gets relocated, we get the same problem. We also cannot create any new index without going through all these steps.

We are using ES v1.4.2.

Is this a known problem?
Would upgrading to 1.7.2 be any help?


(Mark Walkom) #2

Why not just drop the replica and add it back?
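
A rough sketch of what dropping and re-adding the replica could look like, using the index settings API (PUT /<index>/_settings with index.number_of_replicas). The base URL http://localhost:9200 and the index name "myindex" are placeholders I made up, not values from this thread, and the helper names are hypothetical.

```python
import json
import urllib.request


def replica_request(base_url, index, replicas):
    """Build (but do not send) a PUT /<index>/_settings request."""
    body = json.dumps({"index": {"number_of_replicas": replicas}})
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def reset_replicas(base_url, index, replicas=1):
    """Set the replica count to 0, then back to the wanted value."""
    for count in (0, replicas):
        request = replica_request(base_url, index, count)
        with urllib.request.urlopen(request) as response:
            print(count, response.status)


# Example call (assumed host and index name, needs a reachable cluster):
# reset_replicas("http://localhost:9200", "myindex", replicas=1)
```

Going through 0 makes the cluster discard the existing replica copies and rebuild them from the primaries, which should be roughly the same effect as deleting the replica files by hand.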

You could increase logging to see what is happening, something like:

PUT /_cluster/settings
{
  "transient": {
    "logger._root": "DEBUG"
  }
}
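
While the debug logging is on, it may also help to list which shard copies are stuck. Below is a small sketch that filters the text returned by GET /_cat/shards?h=index,shard,prirep,state,node for a given state. The sample text is invented for illustration (it is not output from the cluster in this thread), and stuck_shards is a hypothetical helper name.

```python
def stuck_shards(cat_output, state="INITIALIZING"):
    """Return (index, shard, prirep, rest) for rows in the given state.

    Expects whitespace-separated rows whose first four columns are
    index, shard, prirep and state. Anything after that is kept as text.
    """
    rows = []
    for line in cat_output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or truncated line
        index, shard, prirep, shard_state = fields[:4]
        if shard_state == state:
            rows.append((index, int(shard), prirep, " ".join(fields[4:])))
    return rows


# Made-up sample in the assumed column layout, not real cluster output.
sample = """\
logs 3 p STARTED node-a
logs 3 r INITIALIZING node-b
logs 4 r STARTED node-c
"""

print(stuck_shards(sample))  # [('logs', 3, 'r', 'node-b')]
```

If the same shard number shows up as INITIALIZING on whichever node it lands on, that points at the shard data or the recovery itself and not at one bad node.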

(Jerome David) #3

Yes, we tried setting the number of replicas to zero and then setting it
back to 1 or 2. But every time, shard 3 of the last replica gets stuck.
We've set up a new cluster with 1.7.2. So far it seems to behave better,
but we still need to do more testing.

