Replica shards stuck in Initialization phase

Hi all,

I just did a full cluster restart, and when the nodes came back up, none of the replica shards were initialized.
I can make the cluster go green by setting the number of replicas to 0, but when I change it to 1 or 2, the replicas get stuck in the initializing phase (yellow cluster).

  • I have already tried restarting the cluster twice
  • There are no .recovery files in the translog
  • I can continue using the cluster with no replicas
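For reference, the replica toggle described above can be sketched roughly as follows. This is a minimal sketch, not the exact commands that were run: it assumes the index update settings endpoint (`PUT /_settings` with `index.number_of_replicas`) applied to all indices, and the helper names `replica_settings_body` and `set_replicas` are made up for illustration.

```python
import json
import urllib.request


def replica_settings_body(replicas: int) -> bytes:
    """Build the JSON body that sets index.number_of_replicas."""
    if replicas < 0:
        raise ValueError("replicas must be >= 0")
    return json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")


def set_replicas(base_url: str, replicas: int) -> str:
    """Send the new replica count to all indices via PUT /_settings.

    base_url is an assumption, e.g. "http://localhost:9200".
    """
    req = urllib.request.Request(
        base_url.rstrip("/") + "/_settings",
        data=replica_settings_body(replicas),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Calling `set_replicas("http://localhost:9200", 0)` would correspond to the "go green" step, and calling it again with 1 or 2 to the step where the replicas get stuck; the host and port are placeholders.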

Using ES 1.7.1 and JVM 1.8.0_92

Can anybody help with this issue?

EDIT: I'm finding this in the logs:

[WARN]  (elasticsearch[][generic][T#115]) org.elasticsearch.indices.cluster: [] [[index-2016.04][1]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [index-2016.04][1]: Recovery failed from [][][][inet[/]]{max_local_storage_nodes=1, AZ=us-west-2a, master=false} into [][][][inet[/]]{max_local_storage_nodes=1, AZ=us-west-2c, master=false} (no activity after [30m])
        at org.elasticsearch.indices.recovery.RecoveriesCollection$RecoveryMonitor.doRun(RecoveriesCollection.java:235)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.ElasticsearchTimeoutException: no activity after [30m]
        ... 5 more
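To see which replica copies are sitting in INITIALIZING while a recovery like the one above times out, something along these lines could help. It is only a sketch: it assumes the `_cat/shards` endpoint with an explicit column list (`h=index,shard,prirep,state,node`), and the function names are hypothetical.

```python
import urllib.request

# Columns requested explicitly so the parser does not depend on defaults.
COLUMNS = ["index", "shard", "prirep", "state", "node"]


def parse_cat_shards(text):
    """Parse whitespace-separated _cat/shards output into dicts."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split at most 4 times so a node name containing spaces stays whole.
        parts = line.split(None, 4)
        if len(parts) < 4:
            continue
        # Unassigned shards have no node column; pad with an empty string.
        rows.append(dict(zip(COLUMNS, parts + [""])))
    return rows


def stuck_replicas(text):
    """Return replica rows whose state is INITIALIZING."""
    return [
        r for r in parse_cat_shards(text)
        if r["prirep"] == "r" and r["state"] == "INITIALIZING"
    ]


def fetch_cat_shards(base_url):
    """Fetch the shard table; base_url is an assumption, e.g. http://localhost:9200."""
    url = base_url.rstrip("/") + "/_cat/shards?h=" + ",".join(COLUMNS)
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")
```

Running `stuck_replicas(fetch_cat_shards("http://localhost:9200"))` a few minutes apart would show whether the same copies stay in INITIALIZING.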

Thanks,

V

Fixed by following: http://stackoverflow.com/a/33026075

That Stack Overflow post is totally wrong; KB will not stop any sort of shard action from happening.

And the solution is a really bad idea. Really bad.
Deleting the segments gen files basically stops Lucene/ES from keeping track of deleted files.

KB? Kibana? I know our issue was not caused by Kibana, but it seems to have gone away when I removed the segments gen files and restarted the cluster.
But if that wasn't the issue, @warkolm, could you help diagnose it? What could be the reason for the replicas not being initialized?

thanks!

You'd need to turn the logging level up when it happens again to get more info.
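One way to raise the log level without a restart is a transient cluster settings update. The sketch below assumes the `indices.recovery` logger is the relevant one (it matches the package in the stack trace above) and that `PUT /_cluster/settings` accepts `logger.*` keys on this version; the helper names are illustrative only.

```python
import json
import urllib.request

ALLOWED_LEVELS = {"TRACE", "DEBUG", "INFO", "WARN", "ERROR"}


def recovery_logging_body(level: str = "DEBUG") -> bytes:
    """Build a transient settings body that changes the recovery logger level."""
    level = level.upper()
    if level not in ALLOWED_LEVELS:
        raise ValueError("unsupported log level: " + level)
    # "transient" means the change is lost on a full cluster restart.
    return json.dumps({"transient": {"logger.indices.recovery": level}}).encode("utf-8")


def set_recovery_logging(base_url: str, level: str = "DEBUG") -> str:
    """PUT the logger change; base_url is an assumption, e.g. http://localhost:9200."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/_cluster/settings",
        data=recovery_logging_body(level),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Once the extra output has been captured, calling the same function with `"INFO"` puts the logger back to a quieter level.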