Multiple Shards Stuck in INIT State

I have been battling with one of my clusters for a couple of days now. I had multiple shards stuck in INIT for several hours, and if I had left it alone it would have been days. Elasticsearch saw these shards as recoveries in progress, but they were stuck at 0%. Eventually this ground the cluster to a halt as the limits on concurrent shard recoveries and rebalancing were hit. This morning I managed to figure out what caused it, and I thought I would share it in case someone finds it useful in the future.
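For anyone trying to spot the same symptom, a rough sketch along these lines (assuming Python with the requests library, a cluster reachable at localhost:9200, and the usual _cat/recovery columns; field names can vary between versions) will list any active recovery that is still sitting at 0%:

```python
# Minimal sketch for spotting recoveries stuck at 0% via the _cat/recovery API.
# ES_URL is an assumption; point it at your own cluster.
import requests

ES_URL = "http://localhost:9200"

def stuck_recoveries():
    """Return active recoveries whose byte progress is still at 0%."""
    resp = requests.get(
        f"{ES_URL}/_cat/recovery",
        params={"format": "json", "active_only": "true"},
        timeout=10,
    )
    resp.raise_for_status()
    stuck = []
    for rec in resp.json():
        # bytes_percent comes back as a string like "0.0%" in the cat output
        if rec.get("bytes_percent", "").startswith("0.0"):
            stuck.append((rec["index"], rec["shard"], rec["stage"], rec["bytes_percent"]))
    return stuck

if __name__ == "__main__":
    for index, shard, stage, pct in stuck_recoveries():
        print(f"{index} shard {shard}: stage={stage} bytes={pct}")
```

In my case these recoveries sat at 0% for hours without ever progressing, which is what eventually hit the concurrent recovery limits.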
On my servers there was a stale NFS mount which had nothing to do with Elasticsearch's data paths and wasn't used or defined anywhere in elasticsearch.yml. However, the server exporting that NFS share was taken down a couple of days ago, and that is when my troubles started. So although Elasticsearch never used those shares, they somehow affected shard allocation on those nodes. With the stale NFS mount points removed from the system, Elasticsearch is able to write data to those nodes again.
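In case it helps anyone checking for the same root cause, a rough sketch like the one below (my own, Linux-only since it reads /proc/mounts, with an assumed 5-second timeout) can flag NFS mounts that no longer respond; a stat() against a share whose server has gone away tends to hang or return ESTALE:

```python
# Sketch for flagging NFS mounts that no longer respond.
# The timeout value and daemon-thread approach are assumptions, not anything
# Elasticsearch-specific; a truly dead hard mount will leave the worker thread
# blocked, which is why it runs as a daemon thread.
import os
import threading

def _try_stat(path, done):
    # Append only if stat succeeds; ESTALE or other errors leave `done` empty.
    try:
        os.stat(path)
        done.append(True)
    except OSError:
        pass

def stat_responds(path, timeout=5.0):
    """Return True if os.stat(path) succeeds within `timeout` seconds."""
    done = []
    worker = threading.Thread(target=_try_stat, args=(path, done), daemon=True)
    worker.start()
    worker.join(timeout)
    return bool(done)

def nfs_mount_points():
    """Yield mount points whose filesystem type is nfs or nfs4 (from /proc/mounts)."""
    with open("/proc/mounts") as mounts:
        for line in mounts:
            _device, mount_point, fs_type = line.split()[:3]
            if fs_type.startswith("nfs"):
                yield mount_point

if __name__ == "__main__":
    for mount in nfs_mount_points():
        status = "ok" if stat_responds(mount) else "STALE or unresponsive"
        print(f"{mount}: {status}")
```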
It's a weird one... I still don't know how or why it had the impact it did, but it's something to look out for if anyone else stumbles across this post after suffering similar issues.

What version are you on? How many indices, shards?

This cluster is currently running 5.5.2.

363 indices, 5512 shards.

How many nodes?

24 data nodes: 20 hot, 4 warm.
