I have a 5.6.4 cluster that I previously had issues starting up, due possibly to a fusillade of dangling index errors combined with relocation shards. I set cluster.routing.allocation.enable=none and that's how I was able to get the cluster back up so I could eventually deal with the dangling index errors.
However, the cluster crashed again and will not come back up. The cluster starts back and then the _cluster/health?pretty queries slow way down over the course of 10 minutes and then the master node heartbeats time out.
I thought the issue could be GC pauses due to the dangling index errors, but that's not the case.