Failed shards remain unassigned

I have set up Elasticsearch 0.90.2 on 4 Linux VMs (ny01, ny02, nj01, nj02).
There is one index (we'll call it my_index) with 4 shards and 3 replicas of
each, so every node should hold a copy of all 4 shards. Settings that we have
changed from the defaults:
discovery.zen.ping.multicast.enabled: false
discovery.zen.minimum_master_nodes: 3
cluster.routing.allocation.awareness.attributes: datacenter
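
As a sanity check on these numbers, here is a small sketch of the arithmetic. It assumes all four nodes are master-eligible and that the datacenter attribute takes two values (one for the ny nodes, one for the nj nodes). The function and variable names are made up for illustration.

```python
def majority(master_eligible_nodes):
    """Smallest node count that is a strict majority (the usual quorum rule)."""
    return master_eligible_nodes // 2 + 1


def copies_per_shard(replicas):
    """One primary plus its replicas."""
    return 1 + replicas


nodes = 4        # ny01, ny02, nj01, nj02
shards = 4       # primary shards in my_index
replicas = 3     # replicas per primary
datacenters = 2  # assumption: two values of the datacenter attribute

quorum = majority(nodes)                                  # matches minimum_master_nodes
total_copies = shards * copies_per_shard(replicas)        # shard copies in the whole cluster
copies_per_node = total_copies // nodes                   # expected copies on each node
copies_per_dc = copies_per_shard(replicas) // datacenters # copies of each shard per datacenter
```

With these settings a fully allocated cluster should have 16 shard copies, 4 on each node, so the 3 unassigned copies account for exactly what nj01 and nj02 are missing.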

Today I noticed that nj01 was missing 2 shards and nj02 was missing 1 shard.
Looking at the logs, I see that after some disconnects around 2013-08-04
07:00, ny01 (the master at the time) tried to send the cluster state to all
nodes. It looks like the correct cluster state was never fully recovered, and
these 3 shards remain unassigned.

Any suggestions on how to debug why the full state was not recovered?
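
In case it helps anyone reproduce what I am looking at, this is roughly how the unassigned shards can be listed from the cluster state. It is only a sketch: the base URL is a placeholder, the helper names are made up, and it assumes the /_cluster/state response carries a routing_nodes.unassigned list whose entries have index, shard and primary fields.

```python
import json
from urllib.request import urlopen


def fetch_cluster_state(base_url="http://localhost:9200"):
    """Fetch the cluster state as a dict. base_url is a placeholder."""
    with urlopen(base_url + "/_cluster/state") as resp:
        return json.loads(resp.read().decode("utf-8"))


def unassigned_shards(state, index_name):
    """Return sorted (shard number, role) pairs for unassigned copies of an index.

    Assumes the routing_nodes.unassigned layout described above; a missing
    key is treated as "nothing unassigned".
    """
    entries = state.get("routing_nodes", {}).get("unassigned", [])
    return sorted(
        (e["shard"], "primary" if e.get("primary") else "replica")
        for e in entries
        if e.get("index") == index_name
    )
```

Calling unassigned_shards(fetch_cluster_state(), "my_index") against any node should then show which shard numbers are affected and whether they are primaries or replicas.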

I have pasted the logs from that period for one of the problematic nodes
(nj01) and for the master node at the time (ny01):
ny01: https://gist.github.com/takism1/312e2bf2d0ff583a235f
nj01: https://gist.github.com/takism1/0162dda53b4806d7a372

I can post additional information if needed.

Thanks,
Takis
