Every few weeks, my ElasticSearch cluster suddenly goes to "red" status. This is on my production servers, where I rely heavily on ES, and the only fix I've found is to delete all the data and restart ElasticSearch (which sucks, needless to say). Simply restarting ES doesn't help; the cluster health stays red:
# curl http://localhost:9200/_cluster/health
{"cluster_name":"streamified","status":"red","timed_out":false,"number_of_nodes":2,"number_of_data_nodes":2,"active_primary_shards":10,"active_shards":20,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":10}
I'm hoping someone can help me make sense of the logs; they're a mystery to me. I'm running two ElasticSearch machines with 4GB of RAM allocated (and I've increased the max page size to 64k). I'm on EC2 (using EC2 discovery). The complete logs are here: http://streamified.com/StreamifiedESLogs.zip
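In case it helps anyone reading the health output, here is a minimal Python sketch that parses the response shown above and pulls out the numbers that matter. A "red" status means at least one primary shard is not assigned, which matches the 10 unassigned shards reported. The function name `summarize_health` and the way it totals shards are my own assumptions for illustration, not anything provided by ElasticSearch.

```python
import json

# Health response copied verbatim from the curl output in this post.
HEALTH_JSON = (
    '{"cluster_name":"streamified","status":"red","timed_out":false,'
    '"number_of_nodes":2,"number_of_data_nodes":2,'
    '"active_primary_shards":10,"active_shards":20,'
    '"relocating_shards":0,"initializing_shards":0,"unassigned_shards":10}'
)


def summarize_health(raw):
    """Summarize a _cluster/health response (hypothetical helper)."""
    health = json.loads(raw)
    # Assumption: every shard copy is either active, initializing, or unassigned.
    total = (
        health["active_shards"]
        + health["initializing_shards"]
        + health["unassigned_shards"]
    )
    return {
        "status": health["status"],
        "nodes": health["number_of_nodes"],
        "unassigned_shards": health["unassigned_shards"],
        "total_shards": total,
        # Anything other than "green" means some shard copy is not allocated.
        "needs_attention": health["status"] != "green",
    }


if __name__ == "__main__":
    print(summarize_health(HEALTH_JSON))
```

With the response above, a third of the shard copies (10 of 30) are unassigned, so the open question is why those shards are not being allocated after a restart.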