I'm crash-testing a small Elasticsearch cluster:
3 Ubuntu micro instances (EC2, 3 zones)
2 replicas (one per zone)
10 indexes (with 10 shards per index)
30k documents bulk-indexed (100 per batch) in parallel on every node
swap disabled
I'm using micro instances to simulate a lot of crashes (I want to see
cluster recovery in action).
As expected, it crashes under load quite often (the Java process
runs out of memory).
Overall it is working surprisingly well (no data loss so far).
The only annoying thing is that sometimes a shard gets stuck in the
INITIALIZING state (and _cluster/health shows "yellow").
I left the cluster running overnight, but it didn't recover.
Restarting the node with the misbehaving shard didn't help either (the
shard got stuck in INITIALIZING again after the restart).
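To pin down which shard is stuck, the health endpoint can be asked for per-shard detail. Below is a minimal Python sketch, assuming a node reachable at http://localhost:9200 and a health response with the per-index "shards" breakdown returned by level=shards; the function names are made up for illustration.

```python
import json
from urllib.request import urlopen


def fetch_shard_health(base_url="http://localhost:9200"):
    """Fetch cluster health with per-shard detail.

    base_url is an assumption; point it at any node of the cluster.
    """
    with urlopen(f"{base_url}/_cluster/health?level=shards") as resp:
        return json.load(resp)


def stuck_shards(health):
    """Return (index, shard, initializing, unassigned) for every shard
    group that still has initializing or unassigned copies."""
    found = []
    for index_name, index_info in sorted(health.get("indices", {}).items()):
        shards = index_info.get("shards", {})
        for shard_id, shard in sorted(shards.items(), key=lambda kv: int(kv[0])):
            initializing = shard.get("initializing_shards", 0)
            unassigned = shard.get("unassigned_shards", 0)
            if initializing or unassigned:
                found.append((index_name, int(shard_id), initializing, unassigned))
    return found
```

Calling stuck_shards(fetch_shard_health()) every few minutes shows whether it is always the same shard that stays in INITIALIZING, or whether the stuck shard changes after each crash.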
Can you find out if the initializing shards were stuck because of a
previous OOM? If so, there is not much that can be done except a node cold
restart (JVM shutdown and start).
Sorry, I just saw that you already restarted the node...
Is there something in the logs? At debug level? The logs should tell
whether the node receives the shard at all, and maybe the reason why it
rejects the shard.
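One way to check the logs for a previous OOM is to scan them for java.lang.OutOfMemoryError alongside any recovery-related lines. This is only a rough Python sketch: the "recover" keyword match is a guess at what recovery messages contain, and the log file location depends on your install.

```python
import re

# "oom" matches the standard JVM error class name; "recovery" is a loose
# keyword match and an assumption about what the log messages contain.
PATTERNS = {
    "oom": re.compile(r"java\.lang\.OutOfMemoryError"),
    "recovery": re.compile(r"recover", re.IGNORECASE),
}


def scan_log(lines):
    """Return (line_number, label, text) for every line matching a pattern."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label, line.rstrip("\n")))
    return hits


# Usage (the path is hypothetical; use your node's actual log file):
#   with open("/path/to/elasticsearch.log") as f:
#       for lineno, label, text in scan_log(f):
#           print(lineno, label, text)
```

If an OOM entry shows up shortly before the first recovery failure for the stuck shard, that would support the OOM theory.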