Shard stuck in INITIALIZING state (Elasticsearch crash test)


(Karol Gwaj) #1

Hi,

I'm running a crash test on a small Elasticsearch cluster:

  • 3 Ubuntu micro instances (EC2, 3 zones)
  • 2 replicas (one per zone)
  • 10 indexes (with 10 shards per index)
  • 30k documents indexed in bulk (100 per batch), in parallel on every node
  • swap disabled
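
For reference, the batching described above could look roughly like the sketch below. It only builds the newline-delimited body that the `_bulk` endpoint expects and does not send anything. The helper names (`chunked`, `bulk_body`) and the optional `doc_type` argument are assumptions for illustration, not taken from the actual test code; older Elasticsearch versions expect a `_type` in the action line, newer ones do not.

```python
import json
from itertools import islice


def chunked(docs, size=100):
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def bulk_body(index, batch, doc_type=None):
    """Build a newline-delimited _bulk request body for one batch."""
    lines = []
    for doc in batch:
        meta = {"_index": index}
        if doc_type is not None:
            # Assumption: only needed on versions that still use mapping types.
            meta["_type"] = doc_type
        lines.append(json.dumps({"index": meta}))  # action line
        lines.append(json.dumps(doc))              # document source line
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"
```

Each string returned by `bulk_body` would then be sent as the body of one POST to `/_bulk`, one request per batch.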

I'm using micro instances to provoke a lot of crashes (I want to see
cluster recovery in action).
As expected, the nodes crash quite often under load (the Java process
runs out of memory).

Overall it works surprisingly well (no data loss so far).
The only annoying thing is that sometimes a shard gets stuck in the *INITIALIZING*
state (and _cluster/health shows "yellow").

I left the cluster running overnight, but it did not recover.
Restarting the node with the misbehaving shard did not help either (the shard
was stuck in the INITIALIZING state after the restart too).
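
To pin down which shards are affected, the health endpoint can be asked for per-shard detail with `level=shards`. The following is a minimal sketch, assuming a node reachable at `http://localhost:9200` (an assumed address) and the usual response layout with an `initializing_shards` counter per shard; the function names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_shard_health(base_url="http://localhost:9200"):
    """Fetch cluster health with per-shard detail (node address is assumed)."""
    with urlopen(base_url + "/_cluster/health?level=shards") as resp:
        return json.loads(resp.read().decode("utf-8"))


def initializing_shards(health):
    """Return (index, shard, count) for every shard with initializing copies."""
    found = []
    for index_name, index_info in sorted(health.get("indices", {}).items()):
        shards = index_info.get("shards", {})
        # Shard ids are string keys in the response; sort them numerically.
        for shard_id, shard in sorted(shards.items(), key=lambda kv: int(kv[0])):
            count = shard.get("initializing_shards", 0)
            if count > 0:
                found.append((index_name, int(shard_id), count))
    return found
```

Calling `initializing_shards(fetch_shard_health())` a few minutes apart shows whether the same shard stays in the list or whether recovery is making progress.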

Any suggestions on how to fix this?

Cheers,

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b126ae21-ab16-4549-9331-5751f06fe496%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Jörg Prante) #2

Can you find out if the initializing shards were stuck because of a
previous OOM? If so, there is not much that can be done except a node cold
restart (JVM shutdown and start).
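
One rough way to check for an earlier OOM is to scan the node's log for `OutOfMemoryError` entries. This is only a sketch: the function name is hypothetical, and the log location depends on the installation, so the file is passed in by the caller.

```python
def find_oom_lines(lines):
    """Return (line_number, text) for log lines mentioning OutOfMemoryError."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if "OutOfMemoryError" in line
    ]
```

It accepts any iterable of lines, so an open log file can be passed directly, for example `with open(log_path) as f: hits = find_oom_lines(f)`, where `log_path` is whatever path the node writes its log to.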

Jörg



(Jörg Prante) #3

Sorry, I just saw that you already restarted the node...

Is there anything in the logs? At debug level? The cluster should report
whether it receives the shard at all, and maybe the reason why it rejects
the shard.

Jörg



(system) #4