I have a 3-node cluster running ES 1.0.1 in Azure. The nodes are Windows
VMs with 7 GB of RAM each, and the JVM heap is set to 4 GB per node. There
is a single index in the cluster with 50 shards and 1 replica. The primary
shards hold 29 million documents in total, and the store size is 60 GB
(including replicas).
Almost every day now a random node disconnects from the cluster.
The usual suspect is a ping timeout. The longest GC in the logs is about 1
sec, and the boxes don't look resource constrained at all. CPU never
goes above 20%. Used JVM heap across the cluster never goes above 6 GB (out
of 12 GB total), and the field data cache never gets over 1 GB. The
node that drops out is different every day. I have
discovery.zen.minimum_master_nodes set, so there's no split-brain
scenario, but there are times when the disconnected node NEVER rejoins
until I bounce the process.
Has anyone seen this before? Is it an Azure networking issue? How can I
tell? If it's a resource problem, what's the best way to turn on logging
to diagnose it? What else can I tell you, or what other steps can I take
to figure this out? It's really quite maddening.
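
To timestamp the drop-outs independently of the ES logs, one option would be a small poller like the sketch below. It is only a rough idea, not something already running on the cluster: the URL (localhost:9200), the 5-second interval, and the function names are all assumptions made for illustration. It polls the nodes info endpoint (GET /_nodes) and prints which node names joined or left since the previous poll.

```python
import json
import time
import urllib.request


def node_names(payload):
    """Return the set of node names found in a GET /_nodes response body."""
    nodes = payload.get("nodes", {})
    # Fall back to the node id if a node entry has no "name" field.
    return {info.get("name", node_id) for node_id, info in nodes.items()}


def diff_nodes(previous, current):
    """Return (joined, left) as sorted lists of node names."""
    return sorted(current - previous), sorted(previous - current)


def watch(url="http://localhost:9200/_nodes", interval=5.0):
    """Poll the cluster forever and print timestamped membership changes.

    The default URL and interval are assumed values; adjust for your setup.
    """
    previous = None
    while True:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                current = node_names(json.loads(resp.read().decode("utf-8")))
        except (OSError, ValueError) as exc:
            # The polled node itself may be unreachable or returning garbage.
            print(stamp, "poll failed:", exc)
        else:
            if previous is not None:
                joined, left = diff_nodes(previous, current)
                if joined:
                    print(stamp, "joined:", ", ".join(joined))
                if left:
                    print(stamp, "left:", ", ".join(left))
            previous = current
        time.sleep(interval)
```

Calling watch() on each VM (pointed at the local node) would give three independent views of cluster membership, so the timestamps could be lined up against GC pauses or Azure network events.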