Split brains after long GCs

Hi,
I have a 6-node cluster running ES 0.20.5. The cluster currently has
around 35M docs spread over 12 shards with 1 replica.
Each node has 48GB of RAM with a 24GB heap.

I am experiencing, at random times, spikes in heap usage followed by long GCs.
After the nodes finish the long GC, the cluster gets into a split-brain
situation with weird states where one of the cluster nodes is a member of
both sides of the split.
The minimum_master_nodes option does not help in this case, since that node
exists in two of the different cluster states.

I would appreciate any suggestions you may have to prevent these issues,
especially the split brain, since it causes corruption of our index.

Thanks
Asher


The minimum_master_nodes setting should be equal to the total number of nodes
in the cluster. If you don't set it, you can end up with multiple nodes acting
as masters.
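
For the 6-node cluster described above, the setting goes in elasticsearch.yml
and would look something like this (the value of 4 shown here is the commonly
recommended quorum-style choice, master-eligible nodes / 2 + 1; the advice
above would put it at 6 instead):

discovery.zen.minimum_master_nodes: 4

Keep in mind this setting only controls which side of a split is allowed to
elect a master; it does not stop a GC-frozen node from dropping out of the
cluster in the first place.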

You should also look at changing the following fault-detection values. Our
configuration:

discovery.zen.ping.timeout: 30s
discovery.zen.fd.ping_interval: 10s
discovery.zen.fd.ping_timeout: 30s
discovery.zen.fd.ping_retries: 480
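
As a side note, the 80 minutes mentioned below appears to come from
ping_interval x ping_retries: 10s x 480 = 4800s = 80 minutes (if each retry
instead waits for the full 30s ping_timeout, the window is even longer).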

When the master stops communicating and a split-brain begins, the entire
cluster stops talking to the world: you cannot index, query, etc. The default
time is 1 minute. The properties above let you extend that timeout; in our
configuration it is up to 80 minutes. It should never come to 80 minutes, but
if it does, you have a BIGGER problem going on. Also, with this configuration,
if it did reach 80 minutes, eventually all shards get put into a state where
they reject all operations (an exception I just do not recall) and the entire
cluster has to be rebooted. This also protects you from having indexes that
are out of whack, where trying to recover becomes hell. We had a split-brain
where one node had 600k docs and another node had 6k. The admin's mistake was
to restart both nodes. The node that came up first wins. Guess which one won!
Yep, 6k. We lost the 600k.

First aid: try increasing

discovery.zen.ping.timeout: 60s (default: 3s)

and the Zen fault detection

discovery.zen.fd.ping_timeout: 60s (default: 30s)
discovery.zen.fd.ping_interval: 60s (default: 1s)

so that communication with the master node has enough time to survive
certain GC stalls. This is just a workaround - I don't know how long your
GC stalls last or for how long a node cannot respond. It is important to
analyze these numbers.
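
One way to get those numbers is to enable GC logging in the JVM that runs
Elasticsearch - a sketch, assuming a standard HotSpot JVM; where the flags go
depends on your install (e.g. JAVA_OPTS / ES_JAVA_OPTS or
bin/elasticsearch.in.sh):

-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/elasticsearch/gc.log

The "real=" times in that log show how long each stop-the-world pause lasted,
which you can compare against the ping timeouts above. Elasticsearch also
warns about long garbage collections in its own node log.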

The default zen ping timeout values are selected very carefully. They
assume all nodes in the cluster are alive and can respond. If you increase
the timeouts, you allow nodes not to respond, and that will influence the
whole cluster; long response times may be the consequence. It is just
sugar-coating the real problem, so increasing the timeouts is not a proper
solution.

Some hints to get closer to the cause:

  • check your code for the reason it creates heap "spikes" that force the
    GC to step in and run into JVM stall situations. There are situations
    where CMS GC can be improved by avoiding edge cases, but it does not
    always work out.

  • if you must accept the "spikes" and large heaps, and CMS GC can't be
    improved, try another GC algorithm that is optimized for large heaps and
    short stall times (G1 GC); see the example flags after this list. Note
    that G1 is not the default GC in Java 7 and is not considered stable yet.
    G1 takes more CPU and decreases overall performance. It does not prevent
    the spikes, but it lets the JVM respond within small time frames; the JVM
    is more reactive.

  • if all GC improvement strategies fail, consider a smaller heap per
    node - for example, more nodes with less heap each - so that the "spikes"
    do not hurt as much.
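
If you try G1, the switch is done with JVM flags along these lines - a sketch,
assuming Java 7u4 or later; in 0.20.x the default CMS flags are set in
bin/elasticsearch.in.sh, so that is where you would swap them (the pause
target is only an example value):

-XX:+UseG1GC -XX:MaxGCPauseMillis=200

For the smaller-heap option, the heap is normally sized through the
ES_HEAP_SIZE environment variable, e.g. ES_HEAP_SIZE=8g on more, smaller
nodes.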

Jörg
