Hi,
Recently we discovered that Elasticsearch is not able to resolve a split
brain situation in an existing cluster once it has occurred. The problem
(split brain and its subsequent resolution) can be split into two main parts:
- Reorganization of the whole cluster and logging
- Resolution of data conflicts
The first part should be fairly "easy" to solve. Discovery should take
place regularly and update the cluster organization if necessary.
The second part is more complex and depends on what users are doing. In our
application it is not that important that conflicts caused by the split
brain are resolved by Elasticsearch - we can easily handle this ourselves
(re-import the data that was modified during the split-brain situation).
IMHO it is much better to let ES resolve the split brain than to let it run
"forever" in the split-brain situation.
From the original issue
https://github.com/elasticsearch/elasticsearch/issues/5144 :
We have a 4-node ES cluster running ("plain" Zen discovery - no cloud
stuff). Two nodes are in one DC, the other two nodes are in another DC.
When the network connection between the two DCs fails, ES forms two
two-node clusters - a split brain. When the network is operative again, the
split-brain situation persists.
I've set up a small local test with a 4-node ES cluster:
+--------+                      +--------+
| Node A | ----\          /---- | Node C |
+--------+      \......../      +--------+
+--------+      /        \      +--------+
| Node B | ----/          \---- | Node D |
+--------+                      +--------+
            Single ES cluster
When the network connection fails, two two-node clusters exist (split
brain). I've simulated that with "iptables -A INPUT/OUTPUT -s/d -j DROP"
rules.
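Spelled out, the rules on nodes A and B looked roughly like this (the
addresses are placeholders for the real IPs of nodes C and D; the
mirror-image rules go on C and D):

    # drop all traffic to and from the other DC (here: nodes C and D)
    iptables -A INPUT  -s 10.0.2.13 -j DROP
    iptables -A INPUT  -s 10.0.2.14 -j DROP
    iptables -A OUTPUT -d 10.0.2.13 -j DROP
    iptables -A OUTPUT -d 10.0.2.14 -j DROP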
+--------+                      +--------+
| Node A | ----\          /---- | Node C |
+--------+      \        /      +--------+
+--------+      /        \      +--------+
| Node B | ----/          \---- | Node D |
+--------+                      +--------+
    ES cluster            ES cluster
When the network between nodes A/B and C/D is operative again, the
single-cluster state is not restored (the split brain persists).
It did not make a difference whether unicast or multicast Zen discovery was
used.
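For reference, this is roughly what I had in elasticsearch.yml for the
unicast variant (node names are placeholders for my actual setup, 9300 is
the default transport port):

    # unicast variant
    discovery.zen.ping.multicast.enabled: false
    discovery.zen.ping.unicast.hosts: ["nodeA:9300", "nodeB:9300", "nodeC:9300", "nodeD:9300"]

For the multicast variant I simply left multicast enabled (the default) and
removed the unicast host list. Both variants ended up in the same
persistent split brain.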
Another issue is that the operating system's keepalive settings affect the
time after which ES detects a node failure. The keepalive timeout settings
(e.g. net.ipv4.tcp_keepalive_time/probes/intvl) directly influence node
failure detection.
There should be a task that regularly polls the "alive" status of all other
known nodes.
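To illustrate the knobs involved (the sysctl values are the usual Linux
defaults on my machines; the Zen fault-detection values are the defaults as
I understand them from the docs, so take them with a grain of salt):

    # OS level: TCP keepalive settings that currently drive part of the detection time
    sysctl net.ipv4.tcp_keepalive_time     # 7200 (seconds)
    sysctl net.ipv4.tcp_keepalive_probes   # 9
    sysctl net.ipv4.tcp_keepalive_intvl    # 75 (seconds)

    # ES level: Zen fault detection has its own ping settings (elasticsearch.yml)
    discovery.zen.fd.ping_interval: 1s
    discovery.zen.fd.ping_timeout: 30s
    discovery.zen.fd.ping_retries: 3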
Tested with ES 1.0.0 (and an older 0.90.3).
David Pilato: "Did you try to set minimum_master_node to 3? See
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html#master-election
"
Me: "Setting minimum_master_nodes to 3 is not an option. If I understand
correctly, it would force all 4 nodes to stop working at all - means: no
service at all. This wouldn't cover the case, that two nodes are taken down
for maintenance work. And what if there a three DCs (each with 2 nodes) - a
setting of minimum_master_nodes=5 would only allow one node to fail before
ES stops working. IMHO there should be a regular job inside ES, that checks
the existence of other nodes (either via unicast or via multicast) and
triggers (re-)discovery if necessary - the split brain situation must be
resolved."
David Pilato: "Exactly. Cluster will stop working until network connection
is up again.
What do you expect? Which part of the cluster should hold the master in
case of network outage?
Cross Data center replication is not supported yet and you should consider:
- use the great snapshot and restore feature to snapshot from a DC and
restore in the other one - index in both DC (so two distinct clusters) from a client level
- use Tribe node feature to search or index on multiple clusters
I think we should move this conversation to the mailing list."
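For completeness, the snapshot and restore route David mentions would look
roughly like this (repository name and filesystem path are placeholders;
the repository has to be registered on both clusters):

    # register a filesystem repository
    curl -XPUT 'http://localhost:9200/_snapshot/my_backup' -d '{
      "type": "fs",
      "settings": { "location": "/mount/backups/my_backup" }
    }'

    # take a snapshot in DC 1
    curl -XPUT 'http://localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true'

    # restore it on the cluster in DC 2
    curl -XPOST 'http://localhost:9200/_snapshot/my_backup/snapshot_1/_restore'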