We have an 8-node cluster running 0.20.0RC1, with MINIMUM_MASTER_NODES set to 5.
At a certain point, 2 nodes (search7 and search8) left the cluster. The
reason is unknown; it happened while increasing the replica count on a new
index, but I am not focused on that right now. I stopped the process on both
search7 and search8 and started them up one at a time.
Upon restarting search7, it seemed to think that search8 was the master.
Since the search8 process was still down, search7 could not connect to it
and did not join the cluster:
[2013-02-25 09:09:05,108][WARN ][discovery.zen ] [search7]
failed to connect to master
[[search8][GYhoDKLWRFOCy7KtUgVVQg][inet[/ipaddress:9300]]],
retrying...
org.elasticsearch.transport.ConnectTransportException:
[search8][inet[/ipaddress:9300]] connect_timeout[5s]
Next I started search8 and attempted to restart search7. Ignoring
search8's logs for now, search7 still cannot join the cluster, this time
for a different reason (search8 reports that it is not the master):
[2013-02-25 09:10:47,816][INFO ][discovery.zen ] [search7]
failed to send join request to master
[[search8][GYhoDKLWRFOCy7KtUgVVQg][inet[/ipaddress:9300]]], reason
[org.elasticsearch.transport.RemoteTransportException:
[search8][inet[/ipaddress:9300]][discovery/zen/join];
org.elasticsearch.ElasticSearchIllegalStateException: Node
[[search8][feMQtDFOTs2xyh0xcTXdkA][inet[/ipaddress:9300]]] not master
for join request from
[[search7][4lEhStfwQDKuHpGvHeU-hQ][inet[/ipaddress:9300]]]]
[2013-02-25 09:10:47,816][TRACE][discovery.zen ] [search7]
detailed failed reason
org.elasticsearch.transport.RemoteTransportException:
[search8][inet[/ipaddress:9300]][discovery/zen/join]
Caused by: org.elasticsearch.ElasticSearchIllegalStateException: Node
[[search8][feMQtDFOTs2xyh0xcTXdkA][inet[/ipaddress:9300]]] not master
for join request from
[[search7][4lEhStfwQDKuHpGvHeU-hQ][inet[/ipaddress:9300]]]
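One detail that is easy to miss in the wrapped log lines above: the node named search8 shows up with two different node IDs. Below is a minimal Python sketch that pulls the node descriptors out of the join failure message and groups the IDs by node name. The regex and the variable names are made up for illustration and are not part of Elasticsearch; the log string is just the INFO message from above re-joined onto one line. Two IDs for one name would be consistent with the restarted search8 having come up with a new ID, but that is a guess, not something these logs confirm.

```python
import re
from collections import defaultdict

# Matches descriptors of the form [[name][node_id][inet[address]]]
# (pattern written for this log excerpt only; an assumption, not an ES format spec)
NODE_RE = re.compile(
    r"\[\[(?P<name>[^\]]+)\]\[(?P<id>[^\]]+)\]\[inet\[(?P<addr>[^\]]*)\]\]\]"
)

# The INFO message from the excerpt above, re-joined onto a single line
log = (
    "failed to send join request to master "
    "[[search8][GYhoDKLWRFOCy7KtUgVVQg][inet[/ipaddress:9300]]], reason "
    "[org.elasticsearch.transport.RemoteTransportException: "
    "[search8][inet[/ipaddress:9300]][discovery/zen/join]; "
    "org.elasticsearch.ElasticSearchIllegalStateException: Node "
    "[[search8][feMQtDFOTs2xyh0xcTXdkA][inet[/ipaddress:9300]]] not master "
    "for join request from "
    "[[search7][4lEhStfwQDKuHpGvHeU-hQ][inet[/ipaddress:9300]]]]"
)

# Collect every node ID seen for each node name
ids_by_name = defaultdict(set)
for match in NODE_RE.finditer(log):
    ids_by_name[match.group("name")].add(match.group("id"))

# Names that appear with more than one ID in the same message
ambiguous = sorted(name for name, ids in ids_by_name.items() if len(ids) > 1)

for name in sorted(ids_by_name):
    print(name, sorted(ids_by_name[name]))
print("names with more than one node ID:", ambiguous)
```

In this message, search7 addresses its join request to search8 under one ID, while the node that answers identifies itself as search8 under another.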
More logging output is at
Focusing only on search7 for now: it is sending ping requests to all nodes
in the cluster, and they all seem to respond that search8 is the master.
Yet the other 6 nodes are forming an ES cluster without search8 as the
master. Why are they returning search8 as the master?
If this is a split-brain scenario, why didn't setting the minimum master
nodes help? How can someone recover from this scenario? We deleted the new
index, and the cluster returned to a green state. I assume that deleting
the data directories on search7 and search8 would have made the cluster go
into a yellow state. What does it take for the master election process to
start?
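For reference, the arithmetic behind the value of 5, as a minimal Python sketch. It assumes all 8 nodes are master-eligible (not stated above) and that the setting follows the usual majority rule of floor(n / 2) + 1. The function names are hypothetical, and the election check is a simplification of what zen discovery actually does.

```python
def majority(master_eligible_nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def may_elect_master(visible_nodes: int, minimum_master_nodes: int) -> bool:
    """Simplified rule: a group may elect a master only if it can see
    at least minimum_master_nodes master-eligible nodes (itself included)."""
    return visible_nodes >= minimum_master_nodes


TOTAL_NODES = 8  # assumption: every node is master-eligible
MINIMUM_MASTER_NODES = majority(TOTAL_NODES)

# The two groups after search7 and search8 dropped out
for label, size in (("six remaining nodes", 6), ("search7 + search8", 2)):
    print(label, "may elect a master:", may_elect_master(size, MINIMUM_MASTER_NODES))
```

Under these assumptions the six remaining nodes can keep or elect a master, while search7 and search8 on their own cannot, which is why two competing masters would be unexpected here.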
Cheers,
Ivan