Simulating network connect failure between nodes

waqark3389 · November 30, 2016, 5:29pm

I have 5 nodes (node1, node2, node3, node4 and node5), all master-eligible and all data nodes. discovery.zen.minimum_master_nodes is set to 3. Everything else is default.

To simulate a network problem between node5 and the rest of the cluster, I used a firewall to block outgoing traffic from node5 to all other nodes and incoming traffic to node5 from the other nodes.
I expected node5 to be removed from the cluster, but instead I cannot do any POST/GET to ANY node in the cluster. On the master, node3, I can see timeouts connecting to node5.
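For anyone wanting to reproduce the isolation, here is a minimal sketch of the firewall rules, assuming iptables is the firewall in use (the post above does not say which one was used). The helper name isolation_rules is hypothetical; the peer addresses are the first four entries of discovery.zen.ping.unicast.hosts, with 172.16.99.234 being node5. The script only prints the commands, which would then be run as root on node5.

```python
# Hypothetical helper: build the iptables commands that isolate one node
# from its peers. Assumes iptables; the original post does not name the firewall.
def isolation_rules(peer_ips):
    rules = []
    for ip in peer_ips:
        # Drop everything this node sends to the peer.
        rules.append(f"iptables -A OUTPUT -d {ip} -j DROP")
        # Drop everything the peer sends to this node.
        rules.append(f"iptables -A INPUT -s {ip} -j DROP")
    return rules


# node1..node4, taken from the unicast hosts list in the config.
PEERS = ["172.16.99.230", "172.16.99.231", "172.16.99.232", "172.16.99.233"]

if __name__ == "__main__":
    for rule in isolation_rules(PEERS):
        print(rule)
```

Note that DROP discards packets silently, so the other nodes see timeouts rather than refused connections, which is consistent with the timeouts seen on node3.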

Querying any node, including node5 (the isolated node), e.g. node1:9200/_cluster/health, returns green and reports 5 nodes. After about 30 minutes the other nodes finally realise node5 is not responding and remove it from the cluster.
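A rough sketch of how the health check can be repeated against every node during the test, so that the moment the node count drops from 5 to 4 can be timed. It uses only the Python standard library; the function names are hypothetical, and the addresses and port 9200 are taken from the config and the URL above.

```python
import json
import urllib.request

# Addresses from discovery.zen.ping.unicast.hosts (node1..node5).
NODES = ["172.16.99.230", "172.16.99.231", "172.16.99.232",
         "172.16.99.233", "172.16.99.234"]


def summarize_health(body):
    """Extract (status, number_of_nodes) from a _cluster/health JSON body."""
    health = json.loads(body)
    return health["status"], health["number_of_nodes"]


def poll(host, port=9200, timeout=5):
    """Query one node; report it as unreachable instead of raising."""
    url = f"http://{host}:{port}/_cluster/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return summarize_health(resp.read().decode("utf-8"))
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        return "unreachable", str(exc)


if __name__ == "__main__":
    for host in NODES:
        print(host, poll(host))
```

Running this in a loop with a timestamp on each line would show how long each node keeps reporting 5 nodes after the firewall rules go in.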

The whole cluster was down for over 30 minutes while one node was isolated. I can see the following repeated every minute in the logs of node3, which is master at this time:

[node3] failed to execute on node [VdSa2w0tSwiUyNNFhzvNXg]
org.elasticsearch.transport.NodeDisconnectedException: [node5][172.16.99.234:9300][cluster:monitor/nodes/stats[n]] disconnected

Why is the cluster unusable when a node is isolated? What can I do to speed up recovery?

Here is the config, which is the same on all 5 nodes except for the node name:

cluster.name: elasticsearchTestCluster
node.name: node3
http.cors.enabled: true
http.cors.allow-origin: "*"
network.host: 0.0.0.0

discovery.zen.ping.unicast.hosts: ["172.16.99.230", "172.16.99.231", "172.16.99.232", "172.16.99.233", "172.16.99.234"]
discovery.zen.minimum_master_nodes: 3

path.data: /home/elasticsearchData
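
As a sanity check on the setting above, the value 3 matches the usual majority rule for minimum_master_nodes, (master-eligible nodes / 2) + 1. A small sketch of the arithmetic (the function names are hypothetical) shows which side of the partition should still be able to elect a master:

```python
def quorum(master_eligible_nodes):
    """Majority of master-eligible nodes: (n / 2) + 1 with integer division."""
    return master_eligible_nodes // 2 + 1


def can_elect_master(visible_master_eligible, minimum_master_nodes):
    """A partition can elect a master only if it sees enough eligible nodes."""
    return visible_master_eligible >= minimum_master_nodes


if __name__ == "__main__":
    minimum = quorum(5)
    print("minimum_master_nodes for 5 nodes:", minimum)
    # node1..node4 still see each other; node5 sees only itself.
    print("majority side can elect:", can_elect_master(4, minimum))
    print("node5 alone can elect:", can_elect_master(1, minimum))
```

So on paper the four remaining nodes have a quorum and node5 does not, which is why the 30 minute outage is unexpected.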

dadoonet · December 4, 2016, 6:29am

Thanks for reporting.

Can you add the cluster state from before and after the node is isolated?
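
If it helps, here is one possible way to compare the two states, as a sketch only. It assumes the bodies were saved from GET /_cluster/state on any node, and relies on the master_node and nodes fields of that response; the function names are hypothetical.

```python
import json


def node_summary(state_body):
    """Return (master node name, sorted node names) from a _cluster/state body."""
    state = json.loads(state_body)
    nodes = state["nodes"]  # maps node id -> node info, including "name"
    names = sorted(info["name"] for info in nodes.values())
    master = nodes.get(state["master_node"], {}).get("name")
    return master, names


def removed_nodes(before_body, after_body):
    """Names present in the first cluster state but missing from the second."""
    _, before = node_summary(before_body)
    _, after = node_summary(after_body)
    return sorted(set(before) - set(after))
```

Calling removed_nodes with the two saved bodies should list node5 once the master has actually dropped it from the cluster state.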

system · January 1, 2017, 6:29am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
