Hi,
we're test-running a 3-node Elasticsearch cluster, version 5.6.9.
There seems to be a problem when the master node becomes unreachable from the other two nodes (for example, when its network cable is pulled). The remaining nodes take about 1 minute or longer to recover, and during that time the cluster does not answer any requests.
We test reachability with the following loop, which queries Node 3 (HTTP port 9202) once per second:
while true; do timeout 1 curl --connect-timeout 1 -XGET http://10.231.83.146:9202/_cluster/health; echo ""; date; sleep 1; done
When the master drops out, no curl request is answered for about 1 minute. We repeated the test several times; in some runs it took up to 2 minutes before the cluster became responsive again.
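To put a number on the outage instead of reading it off the `date` output, the loop could be replaced by a small bash sketch like the one below. It is only an illustration: the function names `probe` and `watch_outages`, the optional poll limit and the `POLL_INTERVAL` variable are made up for this post. It uses curl's `--max-time` instead of wrapping curl in `timeout`, and `--fail` so that non-2xx HTTP responses also count as failed requests.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for timing how long the cluster stays unresponsive.

# Succeeds only if the URL answers with a 2xx status within 1 second.
probe() {
  curl --silent --fail --output /dev/null --max-time 1 "$1"
}

# Poll $1 every POLL_INTERVAL seconds (default 1). $2 optionally limits
# the number of polls; 0 or unset means poll forever.
# Prints one line when requests start failing and one line with the
# approximate outage length when they succeed again.
watch_outages() {
  local url=$1 max_polls=${2:-0} polls=0 down_since="" now
  while [ "$max_polls" -eq 0 ] || [ "$polls" -lt "$max_polls" ]; do
    polls=$((polls + 1))
    now=$(date +%s)
    if probe "$url"; then
      if [ -n "$down_since" ]; then
        echo "$(date '+%T') recovered after $((now - down_since))s"
        down_since=""
      fi
    elif [ -z "$down_since" ]; then
      down_since=$now
      echo "$(date '+%T') first failed request"
    fi
    sleep "${POLL_INTERVAL:-1}"
  done
}
```

Usage: `watch_outages http://10.231.83.146:9202/_cluster/health`, then pull the cable. The reported length is only accurate to roughly one poll interval plus the 1-second probe timeout.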
My questions:
- Is this normal/expected?
- If not, is there a way to fix it?
Thanks in advance for any pointers!
Here are the configurations:
Node 1:
cluster.name: testcluster
network.host: 10.231.83.238
http.enabled: true
http.port: 9200
transport.tcp.port: 9300
discovery.zen.ping.unicast.hosts: ["10.231.83.238:9300", "10.231.83.146:9301", "10.231.83.146:9302"]
discovery.zen.minimum_master_nodes: 2
discovery.zen.fd.ping_timeout: 4s
http.cors.enabled: true
http.cors.allow-origin: /https?:\/\/localhost(:[0-9]+)?/
Node 2:
cluster.name: testcluster
network.host: 10.231.83.146
http.enabled: true
http.port: 9201
transport.tcp.port: 9301
discovery.zen.ping.unicast.hosts: ["10.231.83.238:9300", "10.231.83.146:9301", "10.231.83.146:9302"]
discovery.zen.minimum_master_nodes: 2
discovery.zen.fd.ping_timeout: 4s
http.cors.enabled: true
http.cors.allow-origin: /https?:\/\/localhost(:[0-9]+)?/
Node 3:
cluster.name: testcluster
network.host: 10.231.83.146
http.enabled: true
http.port: 9202
transport.tcp.port: 9302
discovery.zen.ping.unicast.hosts: ["10.231.83.238:9300", "10.231.83.146:9301", "10.231.83.146:9302"]
discovery.zen.minimum_master_nodes: 2
discovery.zen.fd.ping_timeout: 4s
http.cors.enabled: true
http.cors.allow-origin: /https?:\/\/localhost(:[0-9]+)?/
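To double-check that every node actually loaded these discovery settings, each node can be asked for its own settings via the nodes info API. This is a minimal bash sketch, assuming the HTTP addresses from the three configs above; `NODES` and `show_discovery_settings` are hypothetical names, and grep is used instead of a JSON tool so it matches both nested and flattened setting keys.

```shell
#!/usr/bin/env bash
# HTTP endpoints of Node 1, Node 2 and Node 3 (taken from the configs).
NODES=(http://10.231.83.238:9200 http://10.231.83.146:9201 http://10.231.83.146:9202)

# For each node, print the minimum_master_nodes and ping_timeout values
# that the answering node itself reports (_nodes/_local/settings).
show_discovery_settings() {
  local url
  for url in "${NODES[@]}"; do
    echo "== $url"
    curl --silent --max-time 2 "$url/_nodes/_local/settings?pretty" \
      | grep -E 'minimum_master_nodes|ping_timeout'
  done
}
```

With all nodes up, every node should report `minimum_master_nodes` as 2 and the fault-detection `ping_timeout` as 4s.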
Below is example log output from each node. Note that Node 1 is the master before we pull the network cable. Because posts are limited to 7000 characters, I've put the logs on Pastebin.
Node 1: https://pastebin.com/TCa2Wuia
Node 2: https://pastebin.com/7sKv0Z0z
Node 3: https://pastebin.com/FzGvT7Ji
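To line the logs up with what each node believes while the cable is pulled, one could poll every node's own view of the current master. A minimal bash sketch; `show_master_view` is a hypothetical name and the addresses are the HTTP endpoints from the configs. `local=true` asks each node to answer from its local copy of the cluster state instead of retrieving it from the master, and `h=node,ip` selects the columns of the `_cat/master` output.

```shell
#!/usr/bin/env bash
# Print, for each node, which node it currently regards as master.
# Requests to an unreachable node simply time out after 1 second.
show_master_view() {
  local url
  for url in http://10.231.83.238:9200 http://10.231.83.146:9201 http://10.231.83.146:9202; do
    echo "$url -> $(curl --silent --max-time 1 "$url/_cat/master?local=true&h=node,ip")"
  done
}
```

Running it once per second (e.g. `while true; do show_master_view; date; sleep 1; done`) during the test would show when Node 2 and Node 3 stop naming Node 1 and agree on a new master.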