I have a 1-client / 2-data / 3-master Elasticsearch 6.2.2 cluster spread across a 3-node vSphere environment, running on Kubernetes. The data and master nodes are StatefulSets, and the master StatefulSet has a Service sitting on it acting as the discovery service. The installation is successful and we can see the data in Kibana. However, if all 3 masters happen to go down together, they come back up with new IPs and discover each other, but the data and client nodes can't seem to discover the new masters; their logs show them still trying to hit the old IPs.
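For context, the Service sitting on the masters is along these lines (a rough sketch only; the name, labels, and headless setting are assumptions, not copied from my actual manifests):

apiVersion: v1
kind: Service
metadata:
  name: platform-elasticsearch-discovery    # hypothetical name, referenced via DISCOVERY_SERVICE below
spec:
  clusterIP: None                           # assumed headless, so DNS resolves to the master pod IPs
  selector:
    app: platform-elasticsearch-master      # assumed label on the master StatefulSet pods
  ports:
    - name: transport
      port: 9300                            # zen discovery / transport port seen in the logs
      targetPort: 9300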
Master log:
[2018-07-18T23:13:52,533][INFO ][o.e.b.BootstrapChecks ] [platform-elasticsearch-master-0] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2018-07-18T23:13:52,943][INFO ][o.e.m.j.JvmGcMonitorService] [platform-elasticsearch-master-0] [gc][1] overhead, spent [372ms] collecting in the last [1s]
[2018-07-18T23:13:55,890][INFO ][o.e.c.s.ClusterApplierService] [platform-elasticsearch-master-0] detected_master {platform-elasticsearch-master-1}{Or4qtPKjRyajMI-iC9mnwg}{Aw3yHwDhSOu3DiocgUParg}{172.24.0.99}{172.24.0.99:9300}, added {{platform-elasticsearch-client-78d74649fc-7976p}{TkuEXJttR1C93sOuYheVTg}{6vsOtJxFTC6AdaPzNrJNbQ}{172.24.2.212}{172.24.2.212:9300},{platform-elasticsearch-master-2}{bpIqzNyKS86Sk8vPRHSy7A}{RD3XUV1TTKaD1cjOSuxf7g}{172.24.1.104}{172.24.1.104:9300},{platform-elasticsearch-data-0}{cpsl1MgFT0-c15IqgVyX7w}{hVWwSzwuQ6arlVVDEO_JkQ}{172.24.2.214}{172.24.2.214:9300},{platform-elasticsearch-data-1}{CKsi0wfrQtC421UuhS4EUQ}{tn-w2myFQkmN1nV1kaHB0g}{172.24.1.105}{172.24.1.105:9300},{platform-elasticsearch-master-1}{Or4qtPKjRyajMI-iC9mnwg}{Aw3yHwDhSOu3DiocgUParg}{172.24.0.99}{172.24.0.99:9300},}, reason: apply cluster state (from master [master {platform-elasticsearch-master-1}{Or4qtPKjRyajMI-iC9mnwg}{Aw3yHwDhSOu3DiocgUParg}{172.24.0.99}{172.24.0.99:9300} committed version [2]])
[2018-07-18T23:13:56,205][INFO ][o.e.n.Node ] [platform-elasticsearch-master-0] started
Client logs:
[2018-07-18T22:49:34,640][WARN ][o.e.d.z.ZenDiscovery ] [platform-elasticsearch-client-5d5cff75d9-k29x7] not enough master nodes discovered during pinging (found [[]], but needed [2]), pinging again
[2018-07-18T22:49:35,635][WARN ][o.e.c.NodeConnectionsService] [platform-elasticsearch-client-5d5cff75d9-k29x7] failed to connect to node {platform-elasticsearch-master-1}{IPe2h_kETombpzD2ZSnetA}{ACKXkGpVQ8CMv5td3tAHxQ}{172.24.1.101}{172.24.1.101:9300} (tried [7] times)
org.elasticsearch.transport.ConnectTransportException: [platform-elasticsearch-master-1][172.24.1.101:9300] connect_timeout[30s]
    at org.elasticsearch.transport.TcpChannel.awaitConnected(TcpChannel.java:163) ~[elasticsearch-6.2.2.jar:6.2.2]
    at org.elasticsearch.transport.TcpTransport.openConnection(TcpTransport.java:616) ~[elasticsearch-6.2.2.jar:6.2.2]
    at org.elasticsearch.transport.TcpTransport.connectToNode(TcpTransport.java:513) ~[elasticsearch-6.2.2.jar:6.2.2]
    at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:331) ~[elasticsearch-6.2.2.jar:6.2.2]
    at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:318) ~[elasticsearch-6.2.2.jar:6.2.2]
    at org.elasticsearch.cluster.NodeConnectionsService.validateAndConnectIfNeeded(NodeConnectionsService.java:154) [elasticsearch-6.2.2.jar:6.2.2]
    at org.elasticsearch.cluster.NodeConnectionsService$ConnectionChecker.doRun(NodeConnectionsService.java:183) [elasticsearch-6.2.2.jar:6.2.2]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:672) [elasticsearch-6.2.2.jar:6.2.2]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.2.2.jar:6.2.2]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_161]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_161]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]
master/client config:
cluster.name: ${CLUSTER_NAME:true}
node.name: ${NODE_NAME:}
node.master: ${NODE_MASTER:}
node.data: ${NODE_DATA:}
node.ingest: ${NODE_INGEST:}
network.host: ${NETWORK_HOST:0.0.0.0}
path:
  data: /usr/share/elasticsearch/data
  logs: /usr/share/elasticsearch/logs
bootstrap:
  memory_lock: ${MEMORY_LOCK:false}
http:
  enabled: ${HTTP_ENABLE:false}
discovery:
  zen:
    ping.unicast.hosts: ${DISCOVERY_SERVICE:}
    minimum_master_nodes: ${MINIMUM_NUMBER_OF_MASTERS:1}
    commit_timeout: 60s
    publish_timeout: 60s
gateway.expected_nodes: 5
gateway.expected_master_nodes: 3
gateway.expected_data_nodes: 2
gateway.recover_after_nodes: 2
gateway.recover_after_master_nodes: 2
gateway.recover_after_data_nodes: 1
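The ${...} placeholders are filled in from the pod environment. On the client/data pod specs that wiring looks roughly like the sketch below (variable values are assumptions for illustration, except the master quorum of 2, which matches the "needed [2]" in the client log; the Service name is hypothetical):

env:
  - name: CLUSTER_NAME
    value: platform-elasticsearch             # assumed cluster name
  - name: DISCOVERY_SERVICE
    value: platform-elasticsearch-discovery   # the Service in front of the masters (hypothetical name)
  - name: MINIMUM_NUMBER_OF_MASTERS
    value: "2"                                 # quorum for 3 master-eligible nodes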
NOTE: Restarting the client fixes the issue. But why is this happening?
Please help!