Client node unable to discover master nodes after masters restart and get new IPs


(Sonia Gupta) #1

I have a 1 client / 2 data / 3 master Elasticsearch 6.2.2 cluster spread over a 3-node vSphere environment, running on Kubernetes. The data and master nodes are StatefulSets, and a service sits in front of the masters acting as the discovery service. The installation is successful and we can see the data in Kibana. However, if all 3 masters happen to go down together, they come back up with new IPs and discover each other, but the data and client nodes can't seem to discover the new masters; their logs tell me they are still trying to hit the old IPs.
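For context, a "discovery service" of the kind described is typically a headless Kubernetes Service selecting the master pods, so a DNS lookup of the service name returns the pods' current IPs. A minimal sketch, with all names and labels hypothetical (the post does not show the actual manifest):

```yaml
# Hypothetical headless Service exposing the master pods' transport port
# for Zen discovery. Name, labels, and selector are illustrative only.
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-discovery
spec:
  clusterIP: None          # headless: DNS returns the pods' current IPs
  selector:
    app: elasticsearch
    role: master
  ports:
    - name: transport
      port: 9300
```

Because the Service is headless, the DNS answer tracks pod restarts: when a master pod is rescheduled with a new IP, a fresh lookup of the service name returns the new address.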

Master log:
[2018-07-18T23:13:52,533][INFO ][o.e.b.BootstrapChecks ] [platform-elasticsearch-master-0] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2018-07-18T23:13:52,943][INFO ][o.e.m.j.JvmGcMonitorService] [platform-elasticsearch-master-0] [gc][1] overhead, spent [372ms] collecting in the last [1s]
[2018-07-18T23:13:55,890][INFO ][o.e.c.s.ClusterApplierService] [platform-elasticsearch-master-0] detected_master {platform-elasticsearch-master-1}{Or4qtPKjRyajMI-iC9mnwg}{Aw3yHwDhSOu3DiocgUParg}{172.24.0.99}{172.24.0.99:9300}, added {{platform-elasticsearch-client-78d74649fc-7976p}{TkuEXJttR1C93sOuYheVTg}{6vsOtJxFTC6AdaPzNrJNbQ}{172.24.2.212}{172.24.2.212:9300},{platform-elasticsearch-master-2}{bpIqzNyKS86Sk8vPRHSy7A}{RD3XUV1TTKaD1cjOSuxf7g}{172.24.1.104}{172.24.1.104:9300},{platform-elasticsearch-data-0}{cpsl1MgFT0-c15IqgVyX7w}{hVWwSzwuQ6arlVVDEO_JkQ}{172.24.2.214}{172.24.2.214:9300},{platform-elasticsearch-data-1}{CKsi0wfrQtC421UuhS4EUQ}{tn-w2myFQkmN1nV1kaHB0g}{172.24.1.105}{172.24.1.105:9300},{platform-elasticsearch-master-1}{Or4qtPKjRyajMI-iC9mnwg}{Aw3yHwDhSOu3DiocgUParg}{172.24.0.99}{172.24.0.99:9300},}, reason: apply cluster state (from master [master {platform-elasticsearch-master-1}{Or4qtPKjRyajMI-iC9mnwg}{Aw3yHwDhSOu3DiocgUParg}{172.24.0.99}{172.24.0.99:9300} committed version [2]])
[2018-07-18T23:13:56,205][INFO ][o.e.n.Node ] [platform-elasticsearch-master-0] started

Client logs:
[2018-07-18T22:49:34,640][WARN ][o.e.d.z.ZenDiscovery ] [platform-elasticsearch-client-5d5cff75d9-k29x7] not enough master nodes discovered during pinging (found [[]], but needed [2]), pinging again
[2018-07-18T22:49:35,635][WARN ][o.e.c.NodeConnectionsService] [platform-elasticsearch-client-5d5cff75d9-k29x7] failed to connect to node {platform-elasticsearch-master-1}{IPe2h_kETombpzD2ZSnetA}{ACKXkGpVQ8CMv5td3tAHxQ}{172.24.1.101}{172.24.1.101:9300} (tried [7] times)
org.elasticsearch.transport.ConnectTransportException: [platform-elasticsearch-master-1][172.24.1.101:9300] connect_timeout[30s]
  at org.elasticsearch.transport.TcpChannel.awaitConnected(TcpChannel.java:163) ~[elasticsearch-6.2.2.jar:6.2.2]
  at org.elasticsearch.transport.TcpTransport.openConnection(TcpTransport.java:616) ~[elasticsearch-6.2.2.jar:6.2.2]
  at org.elasticsearch.transport.TcpTransport.connectToNode(TcpTransport.java:513) ~[elasticsearch-6.2.2.jar:6.2.2]
  at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:331) ~[elasticsearch-6.2.2.jar:6.2.2]
  at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:318) ~[elasticsearch-6.2.2.jar:6.2.2]
  at org.elasticsearch.cluster.NodeConnectionsService.validateAndConnectIfNeeded(NodeConnectionsService.java:154) [elasticsearch-6.2.2.jar:6.2.2]
  at org.elasticsearch.cluster.NodeConnectionsService$ConnectionChecker.doRun(NodeConnectionsService.java:183) [elasticsearch-6.2.2.jar:6.2.2]
  at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:672) [elasticsearch-6.2.2.jar:6.2.2]
  at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.2.2.jar:6.2.2]
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_161]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_161]
  at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]

master/client config:
cluster.name: ${CLUSTER_NAME:true}
node.name: ${NODE_NAME:}
node.master: ${NODE_MASTER:}
node.data: ${NODE_DATA:}
node.ingest: ${NODE_INGEST:}
network.host: ${NETWORK_HOST:0.0.0.0}
path:
  data: /usr/share/elasticsearch/data
  logs: /usr/share/elasticsearch/logs
bootstrap:
  memory_lock: ${MEMORY_LOCK:false}
http:
  enabled: ${HTTP_ENABLE:false}
discovery:
  zen:
    ping.unicast.hosts: ${DISCOVERY_SERVICE:}
    minimum_master_nodes: ${MINIMUM_NUMBER_OF_MASTERS:1}
    commit_timeout: 60s
    publish_timeout: 60s
gateway.expected_nodes: 5
gateway.expected_master_nodes: 3
gateway.expected_data_nodes: 2
gateway.recover_after_nodes: 2
gateway.recover_after_master_nodes: 2
gateway.recover_after_data_nodes: 1
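For comparison, discovery can point at the service's DNS name rather than fixed IPs, so that each ping round can resolve the masters' current addresses. A sketch of the relevant fragment, assuming a hypothetical headless service named `elasticsearch-discovery` in the `default` namespace (substitute whatever `DISCOVERY_SERVICE` actually expands to):

```yaml
discovery:
  zen:
    # Hypothetical headless-service DNS name: lookups return whatever
    # IPs the master pods currently have, surviving pod restarts.
    ping.unicast.hosts: elasticsearch-discovery.default.svc.cluster.local
    # Quorum for 3 master-eligible nodes is 2, not the default of 1 above;
    # a default of 1 risks split-brain if the masters partition.
    minimum_master_nodes: 2
```

Note that `${MINIMUM_NUMBER_OF_MASTERS:1}` falls back to 1 if the environment variable is unset; with 3 master-eligible nodes the safe value is 2, which matches the "needed [2]" in the client log.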

NOTE: Restarting the client fixes the issue. But why is this happening?
Please help!


(andy_zhou) #2

You need to add the new IPs to discovery.zen.ping.unicast.hosts.


(Sonia Gupta) #3

I have a service that returns the IPs of all masters. The client resolves them successfully during startup, but not again after the masters change IPs. When I query the service myself, I do get the new IPs.
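If the service already returns the new IPs but the client keeps resolving the old ones, one plausible culprit (an assumption, not confirmed by this thread) is JVM-level DNS caching: Elasticsearch runs under the Java security manager, where successful lookups can be cached for a long time unless the cache TTL is bounded. Recent Elasticsearch releases document the `es.networkaddress.cache.ttl` and `es.networkaddress.cache.negative.ttl` system properties for this; a sketch of a `jvm.options` entry, assuming your version supports them:

```
# jvm.options (one flag per line) -- bound DNS caching so the client
# re-resolves the discovery service name instead of reusing stale IPs.
-Des.networkaddress.cache.ttl=60
-Des.networkaddress.cache.negative.ttl=10
```

Another possibility worth ruling out: if `DISCOVERY_SERVICE` is expanded to literal IPs by an entrypoint script at pod startup (rather than being the service's DNS name), the client can never see new addresses without a restart, which would match the symptom that restarting the client fixes it.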


(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.