Unicast cluster - nodes fail to connect v2.1.0


(Karen Chenette) #1

I am trying to set up a new v2.1 unicast cluster, but each node fails to see any of the other named nodes in the cluster. There is no data and no current load on these Windows 2012 servers. I have scaled it back to two nodes to test. What am I missing? This is an example 'elasticsearch.yml':

cluster.name: CPS-PD
node.name: "chdp-es01"
network.host: 10.1.2.191
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.1.2.191", "10.1.2.192"]
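(For reference, the second node would presumably carry the mirrored settings; the node name "chdp-es02" and the .192 host below are my assumptions, not from the actual config:)

```yaml
# Hypothetical elasticsearch.yml for the second node -- only node.name
# and network.host differ; cluster.name and the unicast host list match.
cluster.name: CPS-PD
node.name: "chdp-es02"
network.host: 10.1.2.192
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.1.2.191", "10.1.2.192"]
```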

I set discovery logging to TRACE and I don't even see a ping response from the other node; it fails to connect every 30s. There is a strange-looking string "{B7JRjusgQ_6bP1YTg1nBAA}" associated with the current node:

[2015-12-23 17:12:06,583][TRACE][discovery.zen.ping.unicast] [chdp-es01] [1] sending to {chdp-es01}{B7JRjusgQ_6bP1YTg1nBAA}{10.1.2.191}{10.1.2.191:9300}
[2015-12-23 17:12:06,692][TRACE][discovery.zen.ping.unicast] [chdp-es01] [1] received response from {chdp-es01}{B7JRjusgQ_6bP1YTg1nBAA}{10.1.2.191}{10.1.2.191:9300}: [ping_response{node [{chdp-es01}{B7JRjusgQ_6bP1YTg1nBAA}{10.1.2.191}{10.1.2.191:9300}], id[1], master [null], hasJoinedOnce [false], cluster_name[CPS-PD]}, ping_response{node [{chdp-es01}{B7JRjusgQ_6bP1YTg1nBAA}{10.1.2.191}{10.1.2.191:9300}], id[2], master [null], hasJoinedOnce [false], cluster_name[CPS-PD]}]
[2015-12-23 17:12:06,708][TRACE][discovery.zen.ping.unicast] [chdp-es01] [1] connecting (light) to {#zen_unicast_2#}{10.1.2.192}{10.1.2.192:9300}
[2015-12-23 17:12:11,251][TRACE][discovery.zen            ] [chdp-es01] full ping responses: {none}
[2015-12-23 17:12:11,251][TRACE][discovery.zen.ping.unicast] [chdp-es01] [1] disconnecting from {#zen_unicast_2#}{10.1.2.192}{10.1.2.192:9300}
[2015-12-23 17:12:11,251][DEBUG][discovery.zen            ] [chdp-es01] filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2015-12-23 17:12:11,251][DEBUG][discovery.zen            ] [chdp-es01] elected as master, waiting for incoming joins ([0] needed)
[2015-12-23 17:12:11,267][INFO ][cluster.service          ] [chdp-es01] new_master {chdp-es01}{B7JRjusgQ_6bP1YTg1nBAA}{10.1.2.191}{10.1.2.191:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2015-12-23 17:12:11,267][TRACE][discovery.zen            ] [chdp-es01] stopping join accumulation ([election closed])
[2015-12-23 17:12:11,267][TRACE][discovery.zen            ] [chdp-es01] cluster joins counter set to [1] (elected as master)
[2015-12-23 17:12:11,329][INFO ][http                     ] [chdp-es01] publish_address {10.1.2.191:9200}, bound_addresses {10.1.2.191:9200}
[2015-12-23 17:12:11,329][INFO ][node                     ] [chdp-es01] started
[2015-12-23 17:12:11,345][INFO ][gateway                  ] [chdp-es01] recovered [0] indices into cluster_state
[2015-12-23 17:12:27,753][TRACE][discovery.zen.ping.unicast] [chdp-es01] [1] failed to connect to {#zen_unicast_2#}{10.1.2.192}{10.1.2.192:9300}
ConnectTransportException[[][10.1.2.192:9300] connect_timeout[30s]]; nested: ConnectException[Connection timed out: no further information: /10.1.2.192:9300];

(Mark Walkom) #2

Is it a firewall issue?


(Karen Chenette) #3

I do not believe it is a firewall issue. I can reach either server via curl, hit _cluster/health, and receive a response. I can ping both machines.


(Mark Walkom) #4

Can you do those from each of the two nodes to the other one?


(Karen Chenette) #5

Yes, I can curl from either node to the other node.


(Mark Walkom) #6

Weird, are you starting both nodes at the same time, or one and then the other?


(Karen Chenette) #7

One and then the other...


(Karen Chenette) #8

And... it turned out to be a firewall configuration problem. I don't know why 9200 was open and 9300 was blocked. Pretty weird.
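For anyone hitting the same thing: curl against _cluster/health only proves the HTTP port (9200) is open, while zen discovery talks over the transport port (9300). A minimal sketch to check both from a node (the port_open helper is illustrative, not an Elasticsearch tool):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# 9200 is the HTTP port (what curl talks to); 9300 is the transport
# port that zen discovery and inter-node traffic actually use.
for port in (9200, 9300):
    state = "open" if port_open("10.1.2.192", port) else "blocked"
    print("10.1.2.192:%d %s" % (port, state))
```

Run it from each node against the other; a result of 9200 open but 9300 blocked is exactly the asymmetry this thread ran into.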
