Nodes not discovering each other


(Shashankvivek) #1

We tried with unicast.

discovery.zen.ping.unicast.hosts: ["hostname:9200", "hostname:9200"]

We even tried to make one node the master using node.master=true, but we still get the error log below.
Both nodes are on the same network.
We can ping each node from the other, and telnet to port 9200 works.
Can somebody please help?

[2015-12-18 00:52:45,071][WARN ][discovery.zen.ping.unicast] [plab_node_201] failed to send ping to [{#zen_unicast_2#}{xx.xx.xx.xxx}{hostname/xx.xx.xx.xxx:9200}]
ReceiveTimeoutTransportException[[][hostname/xx.xx.xx.xxx:9200][internal:discovery/zen/unicast] request_id [389] timed out after [3750ms]]

This topic is similar to "Failed to send ping to", but that one does not have a solution.


(Yodog) #2

can you telnet from node-01 to node-02 on ports 9200 and 9300?

telnet xx.xx.xx.xxx 9200
telnet xx.xx.xx.xxx 9300

if you're not sure what i'm talking about, execute the commands above and copy/paste the result here.


ah, you already said that telnet is working... my bad.


(Shashankvivek) #3

Yes, Telnet is working on those nodes at port 9200 & 9300


(Yodog) #4

have you tried without specifying ports?

my elasticsearch.yml is like this:

# Elasticsearch nodes will find each other via unicast, by default.
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]

discovery.zen.ping.unicast.hosts: ["elastic-node-01", "elastic-node-02", "elastic-node-03"]

also, i think the node.master=true param does not make the node the master, it only makes it eligible to be elected master.
meaning you can set this parameter on both nodes.

i might be wrong.
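
For reference, a minimal sketch of the unicast host list with and without ports, assuming Elasticsearch 2.x with default port settings. The entries in discovery.zen.ping.unicast.hosts are contacted over the transport protocol, whose default port is 9300; 9200 is the HTTP/REST port, so a discovery ping sent there would be expected to time out. The host names are the ones from the example above.

```yaml
# elasticsearch.yml (sketch, assuming Elasticsearch 2.x defaults)

# Without a port, the transport port (9300 by default) is used:
discovery.zen.ping.unicast.hosts: ["elastic-node-01", "elastic-node-02", "elastic-node-03"]

# Same list with the transport port spelled out (9300, not 9200):
# discovery.zen.ping.unicast.hosts: ["elastic-node-01:9300", "elastic-node-02:9300", "elastic-node-03:9300"]
```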


(Shashankvivek) #5

After removing the ports:

[2015-12-18 03:48:59,410][INFO ][discovery.zen ] [plab_node_201] failed to send join request to master [{plab_node_202}{rWxRMAyWQBi0eEiBmVcdJg}{127.0.0.1}{127.0.0.1:9300}{master=true}], reason [RemoteTransportException[[plab_node_201][127.0.0.1:9300][internal:discovery/zen/join]]; nested: IllegalStateException[Node [{plab_node_201}{EgEo2WlTTBWqAAh6FwN86w}{127.0.0.1}{127.0.0.1:9300}{master=false}] not master for join request]; ]


(Yodog) #6

now set node.master=true on both hosts and restart the services.
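
A sketch of what that looks like in elasticsearch.yml on each host. Note that node.master already defaults to true, so setting it explicitly mainly rules out a leftover node.master: false. It marks the node as master-eligible and does not force it to become the master.

```yaml
# elasticsearch.yml on both hosts (sketch)
node.master: true   # master-eligible; the actual master is chosen by election
node.data: true     # also hold data (this is the default as well)
```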


(Shashankvivek) #7

same error :(


(Yodog) #8

can you post your elasticsearch.yml somewhere like http://pastebin.com and link it here?


(Ralph LeVan) #9

I had exactly the same problem. Have you set the hostname for each host in the yaml file? The comments for that field don't make it look critical, but it is.


(Shashankvivek) #10

The issue was that the hosts have multiple network interfaces, which made discovery fail. Publishing a specific IP address resolved it: on each node, network.publish_host and network.bind_host in the config file were set to that node's own IP.

network.publish_host: "xx.xx.xx.xxx"
network.bind_host: "xx.xx.xx.xxx"
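
Put together, a per-node sketch of the relevant settings might look like the following. The addresses are hypothetical placeholders (192.0.2.x is a documentation-only range), not the real values from this cluster. It assumes Elasticsearch 2.x, where a node binds to loopback by default; that would fit the 127.0.0.1:9300 addresses in the earlier log, where the join request appears to have landed back on the sending node.

```yaml
# elasticsearch.yml on the first node (sketch; IPs are hypothetical placeholders)
network.bind_host: "192.0.2.11"      # interface to listen on
network.publish_host: "192.0.2.11"   # address advertised to other nodes

# List the nodes by an address they actually publish:
discovery.zen.ping.unicast.hosts: ["192.0.2.11", "192.0.2.12"]

# Shorthand that sets both bind_host and publish_host:
# network.host: "192.0.2.11"
```

On the second node, the same settings would use that node's own address (192.0.2.12 in this sketch).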


(Saravanakumar N) #11

It could be due to multiple instances of ES running on the same machine: a corrupted or killed ES process that has not released the port. Check the running instances by executing jps; if more than one is running, kill the one that is not releasing the port.

I had the same issue: out of 12 indices only 8 recovered. I checked the running ES instances and killed the one that had not terminated properly, and it worked.

Regards,
Saravana


(tomer zaks) #12

Hi,

I didn't understand your answer. I don't have network.publish* or .bind_host...
What is this?

