Failed to add a new node to a single-node cluster on the fly


(Sayakiss) #1

We have had a single-node cluster for a long time; the existing node is crawler_service_001 (172.16.11.17). We want to add a new node, crawler_service_002 (172.16.11.16), to that cluster without restarting crawler_service_001.

But when I try to set up crawler_service_002, the log shows:

[2015-12-01 19:04:59,750][INFO ][discovery.zen            ] [crawler_service_002] failed to send join request to master [{crawler_service_001}{rCqrCf10SGuAVqW_YYiDng}{127.0.0.1}{127.0.0.1:9300}], reason [RemoteTransportException[[crawler_service_002][127.0.0.1:9300][internal:discovery/zen/join]]; nested: IllegalStateException[Node [{crawler_service_002}{1DoKt04tQAGFiaUIAzkdOA}{127.0.0.1}{127.0.0.1:9300}{master=false}] not master for join request]; ]
[2015-12-01 19:05:02,764][INFO ][discovery.zen            ] [crawler_service_002] failed to send join request to master [{crawler_service_001}{rCqrCf10SGuAVqW_YYiDng}{127.0.0.1}{127.0.0.1:9300}], reason [RemoteTransportException[[crawler_service_002][127.0.0.1:9300][internal:discovery/zen/join]]; nested: IllegalStateException[Node [{crawler_service_002}{1DoKt04tQAGFiaUIAzkdOA}{127.0.0.1}{127.0.0.1:9300}{master=false}] not master for join request]; ]
[2015-12-01 19:05:05,773][INFO ][discovery.zen            ] [crawler_service_002] failed to send join request to master [{crawler_service_001}{rCqrCf10SGuAVqW_YYiDng}{127.0.0.1}{127.0.0.1:9300}], reason [RemoteTransportException[[crawler_service_002][127.0.0.1:9300][internal:discovery/zen/join]]; nested: IllegalStateException[Node [{crawler_service_002}{1DoKt04tQAGFiaUIAzkdOA}{127.0.0.1}{127.0.0.1:9300}{master=false}] not master for join request]; ]
[2015-12-01 19:05:08,785][INFO ][discovery.zen            ] [crawler_service_002] failed to send join request to master [{crawler_service_001}{rCqrCf10SGuAVqW_YYiDng}{127.0.0.1}{127.0.0.1:9300}], reason [RemoteTransportException[[crawler_service_002][127.0.0.1:9300][internal:discovery/zen/join]]; nested: IllegalStateException[Node [{crawler_service_002}{1DoKt04tQAGFiaUIAzkdOA}{127.0.0.1}{127.0.0.1:9300}{master=false}] not master for join request]; ]

elasticsearch.yml of crawler_service_002(new node):

cluster.name: elasticsearch_dc_001
node.name: crawler_service_002

network.bind_host: 0.0.0.0
transport.tcp.port: 9300
transport.publish_port: 9300
node.master: false

discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["172.16.11.16", "172.16.11.17"]

I tried changing node.master: false to true, but it still produced the same error logs.

elasticsearch.yml of crawler_service_001 (old node):

cluster.name: elasticsearch_dc_001
node.name: crawler_service_001

network.bind_host: 0.0.0.0
transport.tcp.port: 9300
transport.publish_port: 9300

Is there a mistake in my configuration files, or is there some other reason?


(Sayakiss) #2

After a lot of googling and checking everything carefully, I finally noticed this part of the log: {crawler_service_001}{rCqrCf10SGuAVqW_YYiDng}{127.0.0.1}{127.0.0.1:9300}

That's really strange: from crawler_service_002's point of view, the address of crawler_service_001 is 127.0.0.1!

My guess: crawler_service_002 used the IP address in the unicast hosts list to find crawler_service_001, but crawler_service_001 didn't know its own publish address, so it told crawler_service_002, "Hi new guy, I'm crawler_service_001 at 127.0.0.1!"
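The handshake guessed at above can be sketched as a toy model (hypothetical function names, not Elasticsearch's actual code): the joining node pings the unicast hosts, but the reply carries the master's self-reported publish address, and the joiner then sends its join request to that advertised address rather than the one it pinged.

```python
# Toy model of the discovery handshake described above (an illustration,
# not Elasticsearch source): the master advertises whatever publish
# address it detected for itself, even if the joiner reached it via its LAN IP.

def ping(unicast_host, publish_address):
    # The joiner contacts unicast_host, but the reply carries the
    # master's self-reported publish_address.
    return {"name": "crawler_service_001", "publish_address": publish_address}

def join_target(unicast_hosts, master_publish_address):
    for host in unicast_hosts:
        reply = ping(host, master_publish_address)
        # The joiner sends the join request to the advertised address,
        # not to the address it used for the ping.
        return reply["publish_address"]

# Without network.host, the master binds everywhere but publishes loopback:
assert join_target(["172.16.11.17"], "127.0.0.1:9300") == "127.0.0.1:9300"
# With network.host set, it publishes its LAN address:
assert join_target(["172.16.11.17"], "172.16.11.17:9300") == "172.16.11.17:9300"
```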

So I added network.host: <their_ip> to both nodes' configurations and restarted the whole cluster, and it worked.
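With the addresses from the original post, the fix would look roughly like this (a sketch; assuming each node should publish its own LAN IP):

```yaml
# elasticsearch.yml on crawler_service_001 (old node, 172.16.11.17)
network.host: 172.16.11.17

# elasticsearch.yml on crawler_service_002 (new node, 172.16.11.16)
network.host: 172.16.11.16
```

Setting network.host covers both the bind and publish addresses, so crawler_service_001 advertises 172.16.11.17:9300 to joining nodes instead of 127.0.0.1:9300.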


(Steve L) #3

Thank you, I was having the exact same problem and this resolved it. I thought I had configured everything properly with network.bind_host...

network.bind_host: "0.0.0.0"

...but the logs indicated that discovery was failing because the nodes were using 127.0.0.1 for discovery. I added the following to each of my nodes, and they discovered each other right away.

network.host: "0.0.0.0"

Note: This is on EC2 and I'd already set up the security group that enables the nodes to query EC2 for other nodes.


(tomer zaks) #4

My log looks better, but I still have a question. In mine I get:

{node_name}{rCqrCf10SGuAVqW_YYiDng}{another_String}{good_ip}{same_as_good_ip:9300}

Is that "another_String" a problem?

