Child node never connected if offline at startup of tribe


(Espen Wang Andreassen) #1

Hi,

Is it possible to get the tribe cluster/master to retry the initial connection to a child node/cluster, even after it has established a connection to the rest of the tribe?

If the child node/cluster is offline when the tribe cluster starts, it logs this:

[2015-08-11 14:01:34,114][WARN ][discovery ] [devcluster-tribe/t2] waited for 30s and no initial state was set by the discovery

... then it gives up (continuing with the other child nodes/clusters) and never tries to reconnect.

This makes the startup sequence of a tribe solution critical, as it requires all sub-clusters to be up and available when the tribe cluster starts.
We are hoping to tie queries across 12 regional data centers together using tribe nodes, but this makes maintenance harder, as it requires restarting the tribe clusters whenever a sub-cluster was unavailable at startup.

Regards,
Espen Wang Andreassen


(Colin Goodheart-Smithe) #2

This looks like a bug to me. Thanks for raising it, I have opened an issue on Github to track this bug: https://github.com/elastic/elasticsearch/issues/12804


(Espen Wang Andreassen) #3

Thanks for your reply and for registering the bug!

-e


(István Papp) #4

Hi Espen,

I am currently experimenting with tribe nodes and I do not think this is the case. When I start up a remote cluster after the tribe node is already running, the tribe node does try to discover it (although the attempt is not reflected well in the tribe node's logs). When the remote cluster is started, I see this in its logs after recovery is completed:

[2015-08-27 10:16:41,051][INFO ][http ] [Ithil] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/...:9200]}
[2015-08-27 10:16:41,064][INFO ][node ] [Ithil] started
[2015-08-27 10:16:41,062][INFO ][gateway ] [Ithil] recovered [1] indices into cluster_state
[2015-08-27 10:16:41,588][INFO ][watcher ] [Ithil] watch service has started
[2015-08-27 10:16:48,103][INFO ][cluster.service ] [Ithil] added {[Tribemaster/t2][DuNkt5xiQWiIvk99pkySGg][Osgiliath][inet[/...:9303]]{data=false, client=true},}, reason: zen-disco-receive(join from node[[Tribemaster/t2][DuNkt5xiQWiIvk99pkySGg][Osgiliath][inet[/...:9303]]{data=false, client=true}])

This is on ES 1.7.1

Regards,
Istvan


(Espen Wang Andreassen) #5

Hi Istvan - thanks for looking into this.

Does that also happen if you turn off multicast discovery and define the gateways manually by IP?
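To be concrete, I mean a setup along these lines on each node (addresses are placeholders, not our real ones):

discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.0.0.1:9300", "10.0.0.2:9300"]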

-espen


(István Papp) #6

That is how I have it set up. Multicast is turned off on every node (it wouldn't work on our network anyway). On the tribe node I have the tribes set up like this:

tribe.t1.cluster.name: Palantiri
tribe.t1.discovery.zen.ping.unicast.hosts: ["...:9300","...:9300"]
tribe.t2.cluster.name: Ithil
tribe.t2.discovery.zen.ping.unicast.hosts: ["...:9300"]

The two clusters are located on different continents. I am having a different issue myself, but that is unrelated to this thread.

EDIT1: One thing I forgot to mention: I use IP addresses in every config file, not DNS names, but that should not influence the discovery behavior.

EDIT2: I also did a Wireshark capture and can confirm that the tribe node keeps sending TCP keep-alives even after it knows the node is unreachable. That is why it can rejoin as soon as the remote cluster's state goes back to normal.


(Espen Wang Andreassen) #7

Interesting.
One thing: are you sure you waited for the tribe master to give up initializing before you started the child node?

This is what it logs if I stop both child clusters before starting the tribe node.
It stays this way even if I start either of the child clusters after the last "started".

[2015-08-27 23:10:19,811][INFO ][node                     ] [maeaint02-tribe] initialized
[2015-08-27 23:10:19,811][INFO ][node                     ] [maeaint02-tribe] starting ...
[2015-08-27 23:10:20,061][INFO ][transport                ] [maeaint02-tribe] bound_address {inet[/0:0:0:0:0:0:0:0:9303]}, publish_address {inet[/x.x.x.x:9303]}
[2015-08-27 23:10:20,076][INFO ][discovery                ] [maeaint02-tribe] devbridge-tribe/_VfZiOOBQear7ze5Vzku8w
[2015-08-27 23:10:20,076][WARN ][discovery                ] [maeaint02-tribe] waited for 0s and no initial state was set by the discovery
[2015-08-27 23:10:20,154][INFO ][http                     ] [maeaint02-tribe] bound_address {inet[/0:0:0:0:0:0:0:0:9203]}, publish_address {inet[/x.x.x.x:9203]}
[2015-08-27 23:10:20,154][INFO ][node                     ] [maeaint02-tribe/t2] starting ...
[2015-08-27 23:10:20,310][INFO ][transport                ] [maeaint02-tribe/t2] bound_address {inet[/0:0:0:0:0:0:0:0:9301]}, publish_address {inet[/x.x.x.x:9301]}
[2015-08-27 23:10:20,435][INFO ][discovery                ] [maeaint02-tribe/t2] tribe-test2/Y4yaAZOLThO7GyWmo3Zhsw
[2015-08-27 23:10:50,458][WARN ][discovery                ] [maeaint02-tribe/t2] waited for 30s and no initial state was set by the discovery
[2015-08-27 23:10:50,458][INFO ][node                     ] [maeaint02-tribe/t2] started
[2015-08-27 23:10:50,458][INFO ][node                     ] [maeaint02-tribe/t1] starting ...
[2015-08-27 23:10:50,571][INFO ][transport                ] [maeaint02-tribe/t1] bound_address {inet[/0:0:0:0:0:0:0:0:9302]}, publish_address {inet[/x.x.x.x:9302]}
[2015-08-27 23:10:50,680][INFO ][discovery                ] [maeaint02-tribe/t1] tribe-test1/8vV9LSuzS76Ob54dTS-5kg
[2015-08-27 23:11:20,691][WARN ][discovery                ] [maeaint02-tribe/t1] waited for 30s and no initial state was set by the discovery
[2015-08-27 23:11:20,691][INFO ][node                     ] [maeaint02-tribe/t1] started
[2015-08-27 23:11:20,691][INFO ][node                     ] [maeaint02-tribe] started

Regards,
Espen


(Espen Wang Andreassen) #8

I actually managed to get it to work.
I had not noticed that the tribe clients themselves bind to local ports as well. Since I was testing with multiple nodes on the same machine, I had situations where my test nodes did not get the expected port number if the tribe node started first. (My initial tests were between data centers though, so I don't know why I couldn't get it to work then.)

So thanks for your comments; they made me try again and make sure the nodes had explicit port bindings this time.
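For anyone hitting the same thing: what worked for me was pinning the transport ports explicitly so the tribe's internal clients could not grab the ports my test nodes expected. Roughly like this (ports mirror my logs above; as far as I understand, settings under tribe.<name>.* are passed through to that tribe's internal client node):

# tribe node elasticsearch.yml
transport.tcp.port: 9303
tribe.t1.transport.tcp.port: 9302
tribe.t2.transport.tcp.port: 9301

# data nodes keep the port the unicast host lists point at
transport.tcp.port: 9300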

-e
