ES 1.7.3 Tribe node is not federating

Hi folks,

I've done a bit of testing to see if there is something obvious I am missing. If there is, I would appreciate a pointer.

ES 1.7.3 on all nodes.

When I follow this very simplified HOWTO, it works exactly right:

However, when I use my production cluster as the target host(s), I get this:

root@sandbox-tb:/usr/share/elasticsearch# curl localhost:9200/_cat/nodes
sandbox-tb. 10.33.11.106 c x quest

as opposed to this, when I run the example:

root@sandbox-tb:/usr/share/elasticsearch# curl localhost:9202/_cat/nodes
sandbox-tb. 10.33.11.106 c x quest
sandbox-tb. 10.33.11.106 c x quest/t2
sandbox-tb. 10.33.11.106 4 5 0.12 d x drewr2
sandbox-tb. 10.33.11.106 c x quest/t1
sandbox-tb. 10.33.11.106 4 5 0.12 d x drewr1

I tried the standard methods (configuration file changes, etc.), but then decided to simplify down to this command line for the tribe node:

root@sandbox-tb:/usr/share/elasticsearch# bin/elasticsearch -Des.tribe.t1.cluster.name=es-qqqq -Des.tribe.t1.discovery.zen.ping.unicast.hosts=10.38.10.130:9300 -Des.tribe.t1.discovery.zen.ping.multicast.enabled=false -Des.logger.level=DEBUG

That 10.38.10.130 host is one of our production ES cluster members. I also tried a comma-separated list of all cluster members, with the same result.
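
One thing worth ruling out here is a cluster name mismatch, since a node only joins a cluster whose name matches its own `cluster.name`. A minimal sketch, assuming the production host also serves HTTP on port 9200 (not stated in the post) and that the root endpoint's JSON carries a `cluster_name` field, as it does in ES 1.x. The helper name `extract_cluster_name` is made up for illustration:

```shell
# Hypothetical helper: read the JSON returned by the Elasticsearch root
# endpoint (GET /) on stdin and print the value of its "cluster_name" field.
extract_cluster_name() {
    sed -n 's/.*"cluster_name" *: *"\([^"]*\)".*/\1/p'
}

# Example usage (assumes HTTP is reachable on 9200 on the production host):
#   curl -s http://10.38.10.130:9200/ | extract_cluster_name
# The printed name should match the value passed as -Des.tribe.t1.cluster.name.
```

If the two names differ, discovery can reach the host and still never join, which would look much like the single-row `_cat/nodes` output above.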

I can run searches and so on remotely from the sandbox host I am testing with. The network connection is sound, and there isn't anything in the way. The cluster is about 60 ms away from the sandbox in network terms.

I am not seeing any errors in the logs, and setting the tribe node's logging to DEBUG doesn't surface any either.

Can someone give me a pointer as to why this might be happening?

EDIT: To add, when I do this via standard ES methods, each tribe opens up its own transport port. Here are the startup logs:

[2015-12-16 11:48:28,375][INFO ][node                     ] [tribe-1] initializing ...
[2015-12-16 11:48:28,495][INFO ][plugins                  ] [tribe-1] loaded [], sites []
[2015-12-16 11:48:31,564][INFO ][node                     ] [tribe-1/t2] version[1.7.3], pid[1661], build[05d4530/2015-10-15T09:14:17Z]
[2015-12-16 11:48:31,564][INFO ][node                     ] [tribe-1/t2] initializing ...
[2015-12-16 11:48:31,565][INFO ][plugins                  ] [tribe-1/t2] loaded [], sites []
[2015-12-16 11:48:32,543][INFO ][node                     ] [tribe-1/t2] initialized
[2015-12-16 11:48:32,544][INFO ][node                     ] [tribe-1/t1] version[1.7.3], pid[1661], build[05d4530/2015-10-15T09:14:17Z]
[2015-12-16 11:48:32,545][INFO ][node                     ] [tribe-1/t1] initializing ...
[2015-12-16 11:48:32,545][INFO ][plugins                  ] [tribe-1/t1] loaded [], sites []
[2015-12-16 11:48:33,334][INFO ][node                     ] [tribe-1/t1] initialized
[2015-12-16 11:48:33,348][INFO ][node                     ] [tribe-1] initialized
[2015-12-16 11:48:33,348][INFO ][node                     ] [tribe-1] starting ...
[2015-12-16 11:48:33,438][INFO ][transport                ] [tribe-1] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.33.11.106:9300]}
[2015-12-16 11:48:33,448][INFO ][discovery                ] [tribe-1] tribe-foo/MzQOQAc8RyeJ4YYFjDLkfA
[2015-12-16 11:48:33,448][WARN ][discovery                ] [tribe-1] waited for 0s and no initial state was set by the discovery
[2015-12-16 11:48:33,461][INFO ][http                     ] [tribe-1] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.33.11.106:9200]}
[2015-12-16 11:48:33,461][INFO ][node                     ] [tribe-1/t2] starting ...
[2015-12-16 11:48:33,524][INFO ][transport                ] [tribe-1/t2] bound_address {inet[/0:0:0:0:0:0:0:0:9301]}, publish_address {inet[/10.33.11.106:9301]}
[2015-12-16 11:48:33,530][INFO ][discovery                ] [tribe-1/t2] es-spinalq/ZWvwSbF4Rtml5lc3MPG8xQ
[2015-12-16 11:49:03,530][WARN ][discovery                ] [tribe-1/t2] waited for 30s and no initial state was set by the discovery
[2015-12-16 11:49:03,530][INFO ][node                     ] [tribe-1/t2] started
[2015-12-16 11:49:03,531][INFO ][node                     ] [tribe-1/t1] starting ...
[2015-12-16 11:49:03,548][INFO ][transport                ] [tribe-1/t1] bound_address {inet[/0:0:0:0:0:0:0:0:9302]}, publish_address {inet[/10.33.11.106:9302]}
[2015-12-16 11:49:03,550][INFO ][discovery                ] [tribe-1/t1] es-spinalq/S3xBEVgUTbGAXlAjqlb6sA
[2015-12-16 11:49:33,550][WARN ][discovery                ] [tribe-1/t1] waited for 30s and no initial state was set by the discovery
[2015-12-16 11:49:33,551][INFO ][node                     ] [tribe-1/t1] started
[2015-12-16 11:49:33,551][INFO ][node                     ] [tribe-1] started
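
The transport lines in that log show where each tribe client publishes itself (here ports 9301 and 9302 on 10.33.11.106). As far as I understand it, a tribe node joins each cluster as a client node, so the cluster's members need to be able to connect back to those addresses, not only the other way around. A small sketch for listing them from a startup log follows. The function name is invented, and the pattern assumes the 1.x log format shown above:

```shell
# Hypothetical helper: read ES 1.x startup log lines on stdin and print
# "<node name> <publish host:port>" for every [transport] line.
# Lines from other loggers (e.g. [http]) are ignored.
list_publish_addresses() {
    sed -n 's|.*\[transport *] \[\([^]]*\)].*publish_address {inet\[/\([^]]*\)]}.*|\1 \2|p'
}

# Example usage (the log path is an assumption; adjust to your install):
#   list_publish_addresses < /var/log/elasticsearch/tribe-foo.log
```

Each address it prints is one that a production cluster member would have to reach on its own, so those are the host and port pairs to test from the production side.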

Merci!

If you look at the log files on your production cluster's nodes (the equivalents of drewr1 and drewr2 in the example) and don't see any "join request" messages for adding the tribe node to the existing clusters, it means the tribe node is not communicating with the expected clusters.
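
A quick way to check that is to count the lines in a cluster member's log that mention both a join and the tribe node's name. This is only a sketch: the exact wording of the join message differs between versions, so it matches any line containing "join", and the function name and log path are placeholders rather than anything taken from the posts above:

```shell
# Hypothetical helper: count lines in LOGFILE that contain "join"
# (case-insensitive) and also mention NODE_NAME as a literal string.
# Usage: count_join_lines LOGFILE NODE_NAME
count_join_lines() {
    logfile=$1
    node=$2
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps the function usable under "set -e".
    grep -i 'join' "$logfile" | grep -c -F -- "$node" || true
}

# Example usage (log path and node name are assumptions):
#   count_join_lines /var/log/elasticsearch/es-qqqq.log tribe-1
```

A count of 0 on every cluster member would suggest the join never reached the cluster at all.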

I did not hit this problem with v1.7.3, but I did with v2.1.1. I added the following lines to the tribe node's configuration file to get it working:

tribe.t1.network.bind_host: 0.0.0.0
tribe.t1.network.publish_host: <tribe node's IP>

I also used a unicast setup in the clusters (since v2.1.1 only supports unicast out of the box).