Which network.host setting connects Elasticsearch nodes running in Docker containers on separate machines?


#1

Hi,

I have a networking question about clustering two Elasticsearch nodes that run in Docker containers on separate machines.

To enable discovery, I set network.host to 0.0.0.0, because I cannot bind to the host's real IP or hostname from inside the container.

However, this seems to make each node advertise its container IP from the Docker network (e.g. 172.19.0.3), and the other node then tries to connect to that address. Because the nodes do not run on the same machine, they cannot reach each other. They do find and identify each other, but the connection fails. (I guess discovery uses discovery.zen.ping.unicast.hosts, which are FQDNs, and the actual connection then uses some other address?)

So if someone could help me find the right network.host setting so that the nodes reach each other via their real hostnames or IPs, that would be great. Sorry if I'm missing something obvious; I'm not a networking expert at all.

Thanks a lot!

I intend to run node "meerkat" as the master, and node "donkey" as a data node:

docker-compose.yml for meerkat (master):

environment:
  - cluster.name=my-test-cluster
  - node.name=meerkat
  #- network.host=meerkat.xyz.de # cannot bind to this
  - network.host=0.0.0.0
  - discovery.zen.ping.unicast.hosts=donkey.xyz.de
  - discovery.zen.minimum_master_nodes=1
  - node.master=true
  - node.data=true
  - node.ingest=true

docker-compose.yml for donkey (data node):

environment:
  - cluster.name=my-test-cluster
  - node.name=donkey
  #- network.host=donkey.xyz.de
  - network.host=0.0.0.0
  - discovery.zen.ping.unicast.hosts=meerkat.xyz.de
  - discovery.zen.minimum_master_nodes=1
  - node.master=false
  - node.data=true
  - node.ingest=true

meerkat starts up fine. When donkey starts, it can ping meerkat, which replies with its Docker-internal IP (172.19.0.3, the same address docker inspect reports). donkey then tries to connect to 172.19.0.3, which of course fails, because that address is only reachable on meerkat's machine. Ideally I want it to connect to the master's FQDN (meerkat.xyz.de) or to its IP (136.xxx.xxx.xxx).

elasticsearch_1  | [2018-10-19T16:25:39,644][DEBUG][o.e.h.n.Netty4HttpServerTransport] [donkey] Bound http to address {0.0.0.0:9200}
elasticsearch_1  | [2018-10-19T16:25:39,683][INFO ][o.e.h.n.Netty4HttpServerTransport] [donkey] publish_address {172.18.0.2:9200}, bound_addresses {0.0.0.0:9200}
elasticsearch_1  | [2018-10-19T16:25:39,683][INFO ][o.e.n.Node               ] [donkey] started
elasticsearch_1  | [2018-10-19T16:25:42,067][DEBUG][o.e.d.z.ZenDiscovery     ] [donkey] filtered ping responses: (ignore_non_masters [false])
elasticsearch_1  | 	--> ping_response{node [{sdcb2host}{k4dXLy1ZTGGizrkfjo8X-w}{mpT4S0gNTQiolf9YkvlIkw}{172.19.0.3}{172.19.0.3:9300}], id[10], master [{sdcb2host}{k4dXLy1ZTGGizrkfjo8X-w}{mpT4S0gNTQiolf9YkvlIkw}{172.19.0.3}{172.19.0.3:9300}],cluster_state_version [300], cluster_name[my-test-cluster]}
elasticsearch_1  | 	--> ping_response{node [{donkey}{R5k549i1QkuDutGsKp8hCw}{LIB4yeBUQMqqmHNfbMaVSQ}{172.18.0.2}{172.18.0.2:9300}], id[16], master [null],cluster_state_version [-1], cluster_name[my-test-cluster]}
elasticsearch_1  | [2018-10-19T16:25:43,940][DEBUG][o.e.c.s.MasterService    ] [donkey] processing [zen-disco-election-stop [{sdcb2host}{k4dXLy1ZTGGizrkfjo8X-w}{mpT4S0gNTQiolf9YkvlIkw}{172.19.0.3}{172.19.0.3:9300} elected]]: execute
elasticsearch_1  | [2018-10-19T16:25:44,226][DEBUG][o.e.c.s.MasterService    ] [donkey] processing [zen-disco-election-stop [{sdcb2host}{k4dXLy1ZTGGizrkfjo8X-w}{mpT4S0gNTQiolf9YkvlIkw}{172.19.0.3}{172.19.0.3:9300} elected]]: took [284ms] no change in cluster state

elasticsearch_1  | [2018-10-19T16:25:45,039][WARN ][o.e.d.z.ZenDiscovery     ] [donkey] failed to connect to master [{meerkat}{k4dXLy1ZTGGizrkfjo8X-w}{mpT4S0gNTQiolf9YkvlIkw}{172.19.0.3}{172.19.0.3:9300}], retrying...
elasticsearch_1  | org.elasticsearch.transport.ConnectTransportException: [meerkat][172.19.0.3:9300] connect_exception
[...] (rest of the stack trace removed because of the forum's character limit)

docker inspect tells me that meerkat's address inside the Docker network is:

            "Networks": {
                    ...
                    "Gateway": "172.19.0.1",
                    "IPAddress": "172.19.0.3",

docker inspect tells me that donkey's address inside the Docker network is:

            "Networks": {
                    ...
                    "Gateway": "172.18.0.1",
                    "IPAddress": "172.18.0.2",

I can provide full docker-compose files for reproducing this if you want.


#2

Hi,
any help on this? I still haven't managed to cluster my nodes.


#3

For the forum:

Hi,
I found the solution. network.host sets both the bind address and the publish address, so I need to set network.bind_host and network.publish_host separately:

      - network.bind_host=0.0.0.0
      - network.publish_host=136.xxx.xxx.xxx # or FQDN
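For reference, a fuller version of meerkat's docker-compose.yml with this fix might look like the sketch below. It assumes the service is named elasticsearch (as the elasticsearch_1 log prefix suggests), that a 6.x image is used (the exact image tag is an assumption), and that the transport port 9300 is published on the host: other nodes now connect to the host's address rather than the container's, so that port has to be reachable there. donkey's file would mirror it with its own node name, publish address, unicast host and node.master=false.

```yaml
version: "2.2"
services:
  elasticsearch:
    # Assumption: any Elasticsearch 6.x image; substitute the one you already use.
    image: docker.elastic.co/elasticsearch/elasticsearch:6.4.2
    environment:
      - cluster.name=my-test-cluster
      - node.name=meerkat
      # Listen on every interface inside the container.
      - network.bind_host=0.0.0.0
      # Advertise the host's IP (or FQDN) so other nodes connect to the host, not to 172.19.x.x.
      - network.publish_host=136.xxx.xxx.xxx
      - discovery.zen.ping.unicast.hosts=donkey.xyz.de
      - discovery.zen.minimum_master_nodes=1
      - node.master=true
      - node.data=true
      - node.ingest=true
    ports:
      - "9200:9200"  # HTTP API
      - "9300:9300"  # transport: other nodes connect here via the published host address
```

If the host-side transport port were mapped to something other than 9300, transport.publish_port would presumably also need to be set to that host port.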

PS: For anyone who stumbles over the same problem in the future: I had also tried some of the special values for network.host, e.g. _global_, which fails with this exception:

org.elasticsearch.bootstrap.StartupException: java.lang.IllegalArgumentException: No up-and-running global-scope (public) addresses found, got [name:lo (lo), name:eth0 (eth0)]
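The _global_ value fails because, inside a container on Docker's default bridge network, the only interfaces are lo and eth0, and eth0 carries a private 172.x address, so Elasticsearch finds no public address. An alternative not tested in this thread is Docker's host networking (Linux hosts only), where the container shares the host's interfaces and network.host can point at the real hostname directly. A minimal sketch, assuming a single service named elasticsearch and a 6.x image (the image tag is an assumption):

```yaml
version: "2.2"
services:
  elasticsearch:
    # Assumption: any Elasticsearch 6.x image.
    image: docker.elastic.co/elasticsearch/elasticsearch:6.4.2
    # Share the host's network stack; "ports" mappings are not used in this mode.
    network_mode: host
    environment:
      - cluster.name=my-test-cluster
      - node.name=meerkat
      # The host's real interfaces are visible now, so binding to the real hostname should work.
      - network.host=meerkat.xyz.de
      - discovery.zen.ping.unicast.hosts=donkey.xyz.de
      - discovery.zen.minimum_master_nodes=1
      - node.master=true
      - node.data=true
      - node.ingest=true
```

The trade-off is that the container gives up network isolation from the host, which the bind_host/publish_host split avoids.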