Cluster network configuration with 2 network cards

Hello, I have 3 hosts, each with 2 network interfaces: one public and one private.

I want to bind port 9200 (HTTP queries) to the public network interface and port 9300 (transport / zen discovery) to the private interface.

Here is the ifconfig -a output for each of my servers (the IPs of course differ across the hosts).

  • IPs starting with 163.XXX.XXX.XXX are the public ones (interface enp1s0f0)
  • IPs starting with 10.XXX.XXX.XXX are the private ones (interface enp1s0f1)

enp1s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 163.aaa.aaa.27  netmask 255.255.255.0  broadcast 163.aaa.aaa.255
        inet6 fe80::ec4:7aff:fe83:1228  prefixlen 64  scopeid 0x20<link>
        ether 0c:c4:7a:83:12:28  txqueuelen 1000  (Ethernet)
        RX packets 1078210  bytes 827903485 (789.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 279800  bytes 182892689 (174.4 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0x7ac00000-7ac7ffff

enp1s0f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
        inet 10.bbb.bbb.27  netmask 255.255.255.128  broadcast 10.bbb.bbb.127
        inet6 fe80::ec4:7aff:fe83:1229  prefixlen 64  scopeid 0x20<link>
        ether 0c:c4:7a:83:12:29  txqueuelen 1000  (Ethernet)
        RX packets 3033  bytes 234078 (228.5 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2220  bytes 165435 (161.5 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0x7ac80000-7acfffff

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1  (Local Loopback)
        RX packets 1122961  bytes 200701185 (191.4 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1122961  bytes 200701185 (191.4 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

I have these settings in my elasticsearch.yml file. I used the private inet address of each of my hosts for the zen discovery unicast hosts list.

discovery.zen.ping.unicast.hosts: ["10.bbb.bbb.27", "10.bbb.bbb.40", "10.bbb.bbb.75"]
network.bind_host: 0.0.0.0
network.publish_host: _enp1s0f1_
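
A quick way to check what Elasticsearch actually bound each port to (a diagnostic sketch, assuming a Linux host with the iproute2 tools installed) is to list the listening sockets:

# List listening TCP sockets for the HTTP (9200) and transport (9300) ports;
# an address of 0.0.0.0 or * means the port is bound to all interfaces.
ss -tlnp | grep -E ':(9200|9300)'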

Unfortunately, it does not work: the nodes respond on both of my addresses (public and private) on port 9200, and the nodes never join the cluster, which means I have 3 clusters of 1 node each.

By the way, ping is OK:

root@wilco-1:~# ping 10.bbb.bbb.40
PING 10.bbb.bbb.40 (10.bbb.bbb.40) 56(84) bytes of data.
64 bytes from 10.bbb.bbb.40: icmp_seq=1 ttl=63 time=0.468 ms
64 bytes from 10.bbb.bbb.40: icmp_seq=2 ttl=63 time=0.620 ms
64 bytes from 10.bbb.bbb.40: icmp_seq=3 ttl=63 time=0.623 ms
64 bytes from 10.bbb.bbb.40: icmp_seq=4 ttl=63 time=0.623 ms
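
A working ICMP ping does not prove that the transport port is reachable, so it can also be worth probing TCP 9300 on a peer's private address directly (a sketch, assuming netcat is installed; the address is the one from the ping above):

# Attempt a TCP connection to the transport port of the peer node;
# a "succeeded" / "open" result means the connection was accepted.
nc -vz 10.bbb.bbb.40 9300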

What did I miss? HEEEELP !

The network.bind_host setting controls the address that Elasticsearch binds to for both HTTP and transport (and 0.0.0.0 means all addresses). If you want to use separate addresses for these, you have to use different settings. In this case, you want http.host and transport.host (note that you can also set http.bind_host and http.publish_host, as well as transport.bind_host and transport.publish_host, but based on your description I don't think you need that).

So, you would want:

http.host: _enp1s0f0_
transport.host: _enp1s0f1_
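
Putting that together with the discovery settings from the question, a per-node elasticsearch.yml could look roughly like the sketch below. The cluster.name and node.name values are placeholders, and minimum_master_nodes is not mentioned in the thread but is the usual companion setting for a 3-node 6.x cluster:

cluster.name: wilco-cluster                  # placeholder; must be identical on all 3 nodes
node.name: wilco-1                           # unique per host
http.host: _enp1s0f0_                        # HTTP (9200) on the public interface
transport.host: _enp1s0f1_                   # transport (9300) on the private interface
discovery.zen.ping.unicast.hosts: ["10.bbb.bbb.27", "10.bbb.bbb.40", "10.bbb.bbb.75"]
discovery.zen.minimum_master_nodes: 2        # majority of the 3 master-eligible nodes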

I have set this and it seems to work:

network.host: 0.0.0.0
network.publish_host: _enp1s0f1_
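
One quick way to confirm that all three nodes have really joined the same cluster (a sketch, assuming HTTP is reachable on localhost:9200 and that you pass credentials/TLS options if X-Pack security is enabled) is the _cat/nodes API:

# Lists every node in the cluster this node belongs to; with the setup above
# you should see wilco-1, wilco-2 and wilco-3 in the output.
curl -s 'http://localhost:9200/_cat/nodes?v'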

The issue turned out to be an IP-filtering issue: I had to add the cluster hosts to the filter's allow list (IP filtering is an X-Pack feature).
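
For reference, the transport-level allow list that had to include the cluster hosts might look roughly like this (a sketch of the X-Pack IP filtering settings, reusing the private addresses from the question):

# Allow the private addresses of the three cluster hosts at the transport layer,
# so the nodes can reach each other on port 9300.
xpack.security.transport.filter.allow: ["10.bbb.bbb.27", "10.bbb.bbb.40", "10.bbb.bbb.75"]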

OK, so now the hosts are talking to each other, but in the monitoring I can still see only 1 node. And the funny thing is that it is the same node whichever host I look at!

So, the situation:

host-1 (https://wilco-1.domain.fr/) runs node-1 (node.name: wilco-1)
host-2 (https://wilco-2.domain.fr/) runs node-2 (node.name: wilco-2)
host-3 (https://wilco-3.domain.fr/) runs node-3 (node.name: wilco-3)

I have plenty of log entries like:
[2018-04-18T15:02:06,149][WARN ][o.e.g.DanglingIndicesState] [wilco-3] [[.monitoring-es-6-2018.04.17/clVarL1iQ2OnOaT4fcFwPA]] can not be imported as a dangling index, as index with same name already exists in cluster metadata

Here is the monitoring rendering:

And I found 2 clusters with the same name!

How is it possible that there are 2 clusters with the same name?
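
A cluster name does not uniquely identify a cluster; each cluster that forms also gets its own cluster_uuid, which is how Monitoring tells clusters apart. One way to check which cluster each node actually belongs to (a sketch, assuming the root endpoint is reachable on each host and any required credentials are supplied) is to compare that UUID:

# The root endpoint reports cluster_name and cluster_uuid; if the UUIDs differ
# across hosts, the nodes formed separate clusters that merely share a name.
curl -s 'http://localhost:9200/'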

After a few hours, the 2nd cluster disappeared and everything was back to normal.
