Elasticsearch with multihost docker networking - Cluster connectivity issues

ajay_bh111 · May 10, 2016, 2:39pm

When using external networking with Calico, the cluster nodes can talk to each other on ports 9200 and 9300, but the additional ports needed for cluster discovery are dynamic and so are not opened by default. As a result, the discovery process cannot find the other nodes of the cluster, which run in Docker containers on different hosts.

Is it possible to make Elasticsearch use only specified ports for inter-node cluster communication?

Example: using calico for external networking of containers:
common.network ] configuration:
cali0
inet 192.168.0.7 netmask:255.255.255.0 broadcast:0.0.0.0 scope:site
UP MULTICAST mtu:1500 index:89

eth1
inet 172.18.0.3 netmask:255.255.0.0 broadcast:0.0.0.0 scope:site
UP MULTICAST mtu:1500 index:91

The logs at cluster start show a NullPointerException:
[2016-05-10 13:55:20,567][DEBUG][common.netty ] using gathering [true]
[2016-05-10 13:55:20,643][DEBUG][discovery.zen.elect ] [es01] using minimum_master_nodes [-1]
[2016-05-10 13:55:20,645][DEBUG][discovery.zen.ping.unicast] [es01] using initial hosts [es01:9300, es02:9300, es03:9300], with concurrent_connects [10]
[2016-05-10 13:55:20,667][DEBUG][discovery.zen ] [es01] using ping.timeout [3s], join.timeout [1m], master_election.filter_client [true], master_election.filter_data [false]
[2016-05-10 13:55:20,671][DEBUG][discovery.zen.fd ] [es01] [master] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
....

[2016-05-10 13:55:22,436][WARN ][transport.netty ] [es01] exception caught on transport layer [[id: 0xeb5ace79, /192.168.0.7:49759 => es02/192.168.0.4:9300]], closing connection
java.lang.NullPointerException
at

When using the host network for the containers, all nodes are discovered:

[2016-05-10 14:03:21,776][DEBUG][common.network ] configuration:

eth0
inet 10.236.133.168 netmask:255.255.252.0 broadcast:10.236.135.255 scope:site
UP MULTICAST mtu:1500 index:2

docker_gwbridge
inet 172.18.0.1 netmask:255.255.0.0 broadcast:0.0.0.0 scope:site
MULTICAST mtu:1500 index:3

docker0
inet 172.17.0.1 netmask:255.255.0.0 broadcast:0.0.0.0 scope:site
MULTICAST mtu:1500 index:4

Cluster start log:
[2016-05-10 14:03:21,780][DEBUG][common.netty ] using gathering [true]
[2016-05-10 14:03:21,819][DEBUG][discovery.zen.elect ] [mesos-s1] using minimum_master_nodes [-1]
[2016-05-10 14:03:21,821][DEBUG][discovery.zen.ping.unicast] [mesos-s1] using initial hosts [mesos-s1:9300, mesos-s2:9300, mesos-s3:9300], with concurrent_connects [10]
[2016-05-10 14:03:21,830][DEBUG][discovery.zen ] [mesos-s1] using ping.timeout [3s], join.timeout [1m], master_election.filter_client [true], master_election.filter_data [false]
[2016-05-10 14:03:21,832][DEBUG][discovery.zen.fd ] [mesos-s1] [master] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
[2016-05-10 14:03:22,130][DEBUG][script ] [mesos-s1] using script cache with max_size [100], expire [null]
[2016-05-10 14:03:22,138][DEBUG][cluster.routing.allocation.decider] [mesos-s1] using node_concurrent_recoveries [4], node_initial_primaries_recoveries [10]
[2016-05-10 14:03:22,139][DEBUG][cluster.routing.allocation.decider] [mesos-s1] using [cluster.routing.allocation.allow_rebalance] with [indices_all_active]
....
[2016-05-10 14:03:25,805][DEBUG][discovery.zen ] [mesos-s1] filtered ping responses: (filter_client[true], filter_data[false])
--> ping_response{node [{mesos-s3}{Ub03F05zSES5oiXBUlL3BA}{10.236.133.170}{10.236.133.170:9300}{master=true}], id[12], master [null], hasJoinedOnce [false], cluster_name[es_dtest]}
--> ping_response{node [{mesos-s2}{OIVsOQ_RQfi0LgsExnY46Q}{10.236.133.169}{10.236.133.169:9300}{master=true}], id[17], master [{mesos-s2}{OIVsOQ_RQfi0LgsExnY46Q}{10.236.133.169}{10.236.133.169:9300}{master=true}], hasJoinedOnce [true], cluster_name[es_dtest]}

The reason is that ports other than 9200 and 9300 are needed between cluster nodes, while the containers expose only ports 9200 and 9300, which blocks the communication. Here is a sample of the connections in the working state:
tcp 0 0 10.236.133.168:9300 10.236.133.170:39200 ESTABLISHED 14736/java
tcp 0 0 10.236.133.168:41268 10.236.133.169:9300 ESTABLISHED 14736/java
tcp 0 0 10.236.133.168:41275 10.236.133.169:9300 ESTABLISHED 14736/java
tcp 0 0 10.236.133.168:36775 10.236.133.87:2379 ESTABLISHED 11283/confd
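
To read a sample like the one above, it can help to split each line into its local and remote endpoint and check which side carries port 9300. The following is only a minimal sketch: `classify` and its labels are hypothetical names, and it assumes the column layout shown in the sample (local address in the fourth column, remote address in the fifth).

```python
# The netstat sample from this post, one connection per line.
SAMPLE = """\
tcp 0 0 10.236.133.168:9300 10.236.133.170:39200 ESTABLISHED 14736/java
tcp 0 0 10.236.133.168:41268 10.236.133.169:9300 ESTABLISHED 14736/java
tcp 0 0 10.236.133.168:41275 10.236.133.169:9300 ESTABLISHED 14736/java
tcp 0 0 10.236.133.168:36775 10.236.133.87:2379 ESTABLISHED 11283/confd
"""


def classify(line, service_port=9300):
    """Label a netstat line by which end of the connection uses service_port.

    Hypothetical helper: assumes columns are
    proto, recv-q, send-q, local addr, remote addr, state, pid/program.
    """
    fields = line.split()
    local, remote = fields[3], fields[4]
    local_port = int(local.rsplit(":", 1)[1])
    remote_port = int(remote.rsplit(":", 1)[1])
    if local_port == service_port:
        # A peer connected to this node's listener.
        return "inbound"
    if remote_port == service_port:
        # This node connected to a peer's listener from some local port.
        return "outbound"
    return "other"


if __name__ == "__main__":
    for line in SAMPLE.splitlines():
        print(classify(line), line)
```

In every java line of the sample, one of the two endpoints is port 9300; the confd connection to port 2379 is unrelated to Elasticsearch.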

warkolm · May 11, 2016, 11:28pm

By default ES will use a port in the range 9300-9399 for node-to-node communication. It should usually use only 9300, but it may pick another port in the range if 9300 is already taken.
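
The "first free port in a range" behaviour can be illustrated with plain sockets. This is only a sketch of the general idea, not Elasticsearch's actual implementation; `bind_first_free` is a hypothetical helper, and the 9300-9399 range is simply the one quoted above.

```python
import socket


def bind_first_free(host, first, last):
    """Try each port in [first, last] in order and listen on the first that binds.

    Returns (listening_socket, port). Raises OSError if every port is taken.
    Hypothetical helper for illustration only.
    """
    for port in range(first, last + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError:
            # Port already in use (or not permitted): try the next one.
            sock.close()
            continue
        sock.listen(1)
        return sock, port
    raise OSError(f"no free port in {first}-{last}")


if __name__ == "__main__":
    sock, port = bind_first_free("127.0.0.1", 9300, 9399)
    print("listening on", port)
    sock.close()
```

With this pattern, a second process started on the same host would end up on the next port in the range, which is why a single node per container normally ends up on 9300.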

ajay_bh111 · May 13, 2016, 5:53pm

My concern is how I can force ES to use 9300 for the local ports (41268 and 41275 below) instead of random ports. See below:

tcp 0 0 10.236.133.168:41268 10.236.133.169:9300 ESTABLISHED 14736/java
tcp 0 0 10.236.133.168:41275 10.236.133.169:9300 ESTABLISHED 14736/java

warkolm · May 15, 2016, 6:41am

Those are the originating (source) ports, and they will always be random. You can see that the second IP:PORT pair on each line has 9300; that is the destination of the connection, and it shows that ES is listening on 9300.
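
The same behaviour can be reproduced with any TCP client, independent of ES. The sketch below is a loopback demonstration only; `show_endpoints` is a hypothetical helper. It opens a listener, connects to it, and returns both ends of the client socket: the destination port is the listener's, while the source port is whatever the operating system assigned.

```python
import socket


def show_endpoints(listen_port=0):
    """Connect to a local listener and return (local, remote, listener_port).

    listen_port=0 lets the OS pick the listener port, so the demo does not
    depend on any particular port being free. Hypothetical helper.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", listen_port))
    server.listen(1)
    dest_port = server.getsockname()[1]

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # No bind() on the client: the OS assigns an ephemeral source port.
    client.connect(("127.0.0.1", dest_port))
    conn, _ = server.accept()

    local = client.getsockname()   # (ip, source port chosen by the OS)
    remote = client.getpeername()  # (ip, destination port of the listener)

    conn.close()
    client.close()
    server.close()
    return local, remote, dest_port


if __name__ == "__main__":
    local, remote, dest_port = show_endpoints()
    print("local  (source):     ", local)
    print("remote (destination):", remote)
```

Only the destination port needs to be reachable through a firewall or container network policy; the source port is picked on the connecting side for each new connection.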