When using Calico for external container networking, the cluster nodes can talk to each other on ports 9200 and 9300, but the additional ports needed for cluster discovery are dynamic and are therefore not opened by default. As a result, the discovery process cannot find the other nodes of the cluster, which run in Docker containers on different hosts.
Is it possible to make Elasticsearch use only specified ports for inter-node cluster communication?
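A plain TCP connect test, run from one host or container, shows which ports are actually reachable on the other nodes. The following is a minimal sketch, assuming Python 3 is available where you run it. The helper name `port_open` is hypothetical, and the peer hostnames are whatever your unicast host list uses (es01, es02 and es03 in the logs below).

```python
import socket
import sys


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: it only tests TCP reachability, not whether
    Elasticsearch accepts the transport handshake on that port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Peer hostnames are passed on the command line, e.g. es01 es02 es03.
    for host in sys.argv[1:]:
        for port in (9200, 9300):
            state = "open" if port_open(host, port) else "unreachable"
            print(f"{host}:{port} {state}")
```

For example, `python3 check_ports.py es01 es02 es03` (the file name is only illustrative) prints one open/unreachable line per host and port.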
Example using Calico for external networking of the containers:
common.network ] configuration:
cali0
    inet 192.168.0.7 netmask:255.255.255.0 broadcast:0.0.0.0 scope:site
    UP MULTICAST mtu:1500 index:89
eth1
    inet 172.18.0.3 netmask:255.255.0.0 broadcast:0.0.0.0 scope:site
    UP MULTICAST mtu:1500 index:91
The cluster start-up logs show a NullPointerException:
[2016-05-10 13:55:20,567][DEBUG][common.netty ] using gathering [true]
[2016-05-10 13:55:20,643][DEBUG][discovery.zen.elect ] [es01] using minimum_master_nodes [-1]
[2016-05-10 13:55:20,645][DEBUG][discovery.zen.ping.unicast] [es01] using initial hosts [es01:9300, es02:9300, es03:9300], with concurrent_connects [10]
[2016-05-10 13:55:20,667][DEBUG][discovery.zen ] [es01] using ping.timeout [3s], join.timeout [1m], master_election.filter_client [true], master_election.filter_data [false]
[2016-05-10 13:55:20,671][DEBUG][discovery.zen.fd ] [es01] [master] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
....
[2016-05-10 13:55:22,436][WARN ][transport.netty ] [es01] exception caught on transport layer [[id: 0xeb5ace79, /192.168.0.7:49759 => es02/192.168.0.4:9300]], closing connection
java.lang.NullPointerException
    at ...
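To see which of the container's interfaces shares a subnet with a peer (for example es02 at 192.168.0.4 in the warning above), the inet/netmask pairs from the Calico interface configuration can be checked with Python's standard `ipaddress` module. This is a minimal sketch; the `INTERFACES` table and the `interfaces_for` helper are illustrative names, with the values copied from the log.

```python
import ipaddress

# Illustrative table: addresses and netmasks copied from the Calico
# container's common.network configuration in the log above.
INTERFACES = {
    "cali0": ("192.168.0.7", "255.255.255.0"),
    "eth1": ("172.18.0.3", "255.255.0.0"),
}


def interfaces_for(peer: str, interfaces=INTERFACES) -> list:
    """Return the names of local interfaces whose subnet contains the peer address."""
    addr = ipaddress.ip_address(peer)
    matches = []
    for name, (ip, mask) in interfaces.items():
        # ip_interface accepts "address/netmask"; .network is the subnet it belongs to.
        network = ipaddress.ip_interface(f"{ip}/{mask}").network
        if addr in network:
            matches.append(name)
    return matches


# es02's address as it appears in the transport-layer warning.
print(interfaces_for("192.168.0.4"))
```

Here 192.168.0.4 falls only in cali0's 192.168.0.0/24 network; eth1's 172.18.0.0/16 network does not contain it.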
When using the host network for the containers, all nodes are discovered:
[2016-05-10 14:03:21,776][DEBUG][common.network ] configuration:
eth0
    inet 10.236.133.168 netmask:255.255.252.0 broadcast:10.236.135.255 scope:site
    UP MULTICAST mtu:1500 index:2
docker_gwbridge
    inet 172.18.0.1 netmask:255.255.0.0 broadcast:0.0.0.0 scope:site
    MULTICAST mtu:1500 index:3
docker0
    inet 172.17.0.1 netmask:255.255.0.0 broadcast:0.0.0.0 scope:site
    MULTICAST mtu:1500 index:4
Cluster start-up log:
[2016-05-10 14:03:21,780][DEBUG][common.netty ] using gathering [true]
[2016-05-10 14:03:21,819][DEBUG][discovery.zen.elect ] [mesos-s1] using minimum_master_nodes [-1]
[2016-05-10 14:03:21,821][DEBUG][discovery.zen.ping.unicast] [mesos-s1] using initial hosts [mesos-s1:9300, mesos-s2:9300, mesos-s3:9300], with concurrent_connects [10]
[2016-05-10 14:03:21,830][DEBUG][discovery.zen ] [mesos-s1] using ping.timeout [3s], join.timeout [1m], master_election.filter_client [true], master_election.filter_data [false]
[2016-05-10 14:03:21,832][DEBUG][discovery.zen.fd ] [mesos-s1] [master] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
[2016-05-10 14:03:22,130][DEBUG][script ] [mesos-s1] using script cache with max_size [100], expire [null]
[2016-05-10 14:03:22,138][DEBUG][cluster.routing.allocation.decider] [mesos-s1] using node_concurrent_recoveries [4], node_initial_primaries_recoveries [10]
[2016-05-10 14:03:22,139][DEBUG][cluster.routing.allocation.decider] [mesos-s1] using [cluster.routing.allocation.allow_rebalance] with [indices_all_active]
....
[2016-05-10 14:03:25,805][DEBUG][discovery.zen ] [mesos-s1] filtered ping responses: (filter_client[true], filter_data[false])
--> ping_response{node [{mesos-s3}{Ub03F05zSES5oiXBUlL3BA}{10.236.133.170}{10.236.133.170:9300}{master=true}], id[12], master [null], hasJoinedOnce [false], cluster_name[es_dtest]}
--> ping_response{node [{mesos-s2}{OIVsOQ_RQfi0LgsExnY46Q}{10.236.133.169}{10.236.133.169:9300}{master=true}], id[17], master [{mesos-s2}{OIVsOQ_RQfi0LgsExnY46Q}{10.236.133.169}{10.236.133.169:9300}{master=true}], hasJoinedOnce [true], cluster_name[es_dtest]}
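When comparing the two setups, it helps to pull the node name, the published transport address and the known master out of each ping_response line. The regular expressions below are a hypothetical sketch written against the two lines above only; other Elasticsearch versions may format these entries differently.

```python
import re

# Assumption: patterns fitted to the two sample ping_response lines only.
# Matches "node [{name}{id}{host}{host:port}" and captures name and host:port.
NODE_RE = re.compile(r"node \[\{(?P<name>[^}]*)\}\{[^}]*\}\{[^}]*\}\{(?P<addr>[^}]*)\}")
# Matches "master [null]" or "master [{name}..." and captures the master name if present.
MASTER_RE = re.compile(r"master \[(?:null|\{(?P<master>[^}]*)\})")


def parse_ping_response(line):
    """Return (node_name, transport_address, master_name_or_None), or None if no match."""
    node = NODE_RE.search(line)
    if node is None:
        return None
    master = MASTER_RE.search(line)
    master_name = master.group("master") if master else None
    return node.group("name"), node.group("addr"), master_name
```

Applied to the two lines above, this yields mesos-s3 at 10.236.133.170:9300 with no known master, and mesos-s2 at 10.236.133.169:9300 reporting mesos-s2 as master.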
The reason is that the cluster nodes need ports other than 9200 and 9300 to talk to each other, but the containers expose only ports 9200 and 9300, which blocks that communication. Here is a sample of the open connections in the working state:
tcp 0 0 10.236.133.168:9300 10.236.133.170:39200 ESTABLISHED 14736/java
tcp 0 0 10.236.133.168:41268 10.236.133.169:9300 ESTABLISHED 14736/java
tcp 0 0 10.236.133.168:41275 10.236.133.169:9300 ESTABLISHED 14736/java
tcp 0 0 10.236.133.168:36775 10.236.133.87:2379 ESTABLISHED 11283/confd
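To see which end of each of these connections uses the transport port, the netstat lines can be split into local and remote endpoints. The sketch below uses hypothetical helpers (`parse_line`, `transport_side`) and assumes Python 3 and the seven-column layout shown above (protocol, Recv-Q, Send-Q, local address, foreign address, state, PID/program); it labels each connection by whether port 9300 is on the local side, the remote side, or neither.

```python
# Sample lines copied from the netstat output above (working state).
SAMPLE = """\
tcp 0 0 10.236.133.168:9300 10.236.133.170:39200 ESTABLISHED 14736/java
tcp 0 0 10.236.133.168:41268 10.236.133.169:9300 ESTABLISHED 14736/java
tcp 0 0 10.236.133.168:41275 10.236.133.169:9300 ESTABLISHED 14736/java
tcp 0 0 10.236.133.168:36775 10.236.133.87:2379 ESTABLISHED 11283/confd
"""


def parse_line(line):
    """Return (local_addr, local_port, remote_addr, remote_port, program) for one TCP line."""
    fields = line.split()
    local_addr, local_port = fields[3].rsplit(":", 1)
    remote_addr, remote_port = fields[4].rsplit(":", 1)
    return local_addr, int(local_port), remote_addr, int(remote_port), fields[6]


def transport_side(line, port=9300):
    """Say which end of the connection uses `port`: 'local', 'remote', or None."""
    _, local_port, _, remote_port, _ = parse_line(line)
    if local_port == port:
        return "local"
    if remote_port == port:
        return "remote"
    return None


for line in SAMPLE.splitlines():
    local_addr, local_port, remote_addr, remote_port, program = parse_line(line)
    # "<->" only pairs the two columns; netstat does not show who initiated the connection.
    print(f"{program:12} {local_addr}:{local_port} <-> {remote_addr}:{remote_port} "
          f"(9300 on {transport_side(line) or 'neither'} side)")
```

In this sample, each java connection has port 9300 on exactly one side, and the 2379 connection belongs to confd rather than the java process.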