Problems With Local Elasticsearch VM Cluster Communication

zylo47 · October 2, 2015, 5:25pm

I am trying to set up a local VM cluster of 3 Elasticsearch servers. I started with a single, stand-alone server running CentOS 6.6 in VirtualBox, then cloned it, so I now have 3 VMs, all running Elasticsearch 1.3.2. (The version must be 1.3.2 because our production environment runs 1.3.2 and I am trying to test some changes locally.)

The problem I am having is that every time I try to set up clustering, the servers fail to communicate with each other and log the following error:

[2015-10-02 17:13:32,129][WARN ][http.netty ] [NYCVM7531] Caught exception while handling client http traffic, closing connection [id: 0xa47a7885, /10.18.0.184:38136 => /10.18.0.101:9200]
java.lang.IllegalArgumentException: invalid version format: NYCVM7531_1^K10.18.0.184

The servers are not in DNS. They get their IP addresses from DHCP. They are called NYCVM7531, NYCVM7531_1, and NYCVM7531_2, and their IPs are 10.18.0.101, 10.18.0.184, and 10.18.0.185. Each has entries for the others in its local /etc/hosts file, so they can ping each other.

I have named the cluster NYCVM7531_Cluster in each of the elasticsearch.yml files. I have set discovery.zen.ping.multicast.enabled to false on each node. I have also set discovery.zen.ping.unicast.hosts to ["NYCVM7531:9200","NYCVM7531_1:9200","NYCVM7531_2:9200"].

Here is what iptables looks like on all three nodes. I don't think it's a port issue but I am unsure:

Table: filter
Chain INPUT (policy ACCEPT)
num  target  prot  opt  source     destination
1    ACCEPT  all   --   0.0.0.0/0  0.0.0.0/0    state RELATED,ESTABLISHED
2    ACCEPT  icmp  --   0.0.0.0/0  0.0.0.0/0
3    ACCEPT  all   --   0.0.0.0/0  0.0.0.0/0
4    ACCEPT  tcp   --   0.0.0.0/0  0.0.0.0/0    state NEW tcp dpt:22
5    ACCEPT  tcp   --   0.0.0.0/0  0.0.0.0/0    state NEW tcp dpt:9200
6    ACCEPT  tcp   --   0.0.0.0/0  0.0.0.0/0    state NEW tcp dpt:9300
7    REJECT  all   --   0.0.0.0/0  0.0.0.0/0    reject-with icmp-host-prohibited

Chain FORWARD (policy ACCEPT)
num  target  prot  opt  source     destination
1    REJECT  all   --   0.0.0.0/0  0.0.0.0/0    reject-with icmp-host-prohibited

Chain OUTPUT (policy ACCEPT)
num  target  prot  opt  source     destination

If I change minimum master nodes from 1 to 2, the cluster fails to start because none of the nodes can talk to each other. When it is set to 1, each node only sees itself.

I don't know what the problem could be. Please help.

Update: I tried shutting the firewall off using sudo service iptables stop on all 3 nodes and then restarting Elasticsearch, but I get the same issue.

zylo47 · October 2, 2015, 6:34pm

I fixed the problem. I had to remove the port specification from the unicast.hosts entries.

Instead of "NYCVM7531:9200" I just put "NYCVM7531", and it started working.
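
For anyone hitting the same error, here is a sketch of the relevant part of elasticsearch.yml after the change. It only restates the settings described in my first post with the port removed; everything else in the file is omitted and assumed to be unchanged.

```yaml
# elasticsearch.yml (same on all three nodes); only the settings discussed here
cluster.name: NYCVM7531_Cluster

# Multicast discovery is off, so nodes find each other through the unicast list
discovery.zen.ping.multicast.enabled: false

# No port given: discovery uses the default transport port
# instead of the HTTP port 9200 I had specified before
discovery.zen.ping.unicast.hosts: ["NYCVM7531", "NYCVM7531_1", "NYCVM7531_2"]
```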

Christian_Dahlqvist · October 2, 2015, 6:36pm

9200 is the default port for HTTP traffic. Internal cluster traffic uses the 9300 port.
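
As a rough sketch of how the two ports relate, assuming the defaults have not been overridden elsewhere in elasticsearch.yml: the host names below are taken from the original post, and setting the ports explicitly is optional and shown only for illustration.

```yaml
# HTTP/REST API port (what curl and browsers talk to); defaults to 9200
http.port: 9200

# Transport port for node-to-node traffic, including zen discovery pings; defaults to 9300
transport.tcp.port: 9300

# If a port is given in the unicast list, it should be the transport port
discovery.zen.ping.unicast.hosts: ["NYCVM7531:9300", "NYCVM7531_1:9300", "NYCVM7531_2:9300"]
```

The "invalid version format" warning from http.netty is what you would expect when transport-protocol bytes arrive on the HTTP port: the HTTP handler cannot parse them as a request and closes the connection.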

zylo47 · October 2, 2015, 6:37pm

So I was basically crossing wires by trying to force it to use 9200 when it would have tried to use 9300?

warkolm · October 3, 2015, 11:42pm

Yep