Zen Discovery issues in an AWS VPC

Hi guys,

My VP, Inder, came to your meetup. We are having an issue deploying ES, and he suggested I bring the question to this forum.

We successfully created a test environment with a master node in EC2 and node clients in a Java application on Elastic Beanstalk, all in one AWS VPC. The node clients have the following configuration:

Settings settings = ImmutableSettings.settingsBuilder()
    .put("http.enabled", false)
    .put("cluster.name", clusterName)
    .put("node.master", false)
    .put("node.data", true)
    .put("discovery.zen.ping.multicast.enabled", "false")
    .put("discovery.zen.ping.unicast.hosts", nodeAddress)
    .build();
Node node = NodeBuilder.nodeBuilder()
    .clusterName(clusterName)
    .loadConfigSettings(false)
    .settings(settings)
    .node();
client = node.client();

nodeAddress is the address of the EC2 instance running the master node. The master's own configuration also contains:

discovery.zen.ping.unicast.hosts = ["myEnv.elasticbeanstalk.com"]

This setup worked quite well in the test environment, and as new instances of the node clients were created they connected with ease.

We then moved this setup to our production environment, creating a new VPC, but the application will not connect to the master. We have confirmed that we can reach ports 9200, 9300, etc., and from the logs it looks like discovery is working. But when it is time to copy over the cluster state, the request times out.

Nov 20 00:57:25 DEBUG [org.elasticsearch.transport.netty] - [Iceman] connected to node [[Iceman][tcx8GaOCSIa1xYGayU0xVg][ip-172-30-0-244][inet[/172.30.0.244:9300]]{master=false}]
Nov 20 00:57:45 WARN [org.elasticsearch.discovery] - [Iceman] waited for 30s and no initial state was set by the discovery
Nov 20 00:57:45 DEBUG [org.elasticsearch.gateway] - [Iceman] can't wait on start for (possibly) reading state from gateway, will do it asynchronously
Nov 20 00:57:45 INFO [org.elasticsearch.node] - [Iceman] started
Nov 20 00:57:49 INFO [org.elasticsearch.discovery.zen] - [Iceman] failed to send join request to master [[ip-172-30-0-103][sRQH8J0wQa2ynz1WdNq5_g][ip-172-30-0-103][inet[/172.30.0.103:9300]]{master=true}], reason [RemoteTransportException[[ip-172-30-0-103][inet[/172.30.0.103:9300]][internal:discovery/zen/join]]; nested: ConnectTransportException[[Iceman][inet[/172.30.0.244:9300]] connect_timeout[30s]]; nested: ConnectTimeoutException[connection timed out: /172.30.0.244:9300]; ]
Nov 20 00:58:23 INFO [org.elasticsearch.discovery.zen] - [Iceman] failed to send join request to master [[ip-172-30-0-103][sRQH8J0wQa2ynz1WdNq5_g][ip-172-30-0-103][inet[/172.30.0.103:9300]]{master=true}], reason [RemoteTransportException[[ip-172-30-0-103][inet[/172.30.0.103:9300]][internal:discovery/zen/join]]; nested: ConnectTransportException[[Iceman][inet[/172.30.0.244:9300]] connect_timeout[30s]]; nested: ConnectTimeoutException[connection timed out: /172.30.0.244:9300]; ]

What little I have found googling for this indicates it might be an issue with multicast being enabled by default, but I only have one master and one node in the VPC and I still have this problem. Also, why does it work in the other environment?

If someone can enlighten me on what direction my troubleshooting should go, that would be very helpful!

I have answered my own question. Luckily it turned out to be a simple firewall issue.

In short:
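The transport client is happy if ports 9200 and 9300 are open on the master. But for the node client to work, I need to open up those ports on the node client machine as well. The logs above show why: the join request failed because the master (172.30.0.103) timed out trying to connect back to the node client at 172.30.0.244:9300.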

Once I did that, node discovery worked perfectly.
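
For anyone hitting the same thing, here is a minimal sketch of how the firewall problem could be confirmed from the master's side, assuming you can run a small Java program there. It only attempts a plain TCP connect with a timeout, so it checks reachability of the port and nothing Elasticsearch-specific. The class name PortCheck and the method canConnect are my own (hypothetical) names; the default address and port are simply the node client's values from the logs above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    // Returns true if a TCP connection to host:port can be opened within
    // timeoutMillis. Refused connections and timeouts both return false.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are the node client's address and transport port as they
        // appear in the logs; pass your own host and port as arguments.
        String host = args.length > 0 ? args[0] : "172.30.0.244";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9300;

        boolean reachable = canConnect(host, port, 5000);
        System.out.println(host + ":" + port
                + (reachable ? " is reachable" : " is NOT reachable"));
    }
}
```

Running this on the master against the node client's address, and again on the node client against the master's, should show whether the transport port is open in both directions. In my case the master-to-node-client direction was the one being blocked.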
