Transport Client can't connect to AWS EC2 Cluster

cheruvian · July 14, 2016, 12:44am

I have an ES 2.3.2 cluster configured and running in AWS EC2 (VPC). I've opened up both the REST and Transport ports in the security group. I want to connect a TransportClient to the remote cluster running in AWS EC2, but it never manages to connect.

tl;dr: For the nodes to cluster within the VPC, publish_host has to be the EC2 internal IP. From outside the VPC that internal IP address is unreachable, but the TransportClient seems to connect only if the address passed to addTransportAddress matches the publish_host.

The ports are correct and open, and the host is reachable from outside the VPC.

I'm able to curl against both ports. The REST port returns the expected response:

{
  "name" : "NODE_NAME",
  "cluster_name" : "CLUSTER_NAME",
  "version" : {
    "number" : "2.3.2",
    "build_hash" : "b9e4a6acad4008027e4038f6abed7f7dba346f94",
    "build_timestamp" : "2016-04-21T16:03:47Z",
    "build_snapshot" : false,
    "lucene_version" : "5.5.0"
  },
  "tagline" : "You Know, for Search"
}

When I curl against the Transport port, the connection is established, but the port obviously does not serve HTTP traffic and returns the following message:

 This is not a HTTP port

Yet whenever I attempt to initialize a TransportClient using the following configuration it has no available nodes:

Settings.settingsBuilder()
        .put(ELASTICSEARCH_CLIENT_TRANSPORT_SNIFF_KEY, false)
        .put(ELASTICSEARCH_CLIENT_TRANSPORT_IGNORE_CLUSTER_NAME_KEY, true)
        .put(ELASTICSEARCH_CLIENT_TRANSPORT_PING_TIMEOUT_KEY, "30s")
        .put(ELASTICSEARCH_CLIENT_TRANSPORT_NODES_SAMPLER_INTERVAL_KEY, "30s")
        .build()

...

transportClient.addTransportAddress(EC2_INSTANCE_PUBLIC_IP)

I am using the EC2 discovery mechanism.
Pertinent config section(s):

transport.tcp.port: 8193
transport.tcp.compress: true
http.compression: true 
http.cors.enabled: true
http.port: 8192

discovery.type: ec2
discovery.ec2.tag.ElasticSearch: DeviceProfileLookup

network.host: ["_site_"]
network.bind_host: 0.0.0.0

I've tried setting network.host to ["_ec2:publicIp_", "_ec2:privateIp_"], which then prevents the nodes from forming a cluster on startup.

It sounds like the TransportClient is only able to connect to the cluster if the address used is the same as the publish_host. When I tried setting the publish_host to _ec2:publicIp_, the TransportClient was able to connect, but then the hosts that live in EC2 were unable to connect to each other.
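To make the reachability point concrete, here is a minimal Java sketch that checks whether an address is in a private (site-local) range, which is the kind of address an EC2 private IP is and which cannot be routed to from outside the VPC. The class name, method name, and both sample IPs are hypothetical and chosen only for illustration. It uses only the JDK and says nothing about how Elasticsearch itself resolves `_site_` or `publish_host`.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class AddressScope {

    /** Returns true when the IP literal falls in a private (site-local) range such as 10.0.0.0/8. */
    static boolean isPrivate(String ipLiteral) throws UnknownHostException {
        // For an IP literal, getByName only parses the text; no DNS lookup is made.
        return InetAddress.getByName(ipLiteral).isSiteLocalAddress();
    }

    public static void main(String[] args) throws UnknownHostException {
        String privateIp = "10.0.1.25";    // hypothetical VPC-internal address
        String publicIp = "203.0.113.10";  // hypothetical public address (documentation range)

        System.out.println(privateIp + " private? " + isPrivate(privateIp));
        System.out.println(publicIp + " private? " + isPrivate(publicIp));
    }
}
```

If the address a node publishes is private, a client outside the VPC has no route to it, regardless of which ports the security group opens.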

Any insight or advice would be much appreciated.

Thanks.

dadoonet · July 14, 2016, 1:51am

Which error do you get with this?

dadoonet · July 14, 2016, 1:54am

If you set network.host to the public IP, you need to set discovery.ec2.host_type to public_ip.

cheruvian · July 14, 2016, 9:06pm

When I use the following config:

plugin.mandatory: cloud-aws
discovery.type: ec2
discovery.ec2.host_type: public_ip
network.publish_host: ["_ec2:publicIp_"]
network.bind_host: 0.0.0.0

Looking at the logs, the data node doesn't even find the master node to try to connect to (at least I don't see any timeouts), and when I query for health it reports that it has no known master.

Log Snippet

    [2016-07-14 20:53:51,744][WARN ][rest.suppressed          ] /_cluster/health Params: {}
    MasterNotDiscoveredException[null]
            at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$5.onTimeout(TransportMasterNodeAction.java:226)
            at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:236)
            at org.elasticsearch.cluster.service.InternalClusterService$NotifyTimeout.run(InternalClusterService.java:804)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)
    [2016-07-14 20:53:56,773][DEBUG][action.admin.cluster.health] [DATA_NODE_NAME] no known master node, scheduling a retry
    [2016-07-14 20:53:56,773][DEBUG][action.admin.cluster.health] [DATA_NODE_NAME] timed out while retrying [cluster:monitor/health] after failure (timeout [30s])

When I use the following config:

plugin.mandatory: cloud-aws
discovery.type: ec2
discovery.ec2.host_type: private_ip
network.publish_host: ["_site_", "_ec2:publicIp_", "_ec2:privateIp_"]
network.bind_host: 0.0.0.0

The data node discovers the master but requests to it time out.

Log Snippet

    [2016-07-14 20:55:07,233][WARN ][discovery.ec2            ] [DATA_NODE_NAME] failed to connect to master [{MASTER_NODE_NAME}{oPGzf5lbS2GaJDPZBJl3lw}{MASTER_PUBLIC_IP}{MASTER_PUBLIC_IP:PORT_NUMBER}{availability_zone=us-east-1b, data=false, master=true}], retrying...
    ConnectTransportException[[MASTER_NODE_NAME][MASTER_PUBLIC_IP:PORT_NUMBER] connect_timeout[30s]]; nested: ConnectTimeoutException[connection timed out: /MASTER_PUBLIC_IP:PORT_NUMBER];
            at org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:987)
            at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:920)
            at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:893)
            at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:260)
            at org.elasticsearch.discovery.zen.ZenDiscovery.joinElectedMaster(ZenDiscovery.java:434)
            at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:386)
            at org.elasticsearch.discovery.zen.ZenDiscovery.access$4800(ZenDiscovery.java:91)
            at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1237)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)
    Caused by: org.jboss.netty.channel.ConnectTimeoutException: connection timed out: /MASTER_PUBLIC_IP:PORT_NUMBER
            at org.jboss.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:139)
            at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83)
            at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
            at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
            at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
            at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
            ... 3 more
    [2016-07-14 20:55:09,554][DEBUG][action.admin.cluster.health] [DATA_NODE_NAME] no known master node, scheduling a retry
    [2016-07-14 20:55:10,862][DEBUG][action.admin.cluster.health] [DATA_NODE_NAME] no known master node, scheduling a retry

dadoonet · August 10, 2016, 3:46pm

I don't know if you solved it (and sorry for the delay).

I wonder if you could try defining discovery.ec2.groups as well.
Also, define the cloud.aws.region.

Let me know if it fixes your issue.

Also, could you check that you can actually telnet from one machine to the other using the public IP address on the transport port (9300 by default, 8193 in your config)? If not, you need to check all firewall settings and security groups.
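If telnet is not installed on the instances, the same check can be approximated with a plain TCP connect from Java. This is only a sketch using the JDK: the class name, method name, default host, and timeout are assumptions made for illustration, and a successful connect shows only that the port is reachable, not that the TransportClient handshake will succeed.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    /** Tries to open a TCP connection; returns false on refusal, timeout, or an unresolvable host. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers connection refused, connect timeout, and unknown host.
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass the other node's public IP and the transport port as arguments.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9300;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 5000));
    }
}
```

Run it on one node with the other node's public IP and the transport port. A `false` result points at security groups or firewall rules rather than at the Elasticsearch configuration.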
