Client.transport.sniff when connecting localhost to an EC2 node


(Jon Riegel) #1

I'm inexperienced with ES and EC2 so forgive me if this is a novice
question.

I'm attempting to run a local java/tomcat application, using the
TransportClient to connect to ES running on EC2. I have the following
configuration on the server (note: some items not included for the sake of
brevity):

discovery:
  type: ec2
  ec2:
    tag:

    host_type: public_dns

network:
  publish_host: _ec2:publicDns_

On the client side:

host=ec2-23-XX-XX-XX.compute-1.amazonaws.com
port=9300
cloud.aws.access_key=
cloud.aws.secret_key=
cloud.aws.region=us-east-1b
discovery.type=ec2
discovery.ec2.tag.stuff=
discovery.ec2.host_type=public_dns

So my initial connection is to the public domain, and I have the ports open
to
My initial connection appears to succeed, and the client also works fine
when I set client.transport.sniff to false. I have set the log level to
TRACE for org.elasticsearch.client.transport, so I can see this in my logs
on the initial connection:

2012-10-08 08:38:21 DEBUG main org.elasticsearch.client.transport - [Gatecrasher] adding address [[#transport#-1][inet[ec2-23-XX-XX-XX.compute-1.amazonaws.com/23.XX.XX.XX:9300]]]

From here we can see that, since localhost is outside of EC2, that domain
name resolves to the external EC2 ip, 23.XX.XX.XX, and the client is able
to connect.

When I set client.transport.sniff to false, my client connects, the above
is all I see in the logs, and searching/indexing works, so I don't believe
I have any issues with firewalls, open ports or security groups.

However, if I set sniff to true, the client "discovers" the node it has
already connected to - except it attempts to reconnect using the internal
IP, 10.YY.YY.YY:

2012-10-08 08:38:26 DEBUG main org.elasticsearch.client.transport - [Gatecrasher] failed to connect to discovered node [[Kleinstocks][yyexVITpQ-W8GPnl77LkJg][inet[/10.YY.YY.YY:9300]]]
org.elasticsearch.transport.ConnectTransportException: [Kleinstocks][inet[/10.YY.YY.YY:9300]] connect_timeout[5s]
    at org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:612)
    at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:554)
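
The connect_timeout[5s] above can be checked outside Elasticsearch with a plain TCP connect from the client machine. The following is a minimal sketch using only the JDK. The class name `TransportPortProbe` and the `host:port` argument format are made up for illustration, and the probe only tests TCP reachability, not whether an Elasticsearch node is actually answering on that port.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Elasticsearch: tries a plain TCP connect
// to each host:port given on the command line.
class TransportPortProbe {

    // Returns true if a TCP connection could be opened within timeoutMillis.
    // Timeouts, refused connections and unknown hosts all count as "not reachable".
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String arg : args) {
            int colon = arg.lastIndexOf(':');
            if (colon < 0) {
                System.out.println(arg + " skipped (expected host:port)");
                continue;
            }
            String host = arg.substring(0, colon);
            int port = Integer.parseInt(arg.substring(colon + 1));
            // 5000 ms mirrors the 5s timeout seen in the client log.
            System.out.println(arg + " reachable=" + canConnect(host, port, 5000));
        }
    }
}
```

Run from the machine hosting the client with both the public DNS name and the internal 10.YY.YY.YY address on port 9300 as arguments. If only the public name is reachable, that matches what the sniffing client logs.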

My theory is that even though the server is publishing on public_dns, it
resolves the domain locally (to the internal ip) and then keeps the result,
and then when nodes sniff it out, they attempt to connect to that cached
result rather than resolving the domain.
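
Part of this theory can be illustrated with the JDK alone: an `InetSocketAddress` built from a hostname resolves it once, at construction, and carries the resulting IP from then on. EC2 public DNS names resolve to the private IP when looked up from inside EC2, so an address resolved on the server would hold 10.YY.YY.YY. Whether Elasticsearch builds its published address exactly this way is an assumption here, not something the logs confirm. The class name `ResolveOnceDemo` and the helper `describe` are hypothetical.

```java
import java.net.InetSocketAddress;

// Hypothetical demo, not Elasticsearch code: contrasts an address that is
// resolved when it is created with one that is left unresolved.
class ResolveOnceDemo {

    // Formats an address as host/ip:port, or host/<unresolved>:port when no
    // IP is attached to it.
    static String describe(InetSocketAddress address) {
        if (address.isUnresolved()) {
            return address.getHostString() + "/<unresolved>:" + address.getPort();
        }
        return address.getHostString() + "/"
                + address.getAddress().getHostAddress() + ":" + address.getPort();
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";

        // Resolved here, on this machine; whoever receives this object later
        // sees the IP that this machine's DNS returned.
        InetSocketAddress eager = new InetSocketAddress(host, 9300);

        // Keeps only the name; resolution is left to whoever connects.
        InetSocketAddress lazy = InetSocketAddress.createUnresolved(host, 9300);

        System.out.println("eager: " + describe(eager));
        System.out.println("lazy:  " + describe(lazy));
    }
}
```

Running it with the EC2 public DNS name once on the server and once on the local machine should show different IPs for the eager address, which is the behaviour the theory depends on.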

I have played with various settings trying to get around this. For
example, if I set network.publish_host to _ec2:publicIp_ and restart, my
local client connects, rediscovers the node and connects again, but I see
this in the server logs, presumably because the server can't reach the
external IP:

[2012-10-08 15:04:44,338][WARN ][cluster.service ] [Abraham Cornelius] failed to reconnect to node [Abraham Cornelius][zTWY25A5RzS31vLHzBak8A][inet[/23.XX.XX.XX:9300]]
org.elasticsearch.transport.ConnectTransportException: [Abraham Cornelius][inet[/23.XX.XX.XX:9300]] connect_timeout[30s]
    at org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:640)
    at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:579)
    at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:549)

I've also messed with discovery.ec2.host_type and network.tcp.reuse_address
on the server and client sides, and network.publish_host on the client side,
to no effect.

So the questions:
- Is my theory correct or am I missing something?
- Is there a way to configure my client or server such that during the
  sniff/discovery process, internal nodes use the internal IP and external
  nodes (i.e. transport clients on webservers) use the external IP?
- What exactly is the difference between the settings "network.publish_host"
  and "discovery.ec2.host_type"?

I can work around this for now by disabling client.transport.sniff, but I'm
posting the question because I feel like I'm missing some critical aspect
of configuration that I will need down the line, and this feels like
something that ought to work. Thanks for your attention.
