To add some more information to this: I suspect the cause is that EC2 machines have both an internal address and an external address. More detail below.
I'm having problems getting unicast discovery working between two machines (one is a Linode, the other is an EC2 instance). The EC2 ES instance seems to be able to talk to the Linode, but a timeout occurs somewhere that prevents it from joining the cluster.
I tried swapping the unicast host lists around, so that both nodes try to connect to the EC2 instance. It still didn't work, but I see messages like this in the Linode node's logs:
[2013-01-16 13:57:22,189][TRACE][discovery.zen.ping.unicast] [Baroness Blood] [2] sending to [#zen_unicast_1#][inet[/54.247.0.254:9700]]
[2013-01-16 13:57:22,207][TRACE][discovery.zen.ping.unicast] [Baroness Blood] [2] received response from [#zen_unicast_1#][inet[/54.247.0.254:9700]]: [ping_response{target [[Gideon, Gregory][OEaqkM5NR5yK5pgMS9nJ_g][inet[/10.33.160.162:9700]]], master [null], cluster_name[staging-es]}, ping_response{target [[Baroness Blood][SqVsSWx3Qi27GfMgiQINfA][inet[/176.58.126.151:9700]]], master [null], cluster_name[staging-es]}, ping_response{target [[Baroness Blood][SqVsSWx3Qi27GfMgiQINfA][inet[/176.58.126.151:9700]]], master [null], cluster_name[staging-es]}, ping_response{target [[Gideon, Gregory][OEaqkM5NR5yK5pgMS9nJ_g][inet[/10.33.160.162:9700]]], master [null], cluster_name[staging-es]}, ping_response{target [[Baroness Blood][SqVsSWx3Qi27GfMgiQINfA][inet[/176.58.126.151:9700]]], master [null], cluster_name[staging-es]}, ping_response{target [[Gideon, Gregory][OEaqkM5NR5yK5pgMS9nJ_g][inet[/10.33.160.162:9700]]], master [null], cluster_name[staging-es]}, ping_response{target [[Baroness Blood][SqVsSWx3Qi27GfMgiQINfA][inet[/176.58.126.151:9700]]], master [null], cluster_name[staging-es]}, ping_response{target [[Gideon, Gregory][OEaqkM5NR5yK5pgMS9nJ_g][inet[/10.33.160.162:9700]]], master [[Gideon, Gregory][OEaqkM5NR5yK5pgMS9nJ_g][inet[/10.33.160.162:9700]]], cluster_name[staging-es]}]
[2013-01-16 13:57:22,208][TRACE][discovery.zen.ping.unicast] [Baroness Blood] [2] disconnecting from [#zen_unicast_1#][inet[/54.247.0.254:9700]]
[2013-01-16 13:57:22,208][TRACE][discovery.zen ] [Baroness Blood] full ping responses:
--> target [[Gideon, Gregory][OEaqkM5NR5yK5pgMS9nJ_g][inet[/10.33.160.162:9700]]], master [[Gideon, Gregory][OEaqkM5NR5yK5pgMS9nJ_g][inet[/10.33.160.162:9700]]]
To my uneducated eye, it appears that the EC2 ES instance is reporting its internal IP address rather than the externally accessible one. The Linode instance then goes on to log:
[2013-01-16 13:57:52,235][WARN ][discovery.zen ] [Baroness Blood] failed to connect to master [[Gideon, Gregory][OEaqkM5NR5yK5pgMS9nJ_g][inet[/10.33.160.162:9700]]], retrying...
org.elasticsearch.transport.ConnectTransportException: [Gideon, Gregory][inet[/10.33.160.162:9700]] connect_timeout[30s]
at org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:563)
at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:505)
at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:483)
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:128)
at org.elasticsearch.discovery.zen.ZenDiscovery.innterJoinCluster(ZenDiscovery.java:326)
at org.elasticsearch.discovery.zen.ZenDiscovery.access$500(ZenDiscovery.java:75)
at org.elasticsearch.discovery.zen.ZenDiscovery$1.run(ZenDiscovery.java:280)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)
Caused by: java.net.ConnectException: connection timed out
at org.elasticsearch.common.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processConnectTimeout(NioClientSocketPipelineSink.java:391)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:289)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:102)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
... 3 more
Clearly, it's trying to connect directly to that internal IP address.
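As a quick sanity check on the addresses in the logs above, here is a minimal Python sketch that pulls every inet[/ip:port] entry out of a chunk of log text and flags the ones in private ranges, which a node outside EC2 cannot reach. The function names and the shortened sample text are my own, made up for illustration; only the addresses come from the logs.

```python
import ipaddress
import re

# Matches fragments such as inet[/10.33.160.162:9700] in the ES log output.
INET_RE = re.compile(r"inet\[/(\d{1,3}(?:\.\d{1,3}){3}):(\d+)\]")


def advertised_addresses(log_text):
    """Return unique (ip, port) pairs in order of first appearance."""
    seen = []
    for ip, port in INET_RE.findall(log_text):
        pair = (ip, int(port))
        if pair not in seen:
            seen.append(pair)
    return seen


def private_addresses(log_text):
    """Return the advertised pairs whose IP lies in a private range."""
    return [
        (ip, port)
        for ip, port in advertised_addresses(log_text)
        if ipaddress.ip_address(ip).is_private
    ]


# Shortened excerpt of the log lines above (hypothetical trimming).
sample = """\
[Baroness Blood] [2] sending to [#zen_unicast_1#][inet[/54.247.0.254:9700]]
ping_response{target [[Gideon, Gregory][OEaqkM5NR5yK5pgMS9nJ_g][inet[/10.33.160.162:9700]]], master [null], cluster_name[staging-es]}
ping_response{target [[Baroness Blood][SqVsSWx3Qi27GfMgiQINfA][inet[/176.58.126.151:9700]]], master [null], cluster_name[staging-es]}
"""

print(private_addresses(sample))  # prints [('10.33.160.162', 9700)]
```

The only private address is the one the EC2 node publishes for itself, which matches the connect timeout in the stack trace.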
Is this configuration possible, i.e. joining a machine outside EC2 to a cluster with a machine inside EC2? I guess I could tunnel the connection over SSH if I had to.
Cheers,
Dan
Dan Fairs | dan.fairs@gmail.com | @danfairs | secondsync.com