Problem with node client in AWS


(Simpsora) #1

I've got an ES cluster running in AWS, using the elasticsearch-cloud-aws plugin for discovery. The 5 nodes in the cluster can talk to each other
just fine. I currently have a search application which connects to the
cluster through an ELB via a TransportClient, and everything works fine. I
am investigating using a Node client instead (to avoid the double-hop and
ELB overhead), but am having trouble getting the node client to connect.

When I start a node client against the cluster, it fails to join and logs
the following messages:

Nov 14, 2013 2:16:31 PM org.elasticsearch.node INFO: [Surge]
version[0.90.6], pid[16397], build[e2a24ef/2013-11-04T13:54:09Z]
Nov 14, 2013 2:16:31 PM org.elasticsearch.node INFO: [Surge] initializing
...
Nov 14, 2013 2:16:31 PM org.elasticsearch.plugins INFO: [Surge] loaded
[cloud-aws], sites []
Nov 14, 2013 2:16:36 PM org.elasticsearch.node INFO: [Surge] initialized
Nov 14, 2013 2:16:36 PM org.elasticsearch.node INFO: [Surge] starting ...
Nov 14, 2013 2:16:36 PM org.elasticsearch.transport INFO: [Surge]
bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address
{inet[/172.23.4.113:9300]}
Nov 14, 2013 2:17:06 PM org.elasticsearch.discovery WARNING: [Surge] waited
for 30s and no initial state was set by the discovery
Nov 14, 2013 2:17:06 PM org.elasticsearch.discovery INFO: [Surge]
es-prod-spike/irs60NgmRQKkTY0SyEMHXQ
Nov 14, 2013 2:17:06 PM org.elasticsearch.http INFO: [Surge] bound_address
{inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/172.23.4.113:9200]}
Nov 14, 2013 2:17:06 PM org.elasticsearch.node INFO: [Surge] started
Nov 14, 2013 2:18:38 PM org.elasticsearch.discovery.ec2 INFO: [Surge]
failed to send join request to master
[[es-ip-172-23-10-252][CQdGd4CASFOtW1dLPyfpZQ][inet[/172.23.10.252:9300]]{availability_zone=ap-southeast-2b}],
reason [org.elasticsearch.transport.RemoteTransportException:
[es-ip-172-23-10-252][inet[/172.23.10.252:9300]][discovery/zen/join];
org.elasticsearch.transport.ConnectTransportException:
[Surge][inet[/172.23.4.113:9300]] connect_timeout[30s];
org.elasticsearch.common.netty.channel.ConnectTimeoutException: connection
timed out: /172.23.4.113:9300]

Nov 14, 2013 2:20:39 PM org.elasticsearch.discovery.ec2 INFO: [Surge]
failed to send join request to master
[[es-ip-172-23-10-252][CQdGd4CASFOtW1dLPyfpZQ][inet[/172.23.10.252:9300]]{availability_zone=ap-southeast-2b}],
reason [org.elasticsearch.transport.RemoteTransportException:
[es-ip-172-23-10-252][inet[/172.23.10.252:9300]][discovery/zen/join];
org.elasticsearch.transport.ConnectTransportException:
[Surge][inet[/172.23.4.113:9300]] connect_timeout[30s];
org.elasticsearch.common.netty.channel.ConnectTimeoutException: connection
timed out: /172.23.4.113:9300]

Interestingly, it was able to determine the master node (es-ip-172-23-10-252),
so some communication between the nodes is working.

I have confirmed that the machine I'm trying to connect from (also in the
same region, and possibly the same AZ) can connect to both ports (9200 and
9300) on each node in the cluster:

$ nc -z 172.23.10.252 9200
Connection to 172.23.10.252 9200 port [tcp/wap-wsp] succeeded!
$ nc -z 172.23.10.252 9300
Connection to 172.23.10.252 9300 port [tcp/vrace] succeeded!
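To run the same check from inside a JVM rather than with nc, a plain java.net.Socket connect with a timeout gives roughly the same yes/no answer as `nc -z`. This is a minimal stdlib-only sketch; the class name `PortCheck` and method `canConnect` are hypothetical, and it only tests TCP reachability, not an Elasticsearch handshake. Note that the join failures above report a connect timeout to /172.23.4.113:9300, which is the client's own publish_address, so it may be worth running the check in the reverse direction too (from a cluster node towards the client).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: a Java equivalent of `nc -z host port`.
class PortCheck {

    // Returns true if a TCP connection to host:port opens within timeoutMillis.
    // Timeouts, refused connections and unknown hosts all surface as IOException.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the master's transport address from the log above.
        String host = args.length > 0 ? args[0] : "172.23.10.252";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9300;
        boolean ok = canConnect(host, port, 5000);
        System.out.println(host + ":" + port + (ok ? " reachable" : " NOT reachable"));
    }
}
```

For the reverse direction, something like `java PortCheck 172.23.4.113 9300` run on 172.23.10.252 would show whether the master can open a connection back to the client's transport port.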

...

The cluster is using the following discovery settings:

cluster.name: es-prod-spike
cluster.routing.allocation.awareness.attributes: availability_zone
cloud.aws.region: ap-southeast-2
discovery.type: ec2
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping_timeout: 15s
discovery.zen.minimum_master_nodes: 3
discovery.ec2.tag.Name: ElasticSearch-es-spike-stack
node.availability_zone: ap-southeast-2a

The node client is using the following discovery settings:

elasticSearchSettings.put( "cluster.name", "es-prod-spike" );
elasticSearchSettings.put( "cluster.routing.allocation.awareness.attributes", "availability_zone" );
elasticSearchSettings.put( "cloud.aws.region", "ap-southeast-2" );
elasticSearchSettings.put( "discovery.type", "ec2" );
elasticSearchSettings.put( "discovery.zen.ping.multicast.enabled", false );
elasticSearchSettings.put( "discovery.zen.ping_timeout", "60s" );
elasticSearchSettings.put( "discovery.zen.minimum_master_nodes", 3 );
elasticSearchSettings.put( "discovery.ec2.tag.Name", "ElasticSearch-es-spike-stack" );

When the nodes participating in the cluster first come up, they log plenty
of details about the discovery (since I have discovery set to trace level
on the nodes). I don't see any such logging in the node client, so it's
hard to debug any further. I'm not sure how to set the log levels for the
node client running in my search app -- it's a Java app which already uses
log4j, and adding what seem to be the correct settings to my existing
log4j.properties file doesn't increase the log level in the client. I
don't see any messages in any of the cluster nodes' logs either.
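The client-side log lines above (a date line with the logger name, then `INFO:`/`WARNING:`) resemble the default java.util.logging layout rather than a typical log4j pattern. Assuming, without confirmation, that the embedded node is logging through java.util.logging, log4j.properties would have no effect on it, and a sketch like the one below could raise the discovery loggers to FINEST (the most verbose java.util.logging level). The logger name `org.elasticsearch.discovery` is taken from the log output; the class name `DiscoveryTracing` is hypothetical, and it is assumed that child loggers such as `org.elasticsearch.discovery.ec2` inherit the level.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, assuming the node logs via java.util.logging.
class DiscoveryTracing {

    // Keep a strong reference: java.util.logging may garbage-collect an
    // unreferenced Logger, losing the level set on it.
    static final Logger DISCOVERY = Logger.getLogger("org.elasticsearch.discovery");

    // Raise the discovery logger (and, by inheritance, children such as
    // org.elasticsearch.discovery.ec2) to FINEST and print those records.
    static void enable() {
        DISCOVERY.setLevel(Level.FINEST);
        Handler console = new ConsoleHandler();
        // ConsoleHandler defaults to INFO and would otherwise drop finer records.
        console.setLevel(Level.FINEST);
        DISCOVERY.addHandler(console);
        // Stop records also going to the root logger's handler (avoids duplicates).
        DISCOVERY.setUseParentHandlers(false);
    }
}
```

If this assumption holds, `DiscoveryTracing.enable()` would need to be called before the node client is built and started, so that the discovery phase is captured.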

Environment:

OS: CentOS 6
ElasticSearch: 0.90.6
AWS plugin: 1.15.0
JDK: Oracle Java 1.7.0_45-b18, 64-bit

Any suggestions as to why the node client can't connect? The security
groups allow ingress & egress between the nodes in question.

Thanks!
Ross

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.

