Connecting Tribe node to EC2 cluster


(Matt Dainty) #1

Hi,

I have an ES cluster built in EC2, using the cloud-aws plugin for
discovery. The nodes use their private IP addresses for
communication and everything works OK.

I want to run one or more tribe nodes remotely, so I pointed the tribe
node at the public IP addresses of some of the nodes. However, this
doesn't work because the network.publish_host setting on each node in
the cluster defaults to its private address.

I realised that, within EC2, the public EC2 DNS names still resolve to
the private IP address of each node, so I reconfigured my cluster
to use discovery.ec2.host_type=public_dns and also set
network.publish_host=ec2:publicDns. The cluster still works as before,
with the traffic still using the private IP addresses.

However, my tribe node still complains that it can't reach the private
IP addresses of the nodes. I was expecting it to get the public
ec2-X-X-X-X.....amazonaws.com name and resolve that to the
public IP address, which should then hopefully work.

If I fetch
http://localhost:9200/_nodes/node1,node2/transport?pretty on both node1
and node2, I notice that the publish_address for the local node is
reported as "inet[ec2-X-X-X-X....amazonaws.com/10.0.0.1:9300]" but the
publish_address for the non-local node is reported only as
"inet[/10.0.0.2:9300]". Does this mean that the tribe node, when
connecting remotely, still only gets "inet[/10.0.0.x:9300]" for each
node address?

Am I misunderstanding how this is supposed to work? Is there an
alternative way to attach a remote tribe node to this cluster easily?

Thanks

Matt

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/20140407160348.GX2245%40simulant.bodgit-n-scarper.com.
For more options, visit https://groups.google.com/d/optout.


(David Pilato) #2

Hey Matt,

I'd like to better understand what is happening here.

Could you gist your elasticsearch.yml files (the ones for the standard Elasticsearch nodes and the one for the tribe node)?
Of course, replace your EC2 credentials with dummy values! :)

Thanks!

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


(Matt Dainty) #3

> Hey Matt,
>
> I'd like to understand better what is happening here.
>
> Could you gist your elasticsearch.yml files (the ones for elasticsearch standard nodes and the tribe node one)?
> Of course, replace your EC2 credentials by dummy values! :)

Sure, they're up at https://gist.github.com/bodgit/10102642

The master nodes in my cluster have Elastic IP addresses assigned so
they're on "well known" IP addresses, and that's what the tribe node
configuration uses as the values for the zen unicast hosts. All other
cluster nodes have normal public EC2 addresses.

I also forgot to mention, I'm running ES 1.1.0 and cloud-aws 2.1.0.

Matt


(Matt Dainty) #4

Did you have any more thoughts on this?

My only workaround would be to bring up a VPN so I can access all the
nodes via their private IP addresses, but this would increase the
complexity and require me to manage the VPN(s).

Matt


(David Pilato) #5

Sorry Matt,

I have not had a chance to look at it in detail yet.
Just to make sure it's not related to the jetty plugin, could you try removing it from the tribe node?

Is the tribe node sharing the same security group?


(Matt Dainty) #6

> Sorry Matt,
>
> I have not had a chance to look at it in detail yet.
> Just to make sure it's not related to the jetty plugin, could you try removing it from the tribe node?

I pared the config on the tribe node back to the bare minimum (gist
updated) and it still occurs. After increasing the logging, I see
things like:

[2014-04-11 11:59:08,379][DEBUG][transport.netty ] [es-tribe-01/sydney] disconnected from [[#zen_unicast_1#][es-tribe-01][inet[/54.206.x.x:9300]]]
[2014-04-11 11:59:08,381][DEBUG][transport.netty ] [es-tribe-01/sydney] disconnected from [[#zen_unicast_2#][es-tribe-01][inet[/54.206.x.x:9300]]]
[2014-04-11 11:59:08,382][DEBUG][discovery.zen ] [es-tribe-01/sydney] filtered ping responses: (filter_client[true], filter_data[false])
--> target [[es-master-02][4rcZQu2USJuk6sSGu-by0w][es-master-02][inet[/172.31.x.x:9300]]{aws_availability_zone=ap-southeast-2b, data=false, master=true}], master [[es-master-02][4rcZQu2USJuk6sSGu-by0w][es-master-02][inet[/172.31.x.x:9300]]{aws_availability_zone=ap-southeast-2b, data=false, master=true}]
--> target [[es-master-01][whb9bzt_QV2AyiuhIyBtWw][es-master-01][inet[/172.31.x.x:9300]]{aws_availability_zone=ap-southeast-2a, data=false, master=true}], master [[es-master-02][4rcZQu2USJuk6sSGu-by0w][es-master-02][inet[/172.31.x.x:9300]]{aws_availability_zone=ap-southeast-2b, data=false, master=true}]

I would expect the inet[/172.31.x.x:9300]'s to actually be
inet[ec2-54-206-x-x.ap-southeast-2.compute.amazonaws.com/172.31.x.x:9300]
which I see for some of the nodes when I query the cluster using the
/_nodes/transport endpoint.

Every node with HTTP enabled reports its http_address with the EC2 DNS
name included, but only the node I make the request to reports its
publish_address with the DNS name included. All other nodes report
only their IP address, e.g.:

curl -s http://elb:9200/_nodes/transport?pretty | grep http_address

  "http_address" : "inet[ec2-54-206-x-x.ap-southeast-2.compute.amazonaws.com/172.31.x.x:9200]",
  "http_address" : "inet[ec2-54-206-x-x.ap-southeast-2.compute.amazonaws.com/172.31.x.x:9200]",

curl -s http://elb:9200/_nodes/transport?pretty | grep publish_address

    "publish_address" : "inet[/172.31.x.x:9300]"
    "publish_address" : "inet[/172.31.x.x:9300]"
    "publish_address" : "inet[/172.31.x.x:9300]"
    "publish_address" : "inet[ec2-54-206-x-x.ap-southeast-2.compute.amazonaws.com/172.31.x.x:9300]"
    "publish_address" : "inet[/172.31.x.x:9300]"
    "publish_address" : "inet[/172.31.x.x:9300]"

(There are six nodes in the cluster but only two have HTTP enabled.)

I would expect all publish_addresses to contain the DNS name.
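
As a rough way to see which nodes advertise a hostname, the inet[host/ip:port] strings from the output above can be split apart. This is only a sketch in Python: the helper name `parse_inet` and the sample addresses are made up for illustration, and the string format is assumed from the output shown in this thread rather than from any documented API. It handles IPv4 addresses only.

```python
import re
from typing import NamedTuple, Optional

# Format assumed from the output in this thread: inet[<host>/<ip>:<port>],
# where <host> is empty when no DNS name is attached to the address.
_INET_RE = re.compile(r"^inet\[(?P<host>[^/\]]*)/(?P<ip>[^:\]]+):(?P<port>\d+)\]$")


class InetAddress(NamedTuple):
    host: Optional[str]  # None when the node reported only an IP address
    ip: str
    port: int


def parse_inet(text: str) -> InetAddress:
    """Split an 'inet[host/ip:port]' string into its parts (hypothetical helper)."""
    match = _INET_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised address format: {text!r}")
    host = match.group("host") or None
    return InetAddress(host, match.group("ip"), int(match.group("port")))


if __name__ == "__main__":
    # Placeholder addresses for illustration only.
    samples = [
        "inet[/172.31.0.10:9300]",
        "inet[ec2-54-206-0-1.ap-southeast-2.compute.amazonaws.com/172.31.0.11:9300]",
    ]
    for sample in samples:
        addr = parse_inet(sample)
        label = addr.host if addr.host else "(no hostname)"
        print(f"{addr.ip}:{addr.port} -> {label}")
```

Feeding it the publish_address values from each node's view of the cluster would show at a glance which entries carry a DNS name and which carry only the private IP.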

> Is the tribe node sharing the same security group?

The tribe node is not actually in EC2 at all, but the security groups
are set so it has full access on ports 9300-9399 to each node, which I
guess is confirmed by the fact it's getting as far as the list of masters
with their private IP addresses via the zen unicast hosts.

Matt


(David Pilato) #7

And all your nodes have this, right?

network:
  publish_host: ec2:publicDns

I think some nodes are only master and some others are only data nodes, right?

Any chance you could add each individually generated elasticsearch.yml file to your gist?

I don't understand why one successfully uses

"publish_address" : "inet[ec2-54-206-x-x.ap-southeast-2.compute.amazonaws.com/172.31.x.x:9300]"

while the others use:

"publish_address" : "inet[/172.31.x.x:9300]"

With exactly the same settings, that should not happen.
Could you try using ec2:publicIp and see how it goes?


(Matt Dainty) #8

> And all your nodes have this, right?
>
> network:
>   publish_host: ec2:publicDns

Yes. I'm using Puppet so every node gets the same configuration.

> I think some nodes are only master and some others are only data nodes, right?

Yes. The main difference is the data nodes have EBS volumes attached and
mounted for storing the indices, and they're bigger instances.

> Any chance you could add each individually generated elasticsearch.yml file to your gist?

The only difference so far between the configurations is the data nodes
have indices.memory.index_buffer_size set and obviously the values for
node.data and node.master are flipped. All other settings are identical.

> I don't understand why one successfully uses
>
> "publish_address" : "inet[ec2-54-206-x-x.ap-southeast-2.compute.amazonaws.com/172.31.x.x:9300]"
>
> And the other ones:
>
> "publish_address" : "inet[/172.31.x.x:9300]"

That's what I'm confused about too.

I understand "inet[ec2-54-206-x-x.amazonaws.com/172.31.x.x:9300]" is
just a textual representation of the transport endpoint, but does that
mean that another node will use the result of the DNS lookup of the
hostname in preference to the IP address that's also reported?

> With the exact same parameters, it should not happen.
> Could you try using ec2:publicIp and see how it goes?

I tried to use the public IP instead, but I couldn't get the cluster to
associate successfully. I think that's the security groups interfering,
so I'll have another attempt.

However, as I understand it, using the public IP means traffic between
cluster nodes performs worse and costs more (Amazon classes it as
intra-region traffic, as if I were transferring between availability
zones). That's when I discovered that the public DNS name still
resolves to the private IP address internally, so I assumed that using
the public DNS names everywhere would keep the internal traffic on the
private IP addresses while external traffic (from the tribe nodes)
would also still work.
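
The behaviour described above, where the same public DNS name resolves to a private address inside EC2 and to a public one outside, can be checked from any host with a small script. This is a sketch in Python, not part of Elasticsearch: `describe_resolution` is a hypothetical helper, the hostname passed on the command line is whatever public DNS name you want to test, and the resolver is injectable so the logic can be exercised without network access.

```python
import ipaddress
import socket
from typing import Callable


def describe_resolution(
    hostname: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> str:
    """Resolve hostname and say whether the answer is a private or public IP."""
    ip = ipaddress.ip_address(resolve(hostname))
    kind = "private" if ip.is_private else "public"
    return f"{hostname} -> {ip} ({kind})"


if __name__ == "__main__":
    import sys

    # Usage: python check_dns.py <public-dns-name-of-a-node> [...]
    for name in sys.argv[1:]:
        print(describe_resolution(name))
```

Running it once on a cluster node and once on the remote tribe host, with the same node name, should make the difference between the two views visible.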

I've also just tried using the cloud-aws plugin on the tribe node and
adapted the configuration like so:

tribe:
  blocks:
    metadata: true
  sydney:
    cloud:
      aws:
        access_key: abc123
        region: ap-southeast-2
        secret_key: secret
    cluster:
      name: logstash
    discovery:
      ec2:
        groups: elasticsearch
        host_type: public_dns
      type: ec2
      zen:
        minimum_master_nodes: 2
        ping:
          multicast:
            enabled: false

I can see it makes a query to find all the nodes in the cluster, with
their public DNS names and IP addresses. But when it then connects to
them, it starts trying to use the private IP addresses again and I'm in
the same situation as before.

Matt


(Matt Dainty) #9

> Could you try using ec2:publicIp and see how it goes?

OK, I can get the cluster to associate when using ec2:publicIp. However,
EC2 security groups don't work very well with it: I have to add the
public IP address of every node in the cluster to the inbound rules of
the security group they're all in, otherwise they can't connect to each
other. Before, I just had a rule that allowed any traffic from the same
security group.

I'd much rather get it working keeping the cluster traffic using the
private IP addresses.

With the cluster up I tried the tribe node again; now it just logs this
every 20 seconds:

[2014-04-11 15:38:25,497][INFO ][discovery.zen ] [es-tribe-01/sydney] failed to send join request to master [[es-master-02][Vii7h7O1RNy6rpGPRXVDLQ][es-master-02][inet[/54.206.x.x:9300]]{aws_availability_zone=ap-southeast-2b, data=false, master=true}], reason [org.elasticsearch.ElasticsearchTimeoutException: Timeout waiting for task.]

What timeout is that referring to?

I can telnet to the TCP ports on the master node, and running tcpdump on
each node I can see traffic going back and forth, so it is connected.
The master nodes are not logging anything while this is happening; I
just get:

[2014-04-11 15:59:02,219][DEBUG][org.apache.http.impl.conn.PoolingClientConnectionManager] Closing connections idle longer than 60 SECONDS

once a minute.

I've tried setting discovery.zen.ping_timeout to 10s but it doesn't help.
All the other timeouts I can find seem to default to 30s which is more
than enough.

Matt


(Matt Dainty) #10

> With the cluster up I tried the tribe node again, now it just logs this
> every 20 seconds:
>
> [2014-04-11 15:38:25,497][INFO ][discovery.zen ] [es-tribe-01/sydney] failed to send join request to master [[es-master-02][Vii7h7O1RNy6rpGPRXVDLQ][es-master-02][inet[/54.206.x.x:9300]]{aws_availability_zone=ap-southeast-2b, data=false, master=true}], reason [org.elasticsearch.ElasticsearchTimeoutException: Timeout waiting for task.]

I ran out of time to poke at this over the weekend so I left it running,
however I've resumed looking at it and now it's also logging this:

[2014-04-14 10:21:12,706][INFO ][discovery.zen ] [es-tribe-01/sydney] failed to send join request to master [[es-master-02][Vii7h7O1RNy6rpGPRXVDLQ][es-master-02][inet[/54.206.x.x:9300]]{aws_availability_zone=ap-southeast-2b, data=false, master=true}], reason [org.elasticsearch.transport.RemoteTransportException: [es-master-02][inet[/172.31.x.x:9300]][discovery/zen/join]; java.lang.OutOfMemoryError: unable to create new native thread]

So leaving the tribe node attempting to join the cluster appears to have
caused es-master-02, the node named in the RemoteTransportException, to
run out of native threads (rather than heap) somehow. Anything I can
grab from this to see what's happened?

Matt

