Elasticsearch EC2 setup across multiple regions


(Derry O' Sullivan) #1

Hi all,

I'm trying to set up an Elasticsearch cluster with 2 x EC2 small instances.
One is based in the EU and the other in the US East region.

I set up the 2 instances separately with the same cluster name and verified
they worked OK (just a request to :9200).

I've opened up ports 22(ssh), 9200(http) and the range (9300 - 9400) on my
security groups.

I can't seem to get the 2 machines to communicate properly: both find each
other and try to communicate but seem to elect no master. I've changed
the logging.yml to add more debug info (based on hunting through existing
group posts).

The relevant settings on both machines are:
EU machine:
cluster.name: elasticsearch-demo-HS
cloud.aws.access_key:
cloud.aws.secret_key:
discovery.type: ec2
discovery.zen.ping.timeout: 5m

Note, no cloud.aws.region....

US machine:
cluster.name: elasticsearch-demo-HS
cloud.aws.access_key:
cloud.aws.secret_key:
# suggested in other forums to allow the US machine to find the EU machine:
# https://groups.google.com/forum/#!searchin/elasticsearch/aws/elasticsearch/gB2ag71gFT8/n7-Cpyg5O4cJ
cloud.aws.region: eu-west
discovery.type: ec2
discovery.zen.ping.timeout: 5m

When I start up the EU machine, it starts and gets information back
from Amazon about the instances via the cloud-aws plugin:
[2012-09-27 14:25:13,391][TRACE][discovery.zen.ping.unicast] [Legacy] [1]
connecting (light) to [#cloud-i-b7eb27ca-0][inet[/hidden:9300]]

It then times out on connection (as the other node in the US has not
started)
[2012-09-27 14:25:43,451][TRACE][discovery.zen.ping.unicast] [Legacy] [1]
failed to connect to [#cloud-i-b7eb27ca-0][inet[/hidden:9300]]
org.elasticsearch.transport.ConnectTransportException:
[][inet[/10.96.67.186:9300]] connect_timeout[30s]
    at org.elasticsearch.transport.netty.NettyTransport.connectToChannelsLight(NettyTransport.java:585)

I then start the US node and get:
[2012-09-27 14:30:42,262][TRACE][discovery.zen.ping.unicast] [Seth] [1]
connecting (light) to [#cloud-i-cade1081-0][inet[/hidden:9300]]

On the EU machine, I then get:
[2012-09-27 14:47:18,913][DEBUG][transport.netty ] [Adaptoid]
connected to node [[#cloud-i-b7eb27ca-0][inet[hidden:9300]]]
[2012-09-27 14:47:18,914][TRACE][discovery.zen.ping.unicast] [Adaptoid] [1]
connected to [#cloud-i-b7eb27ca-0][inet[hidden:9300]]
[2012-09-27 14:47:18,914][TRACE][discovery.zen.ping.unicast] [Adaptoid] [1]
sending to [#cloud-i-b7eb27ca-0][inet[hidden:9300]]
[2012-09-27 14:47:19,042][TRACE][discovery.zen.ping.unicast] [Adaptoid] [1]
received response from [#cloud-i-b7eb27ca-0][inet[hidden:9300]]:
[ping_response{target
[[Adaptoid][O1XIpLrPTE-yS1kh4OEfTw][inet[/10.240.50.61:9300]]], master
[null], cluster_name[elasticsearch-demo-HS]}, ping_response{target
[[Eliminator][RodL0FfpStSsl0shyZa8Vw][inet[/10.96.67.186:9300]]], master
[null], cluster_name[elasticsearch-demo-HS]}]

When I query the _cluster_stats, I get:
{
  "error": "ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];[SERVICE_UNAVAILABLE/2/no master];]",
  "status": 503
}

An initial problem (nodes not being looked up across internal IP addresses)
was solved by changing the discovery ec2 host type from private_ip to
public_ip (for communication across regions). Now I'm just stumped by the
fact that I can't get the 2 to talk to each other.

Any help greatly appreciated :wink:

Derry

--


(Derry O' Sullivan) #2

OK, so the problem seems to be that I can find the other machine but I
can't connect to it, as it is resolving to the internal IP (e.g. 10.*).
Is there a configuration setting that will explicitly force it to connect on
external IP addresses, or do I solve that by forcing external lookup in my
hosts file?

Just wondering if anyone else has had this problem when resolving over multiple
regions?

Thanks

--


(Drew Raines) #3

Derry O' Sullivan wrote:

OK, so the problem seems to be that I can find the other machine
but I can't connect to it, as it is resolving to the internal IP
(e.g. 10.*). Is there a configuration setting that will explicitly
force it to connect on external IP addresses, or do I solve that by
forcing external lookup in my hosts file?

You mentioned this in your other message:

An initial problem (nodes not being looked up across internal IP addresses)
was solved by changing the discovery ec2 host type from private_ip to
public_ip (for communication across regions). Now I'm just stumped by
the fact that I can't get the 2 to talk to each other.

Did this setting actually not work? Does it look like this?

discovery.ec2.host_type: public_ip

Also, you've upped the zen discovery ping timeout:

discovery.zen.ping.timeout: 5m

That's not going to have any effect with discovery.type set to ec2.
Increase the ec2 discovery timeout instead:

discovery.ec2.ping_timeout: 5m

Also note that you may have latency-related issues running a cluster
across regions, but I'm really interested to hear how it works for
you.

-Drew

--


(Derry O' Sullivan) #4

Hi Drew,

Thanks for the response.

On 28 September 2012 15:30, Drew Raines aaraines@gmail.com wrote:

Derry O' Sullivan wrote:

OK, so the problem seems to be that I can find the other machine
but I can't connect to it, as it is resolving to the internal IP
(e.g. 10.*). Is there a configuration setting that will explicitly
force it to connect on external IP addresses, or do I solve that by
forcing external lookup in my hosts file?

You mentioned this in your other message:

An initial problem (nodes not being looked up across internal IP addresses)
was solved by changing the discovery ec2 host type from private_ip to
public_ip (for communication across regions). Now I'm just stumped by
the fact that I can't get the 2 to talk to each other.

Did this setting actually not work? Does it look like this?

discovery.ec2.host_type: public_ip

On further inspection, the host_type did not seem to make a difference. The
machines are able to find each other without a problem (whether via
public/private IP/DNS); the issue occurs when they try to connect on port
9300/whatever. The communication seems to be based on internal IP addresses
(e.g. 10.X.X.X), meaning that clusters set up easily within the same
subdomain (e.g. region/availability zone) but it does not work externally, as
10.X.. cannot connect to 10.Y.. unless you manage the domains yourself
using VPC/routing.
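
One way to check which address each node advertises is to pull the ping_response targets out of the TRACE output. The following is a minimal Python sketch and not something from the plugin itself: the regular expression is an assumption based only on the log lines pasted in this thread, and the function name advertised_addresses is made up for illustration.

```python
import ipaddress
import re

# Assumed shape of a ping_response target, taken from the pasted TRACE lines:
#   ping_response{target [[Name][node-id][inet[host/ip:port]]], master ...
PING_TARGET = re.compile(
    r"ping_response\{target\s*"
    r"\[\[(?P<name>[^\]]+)\]\[(?P<node_id>[^\]]+)\]"
    r"\[inet\[[^/\]]*/(?P<ip>[0-9.]+):(?P<port>\d+)\]\]\]"
)


def advertised_addresses(log_text):
    """Return unique (node name, ip, is_private) tuples in order of appearance."""
    seen = []
    for match in PING_TARGET.finditer(log_text):
        # Node names such as "Magnum I" may be wrapped across lines in mail.
        name = " ".join(match.group("name").split())
        ip = match.group("ip")
        entry = (name, ip, ipaddress.ip_address(ip).is_private)
        if entry not in seen:
            seen.append(entry)
    return seen


# Ping response copied from the first post of this thread.
SAMPLE = """[ping_response{target
[[Adaptoid][O1XIpLrPTE-yS1kh4OEfTw][inet[/10.240.50.61:9300]]], master
[null], cluster_name[elasticsearch-demo-HS]}, ping_response{target
[[Eliminator][RodL0FfpStSsl0shyZa8Vw][inet[/10.96.67.186:9300]]], master
[null], cluster_name[elasticsearch-demo-HS]}]"""

for name, ip, private in advertised_addresses(SAMPLE):
    print(name, ip, "private" if private else "public")
```

If every target comes back as private, the nodes are publishing addresses that are only routable inside their own region, whatever address was used to find them.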

My real question on this was whether someone had used the plugin to do
cross-region clustering before. If it's a setting I have messed up, it'd be
nice to figure it out.

Also, you've upped the zen discovery ping timeout:

discovery.zen.ping.timeout: 5m

That's not going to have any effect with discovery.type set to ec2.
Increase the ec2 discovery timeout instead:

discovery.ec2.ping_timeout: 5m

Thanks for that - I think I copied in that setting from when I was testing
out some other means of connection.

Also note that you may have latency-related issues running a cluster
across regions, but I'm really interested to hear how it works for
you.

Completely understand that I will have a dependency on the AWS inter-region
connections and external factors, but I would just like to test it first.
I'm going to try using standard zen unicast lookup across explicit hosts
and see if that works next.

--


(Drew Raines) #5

Derry O' Sullivan wrote:

On further inspection, the host_type did not seem to make a
difference. The machines are able to find each other without a
problem (whether via public/private IP/DNS); the issue occurs
when they try to connect on port 9300/whatever. The communication
seems to be based on internal IP addresses (e.g. 10.X.X.X), meaning
that clusters set up easily within the same subdomain
(e.g. region/availability zone) but it does not work externally, as
10.X.. cannot connect to 10.Y.. unless you manage the domains
yourself using VPC/routing.

My real question on this was whether someone had used the plugin to
do cross-region clustering before. If it's a setting I have messed
up, it'd be nice to figure it out.

The plugin defaults to private_ip so it could just be a config issue
that it's not getting flipped to public_ip. Can you post your full
config somewhere for a sanity check?

-Drew

--


(Derry O' Sullivan) #6

Hi Drew,

Thanks for the response.

The config of my first server (eu):
cluster.name: esdemohs
cloud.aws.access_key:
cloud.aws.secret_key:
cloud.aws.region: us-east-1 # not sure if i have to tell
discovery.type: ec2
discovery.ec2.host_type: public_dns
discovery.ec2.ping.timeout: 5m
discovery.ec2.tag.cluster: esdemohs

config of my 2nd server (US):
cluster.name: esdemohs
cloud.aws.access_key:
cloud.aws.secret_key:
cloud.aws.region: eu-west-1
discovery.type: ec2
discovery.ec2.ping.timeout: 5m
discovery.ec2.host_type: public_dns
discovery.ec2.tag.cluster: esdemohs

I start up both servers and get a status 200 for the EU server and a status
503 for the US server (web searching shows that this is because it is not
able to join the cluster).

Looking at the trace output on the servers, I see (EU server):
[2012-10-01 10:27:41,244][TRACE][discovery.ec2 ] [Demiurge]
building dynamic unicast discovery nodes...
[2012-10-01 10:27:41,249][TRACE][discovery.ec2 ] [Demiurge]
adding i-b7eb27ca, address ec2-107-22-26-95.compute-1.amazonaws.com,
transport_address inet[
ec2-107-22-26-95.compute-1.amazonaws.com/107.22.26.95:9300]
[2012-10-01 10:27:41,250][DEBUG][discovery.ec2 ] [Demiurge]
using dynamic discovery nodes [[#cloud-i-b7eb27ca-0][inet[
ec2-107-22-26-95.compute-1.amazonaws.com/107.22.26.95:9300]]]
[2012-10-01 10:27:41,250][TRACE][discovery.zen.ping.unicast] [Demiurge] [1]
sending to [#cloud-i-b7eb27ca-0][inet[
ec2-107-22-26-95.compute-1.amazonaws.com/107.22.26.95:9300]]
[2012-10-01 10:27:41,343][TRACE][discovery.zen.ping.unicast] [Demiurge] [1]
received response from [#cloud-i-b7eb27ca-0][inet[
ec2-107-22-26-95.compute-1.amazonaws.com/107.22.26.95:9300]]:
[ping_response{target
[[Demiurge][EIUweoZ6QWmq1cVCSaay3w][inet[/10.234.254.251:9300]]], master
[null], cluster_name[esdemohs]}, ping_response{target
[[Demiurge][EIUweoZ6QWmq1cVCSaay3w][inet[/10.234.254.251:9300]]], master
[null], cluster_name[esdemohs]}, ping_response{target
[[Demiurge][EIUweoZ6QWmq1cVCSaay3w][inet[/10.234.254.251:9300]]], master
[null], cluster_name[esdemohs]}, ping_response{target [[Magnum
I][Xn8IL89ESJyij3H6DYjKhw][inet[/10.214.221.68:9300]]], master [null],
cluster_name[esdemohs]}]
[2012-10-01 10:27:41,348][TRACE][discovery.ec2 ] [Demiurge] full
ping responses:
--> target [[Magnum
I][Xn8IL89ESJyij3H6DYjKhw][inet[/10.214.221.68:9300]]], master [null]
[2012-10-01 10:27:41,348][DEBUG][discovery.ec2 ] [Demiurge]
filtered ping responses: (filter_client[true], filter_data[false])
--> target [[Magnum
I][Xn8IL89ESJyij3H6DYjKhw][inet[/10.214.221.68:9300]]], master [null]
[2012-10-01 10:27:41,350][TRACE][discovery.zen.ping.unicast] [Demiurge] [1]
disconnecting from [#cloud-i-b7eb27ca-0][inet[
ec2-107-22-26-95.compute-1.amazonaws.com/107.22.26.95:9300]]
[2012-10-01 10:27:41,354][DEBUG][transport.netty ] [Demiurge]
disconnected from [[#cloud-i-b7eb27ca-0][inet[
ec2-107-22-26-95.compute-1.amazonaws.com/107.22.26.95:9300]]]
[2012-10-01 10:27:41,355][DEBUG][cluster.service ] [Demiurge]
processing [zen-disco-join (elected_as_master)]: execute
[2012-10-01 10:27:41,356][TRACE][cluster.service ] [Demiurge]
cluster state updated:
version [1], source [zen-disco-join (elected_as_master)]
nodes:
[Demiurge][EIUweoZ6QWmq1cVCSaay3w][inet[/10.234.254.251:9300]], local,
master
routing_table:
routing_nodes:
---- unassigned

On the US server, I get:
[2012-10-01 10:29:02,398][TRACE][discovery.ec2 ] [Magnum I]
building dynamic unicast discovery nodes...
[2012-10-01 10:29:02,400][TRACE][discovery.ec2 ] [Magnum I]
adding i-cade1081, address
ec2-176-34-212-181.eu-west-1.compute.amazonaws.com, transport_address inet[
ec2-176-34-212-181.eu-west-1.compute.amazonaws.com/176.34.212.181:9300]
[2012-10-01 10:29:02,400][DEBUG][discovery.ec2 ] [Magnum I]
using dynamic discovery nodes [[#cloud-i-cade1081-0][inet[
ec2-176-34-212-181.eu-west-1.compute.amazonaws.com/176.34.212.181:9300]]]
[2012-10-01 10:29:02,400][TRACE][discovery.zen.ping.unicast] [Magnum I]
[13] sending to [#cloud-i-cade1081-0][inet[
ec2-176-34-212-181.eu-west-1.compute.amazonaws.com/176.34.212.181:9300]]
[2012-10-01 10:29:02,498][TRACE][discovery.zen.ping.unicast] [Magnum I]
[13] received response from [#cloud-i-cade1081-0][inet[
ec2-176-34-212-181.eu-west-1.compute.amazonaws.com/176.34.212.181:9300]]:
[ping_response{target [[Magnum
I][Xn8IL89ESJyij3H6DYjKhw][inet[/10.214.221.68:9300]]], master [null],
cluster_name[esdemohs]}, ping_response{target [[Magnum
I][Xn8IL89ESJyij3H6DYjKhw][inet[/10.214.221.68:9300]]], master [null],
cluster_name[esdemohs]}, ping_response{target [[Magnum
I][Xn8IL89ESJyij3H6DYjKhw][inet[/10.214.221.68:9300]]], master [null],
cluster_name[esdemohs]}, ping_response{target
[[Demiurge][EIUweoZ6QWmq1cVCSaay3w][inet[/10.234.254.251:9300]]], master
[[Demiurge][EIUweoZ6QWmq1cVCSaay3w][inet[/10.234.254.251:9300]]],
cluster_name[esdemohs]}]
[2012-10-01 10:29:02,498][TRACE][discovery.ec2 ] [Magnum I] full
ping responses:
--> target
[[Demiurge][EIUweoZ6QWmq1cVCSaay3w][inet[/10.234.254.251:9300]]], master
[[Demiurge][EIUweoZ6QWmq1cVCSaay3w][inet[/10.234.254.251:9300]]]
[2012-10-01 10:29:02,499][DEBUG][discovery.ec2 ] [Magnum I]
filtered ping responses: (filter_client[true], filter_data[false])
--> target
[[Demiurge][EIUweoZ6QWmq1cVCSaay3w][inet[/10.234.254.251:9300]]], master
[[Demiurge][EIUweoZ6QWmq1cVCSaay3w][inet[/10.234.254.251:9300]]]
[2012-10-01 10:29:02,501][TRACE][discovery.zen.ping.unicast] [Magnum I]
[13] disconnecting from [#cloud-i-cade1081-0][inet[
ec2-176-34-212-181.eu-west-1.compute.amazonaws.com/176.34.212.181:9300]]
[2012-10-01 10:29:02,501][DEBUG][transport.netty ] [Magnum I]
disconnected from [[#cloud-i-cade1081-0][inet[
ec2-176-34-212-181.eu-west-1.compute.amazonaws.com/176.34.212.181:9300]]]

Going to _cluster/health?pretty=true on the EU server shows only the server
itself:
{
  "cluster_name": "esdemohs",
  "status": "green",
  "timed_out": false,
  "number_of_nodes": 1,
  "number_of_data_nodes": 1,
  "active_primary_shards": 0,
  "active_shards": 0,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 0
}

On the US server, you get:

{
  "error": "MasterNotDiscoveredException[waited for [30s]]",
  "status": 503
}

It seems that neither the config, cloud.aws.region nor the tags affect the
outcome; the nodes just seem to connect and then disconnect again.

Thanks,

Derry

--


(Derry O' Sullivan) #7

I've also changed the host_type from public_dns to public_ip and it made no
difference. I'm able to telnet from one server to the other on ports 9200 &
9300, so I think the issue is master election?
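
For anyone wanting to script the same telnet check, here is a minimal Python sketch. It is only an illustration: the helper name can_connect is made up, and the address in the usage comment is a placeholder that needs replacing with the other node's public IP or DNS name.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Usage (placeholder address, not one from this thread):
#   for port in (9200, 9300):
#       print(port, can_connect("203.0.113.10", port))
```

A successful connect only shows that the port is reachable on the address you dial. It says nothing about the address the node puts in its ping response, which in the logs in this thread is a 10.x one.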

Derry

--


(Drew Raines) #8

Derry O' Sullivan wrote:

The config of my first server (eu):
cluster.name: esdemohs
cloud.aws.access_key:
cloud.aws.secret_key:
cloud.aws.region: us-east-1 # not sure if i have to tell
discovery.type: ec2
discovery.ec2.host_type: public_dns
discovery.ec2.ping.timeout: 5m
discovery.ec2.tag.cluster: esdemohs

config of my 2nd server (US):
cluster.name: esdemohs
cloud.aws.access_key:
cloud.aws.secret_key:
cloud.aws.region: eu-west-1
discovery.type: ec2
discovery.ec2.ping.timeout: 5m
discovery.ec2.host_type: public_dns
discovery.ec2.tag.cluster: esdemohs

I start up both servers and get a status 200 for the EU server and a status
503 for the US server (web searching shows that this is because it is not
able to join the cluster).

[...]

It seems that the config or cloud.aws.region or tags does not affect the
outcome, it just seems like they connect and then disconnect again.

No obvious issues with your config. Although it's interesting that
the us-east node didn't elect itself master also (maybe you have
master disabled on that one).

Feels more like a security group issue. Let's try turning on TRACE
logging for the us-east node. In logging.yml uncomment the line that
looks like

#discovery: TRACE

and restart the node. That will give us some output about what it's
trying to access. Look for lines like:

using host_type...
filtering out instance...

-Drew

--


(Derry O' Sullivan) #9

Hi Drew,

Thanks again for the response. This is my entire config - everything else
is the default (commented out).

I didn't include the "filtering out instance" lines (we have a lot of other
instances) but the output clearly shows:
a) The nodes making requests to EC2 to find the other nodes
b) The nodes finding the other server in the other zone (EU finding US
and US finding EU), e.g.:
[2012-10-01 10:27:41,249][TRACE][discovery.ec2 ] [Demiurge]
adding i-b7eb27ca, address ec2-107-22-26-95.compute-1.amazonaws.com,
transport_address inet[
ec2-107-22-26-95.compute-1.amazonaws.com/107.22.26.95:9300]
[2012-10-01 10:27:41,250][DEBUG][discovery.ec2 ] [Demiurge]
using dynamic discovery nodes [[#cloud-i-b7eb27ca-0][inet[
ec2-107-22-26-95.compute-1.amazonaws.com/107.22.26.95:9300]]]

[2012-10-01 10:29:02,400][TRACE][discovery.ec2 ] [Magnum I]
adding i-cade1081, address
ec2-176-34-212-181.eu-west-1.compute.amazonaws.com, transport_address inet[
ec2-176-34-212-181.eu-west-1.compute.amazonaws.com/176.34.212.181:9300]
[2012-10-01 10:29:02,400][DEBUG][discovery.ec2 ] [Magnum I]
using dynamic discovery nodes [[#cloud-i-cade1081-0][inet[
ec2-176-34-212-181.eu-west-1.compute.amazonaws.com/176.34.212.181:9300]]]

c) Both nodes specifically connecting to their tag/cluster-matching node in
the other zone, but instantly disconnecting:
[2012-10-01 10:29:02,498][TRACE][discovery.zen.ping.unicast] [Magnum I]
[13] received response from [#cloud-i-cade1081-0][inet[
ec2-176-34-212-181.eu-west-1.compute.amazonaws.com/176.34.212.181:9300]]:
[ping_response{target [[Magnum
I][Xn8IL89ESJyij3H6DYjKhw][inet[/10.214.221.68:9300]]], master [null],
cluster_name[esdemohs]}, ping_response{target [[Magnum
I][Xn8IL89ESJyij3H6DYjKhw][inet[/10.214.221.68:9300]]], master [null],
cluster_name[esdemohs]}, ping_response{target [[Magnum
I][Xn8IL89ESJyij3H6DYjKhw][inet[/10.214.221.68:9300]]], master [null],
cluster_name[esdemohs]}, ping_response{target
[[Demiurge][EIUweoZ6QWmq1cVCSaay3w][inet[/10.234.254.251:9300]]], master
[[Demiurge][EIUweoZ6QWmq1cVCSaay3w][inet[/10.234.254.251:9300]]],
cluster_name[esdemohs]}]
[2012-10-01 10:29:02,498][TRACE][discovery.ec2 ] [Magnum I] full
ping responses:
--> target
[[Demiurge][EIUweoZ6QWmq1cVCSaay3w][inet[/10.234.254.251:9300]]], master
[[Demiurge][EIUweoZ6QWmq1cVCSaay3w][inet[/10.234.254.251:9300]]]
[2012-10-01 10:29:02,499][DEBUG][discovery.ec2 ] [Magnum I]
filtered ping responses: (filter_client[true], filter_data[false])
--> target
[[Demiurge][EIUweoZ6QWmq1cVCSaay3w][inet[/10.234.254.251:9300]]], master
[[Demiurge][EIUweoZ6QWmq1cVCSaay3w][inet[/10.234.254.251:9300]]]
[2012-10-01 10:29:02,501][TRACE][discovery.zen.ping.unicast] [Magnum I]
[13] disconnecting from [#cloud-i-cade1081-0][inet[
ec2-176-34-212-181.eu-west-1.compute.amazonaws.com/176.34.212.181:9300]]
[2012-10-01 10:29:02,501][DEBUG][transport.netty ] [Magnum I]
disconnected from [[#cloud-i-cade1081-0][inet[
ec2-176-34-212-181.eu-west-1.compute.amazonaws.com/176.34.212.181:9300]]]

It is saying filter_client[true] - would that mean that it thinks the other
instance is a client?

I don't have access to the machine now but I'll try to get the full logs up on
pastebin or something tomorrow.

Thanks,

D

--


(Derry O' Sullivan) #10

Hi Drew,

I tested again today. I stopped both nodes and started the first machine
(EU) which just had a config of:
cluster.name: esdemohs

Nothing else, just the cluster name. My logic was that I wanted to start
up this node, put some data in it, verify it was all OK and then get
another machine (US node) to join.

I went to the root URL and got a 200; /_cluster/nodes also shows that it
was OK (one machine in its own cluster):
{
  "ok": true,
  "cluster_name": "esdemohs",
  "nodes": {
    "NEv9k_O4Q0SoUUKWXn0CXg": {
      "name": "Helio",
      "transport_address": "inet[/10.239.74.227:9300]",
      "hostname": "ip-10-239-74-227",
      "http_address": "inet[/10.239.74.227:9200]"
    }
  }
}

I then put in data, verified that it was all searchable and all was OK
(i.e. just testing that the single instance was OK).

I then went to the US machine and configured it like you said:
cluster.name: esdemohs
cloud.aws.access_key:
cloud.aws.secret_key:
cloud.aws.region: eu-west
discovery.type: ec2
discovery.ec2.host_type: public_ip
discovery.ec2.tag.cluster: esdemohs
discovery.ec2.ping_timeout: 5m
discovery.zen.ping_timeout: 5m
# also tried putting this in an array and using the US machine's security group as well
discovery.ec2.groups: sg-08895c7f

In the output, I get the cryptic line:
[2012-10-02 11:18:48,306][TRACE][discovery.ec2 ] [Hildegarde]
filtering out instance i-cade1081 based on groups [{GroupName: HS-ES-AWS,
GroupId: sg-08895c7f, }], not part of [sg-08895c7f]

It seems to be telling me that I can't join the security group that I'm
explicitly telling it to be a part of (I expected the other way around!) :wink:

If I remove the security groups, it finds the correct node (EU machine),
receives a response and then disconnects:
[2012-10-02 11:11:06,554][TRACE][discovery.ec2 ] [Gauntlet]
building dynamic unicast discovery nodes...
[2012-10-02 11:11:06,555][TRACE][discovery.ec2 ] [Gauntlet]
filtering out instance i-9f8fcfd7 based tags {cluster=esdemohs}, not part
of [{Key: Name, Value: HS-Dublin, }]
[2012-10-02 11:11:06,555][TRACE][discovery.ec2 ] [Gauntlet]
filtering out instance i-fbdf8bb3 based tags {cluster=esdemohs}, not part
of [{Key: Name, Value: Dublin-2Ports, }]
[2012-10-02 11:11:06,671][TRACE][discovery.ec2 ] [Gauntlet]
adding i-cade1081, address ec2-46-137-44-113.eu-west-1.compute.amazonaws.com,
transport_address inet[
ec2-46-137-44-113.eu-west-1.compute.amazonaws.com/46.137.44.113:9300]
[2012-10-02 11:11:06,674][DEBUG][discovery.ec2 ] [Gauntlet]
using dynamic discovery nodes [[#cloud-i-cade1081-0][inet[
ec2-46-137-44-113.eu-west-1.compute.amazonaws.com/46.137.44.113:9300]]]
[2012-10-02 11:11:06,678][TRACE][discovery.zen.ping.unicast] [Gauntlet] [1]
connecting (light) to [#cloud-i-cade1081-0][inet[
ec2-46-137-44-113.eu-west-1.compute.amazonaws.com/46.137.44.113:9300]]
[2012-10-02 11:11:06,868][DEBUG][transport.netty ] [Gauntlet]
connected to node [[#cloud-i-cade1081-0][inet[
ec2-46-137-44-113.eu-west-1.compute.amazonaws.com/46.137.44.113:9300]]]
[2012-10-02 11:11:06,869][TRACE][discovery.zen.ping.unicast] [Gauntlet] [1]
connected to [#cloud-i-cade1081-0][inet[
ec2-46-137-44-113.eu-west-1.compute.amazonaws.com/46.137.44.113:9300]]
[2012-10-02 11:11:06,869][TRACE][discovery.zen.ping.unicast] [Gauntlet] [1]
sending to [#cloud-i-cade1081-0][inet[
ec2-46-137-44-113.eu-west-1.compute.amazonaws.com/46.137.44.113:9300]]
[2012-10-02 11:11:06,987][TRACE][discovery.zen.ping.unicast] [Gauntlet] [1]
received response from [#cloud-i-cade1081-0][inet[
ec2-46-137-44-113.eu-west-1.compute.amazonaws.com/46.137.44.113:9300]]:
[ping_response{target
[[Gauntlet][iuj9R7MOTrK_Rsq_Vc7V7w][inet[/10.214.27.146:9300]]], master
[null], cluster_name[esdemohs]}, ping_response{target
[[Helio][NEv9k_O4Q0SoUUKWXn0CXg][inet[/10.239.74.227:9300]]], master
[[Helio][NEv9k_O4Q0SoUUKWXn0CXg][inet[/10.239.74.227:9300]]],
cluster_name[esdemohs]}]
[2012-10-02 11:11:34,559][WARN ][discovery ] [Gauntlet]
waited for 30s and no initial state was set by the discovery
[2012-10-02 11:11:34,560][INFO ][discovery ] [Gauntlet]
esdemohs/iuj9R7MOTrK_Rsq_Vc7V7w
[2012-10-02 11:11:34,560][DEBUG][gateway ] [Gauntlet]
can't wait on start for (possibly) reading state from gateway, will do it
asynchronously
[2012-10-02 11:11:34,569][INFO ][http ] [Gauntlet]
bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/
10.214.27.146:9200]}
[2012-10-02 11:11:34,570][INFO ][node ] [Gauntlet]
{0.19.9}[6286]: started

I'm really unsure how to take this any further. It's almost as if the nodes
find each other and then decide not to connect.

When this line appears on the US machine (I ran it again, hence the
different node name):
[2012-10-02 11:22:28,947][TRACE][discovery.zen.ping.unicast] [Firebird] [1]
received response from [#cloud-i-cade1081-0][inet[
ec2-46-137-44-113.eu-west-1.compute.amazonaws.com/46.137.44.113:9300]]:
[ping_response{target
[[Gauntlet][iuj9R7MOTrK_Rsq_Vc7V7w][inet[/10.214.27.146:9300]]], master
[null], cluster_name[esdemohs]}, ping_response{target
[[Gauntlet][iuj9R7MOTrK_Rsq_Vc7V7w][inet[/10.214.27.146:9300]]], master
[null], cluster_name[esdemohs]}, ping_response{target
[[Gauntlet][iuj9R7MOTrK_Rsq_Vc7V7w][inet[/10.214.27.146:9300]]], master
[null], cluster_name[esdemohs]}, ping_response{target
[[Firebird][7TNInTRSSo21NnsfBdEzFg][inet[/10.214.27.146:9300]]], master
[null], cluster_name[esdemohs]}, ping_response{target
[[Helio][NEv9k_O4Q0SoUUKWXn0CXg][inet[/10.239.74.227:9300]]], master
[[Helio][NEv9k_O4Q0SoUUKWXn0CXg][inet[/10.239.74.227:9300]]],
cluster_name[esdemohs]}]

The EU machine shows:
[2012-10-02 11:22:29,126][TRACE][transport.netty ] [Helio] channel
opened: [id: 0x502a3135, /50.16.164.109:41404 => /10.239.74.227:9300]

I don't think outbound ports are being limited in AWS (only inbound). The
US machine does not elect its own master, as it seems to be looking up the
EU-WEST region and not finding anything. Any request to the server returns
a status 503 (master not found).

My rootLogger is already on TRACE; I didn't notice any relevant additional
output.

Derry

On 1 October 2012 19:29, Derry O' Sullivan derryos@gmail.com wrote:

Hi Drew,

Thanks again for the response. This is my entire config - everything else
is the default (commented out).

I didn't include the "filtering out instance" lines (we have a lot of other
instances), but the output clearly shows:
a) The nodes making requests to EC2 to find the other nodes
b) The nodes finding the other server in the other zone (EU finding US
and US finding EU), e.g.:
[2012-10-01 10:27:41,249][TRACE][discovery.ec2 ] [Demiurge]
adding i-b7eb27ca, address ec2-107-22-26-95.compute-1.amazonaws.com,
transport_address inet[
ec2-107-22-26-95.compute-1.amazonaws.com/107.22.26.95:9300]
[2012-10-01 10:27:41,250][DEBUG][discovery.ec2 ] [Demiurge]
using dynamic discovery nodes [[#cloud-i-b7eb27ca-0][inet[
ec2-107-22-26-95.compute-1.amazonaws.com/107.22.26.95:9300]]]

[2012-10-01 10:29:02,400][TRACE][discovery.ec2 ] [Magnum I]
adding i-cade1081, address
ec2-176-34-212-181.eu-west-1.compute.amazonaws.com, transport_address
inet[
ec2-176-34-212-181.eu-west-1.compute.amazonaws.com/176.34.212.181:9300]
[2012-10-01 10:29:02,400][DEBUG][discovery.ec2 ] [Magnum I]
using dynamic discovery nodes [[#cloud-i-cade1081-0][inet[
ec2-176-34-212-181.eu-west-1.compute.amazonaws.com/176.34.212.181:9300]]]

c) Both nodes connecting specifically to their tag/cluster-matching node
in the other zone, but instantly disconnecting:
[2012-10-01 10:29:02,498][TRACE][discovery.zen.ping.unicast] [Magnum I]
[13] received response from [#cloud-i-cade1081-0][inet[
ec2-176-34-212-181.eu-west-1.compute.amazonaws.com/176.34.212.181:9300]]:
[ping_response{target [[Magnum
I][Xn8IL89ESJyij3H6DYjKhw][inet[/10.214.221.68:9300]]], master [null],
cluster_name[esdemohs]}, ping_response{target [[Magnum
I][Xn8IL89ESJyij3H6DYjKhw][inet[/10.214.221.68:9300]]], master [null],
cluster_name[esdemohs]}, ping_response{target [[Magnum
I][Xn8IL89ESJyij3H6DYjKhw][inet[/10.214.221.68:9300]]], master [null],
cluster_name[esdemohs]}, ping_response{target
[[Demiurge][EIUweoZ6QWmq1cVCSaay3w][inet[/10.234.254.251:9300]]], master
[[Demiurge][EIUweoZ6QWmq1cVCSaay3w][inet[/10.234.254.251:9300]]],
cluster_name[esdemohs]}]
[2012-10-01 10:29:02,498][TRACE][discovery.ec2 ] [Magnum I]
full ping responses:
--> target
[[Demiurge][EIUweoZ6QWmq1cVCSaay3w][inet[/10.234.254.251:9300]]], master
[[Demiurge][EIUweoZ6QWmq1cVCSaay3w][inet[/10.234.254.251:9300]]]
[2012-10-01 10:29:02,499][DEBUG][discovery.ec2 ] [Magnum I]
filtered ping responses: (filter_client[true], filter_data[false])
--> target
[[Demiurge][EIUweoZ6QWmq1cVCSaay3w][inet[/10.234.254.251:9300]]], master
[[Demiurge][EIUweoZ6QWmq1cVCSaay3w][inet[/10.234.254.251:9300]]]
[2012-10-01 10:29:02,501][TRACE][discovery.zen.ping.unicast] [Magnum I]
[13] disconnecting from [#cloud-i-cade1081-0][inet[
ec2-176-34-212-181.eu-west-1.compute.amazonaws.com/176.34.212.181:9300]]
[2012-10-01 10:29:02,501][DEBUG][transport.netty ] [Magnum I]
disconnected from [[#cloud-i-cade1081-0][inet[
ec2-176-34-212-181.eu-west-1.compute.amazonaws.com/176.34.212.181:9300]]]

It says filter_client[true]. Would that mean that it thinks the other
instance is a client?

I don't have access to the machine now, but I'll try to get the full logs
up on pastebin or something tomorrow.

Thanks,

D

On 1 October 2012 18:54, Drew Raines aaraines@gmail.com wrote:

Derry O' Sullivan wrote:

The config of my first server (eu):
cluster.name: esdemohs
cloud.aws.access_key:
cloud.aws.secret_key:
cloud.aws.region: us-east-1 # not sure if i have to tell
discovery.type: ec2
discovery.ec2.host_type: public_dns
discovery.ec2.ping.timeout: 5m
discovery.ec2.tag.cluster: esdemohs

config of my 2nd server (US):
cluster.name: esdemohs
cloud.aws.access_key:
cloud.aws.secret_key:
cloud.aws.region: eu-west-1
discovery.type: ec2
discovery.ec2.ping.timeout: 5m
discovery.ec2.host_type: public_dns
discovery.ec2.tag.cluster: esdemohs

I start up both servers; I get a status 200 for the EU server and a
status 503 for the US server (web searching shows that this is because
it is not able to join the cluster).

[...]

It seems that the config or cloud.aws.region or tags does not affect the
outcome, it just seems like they connect and then disconnect again.

No obvious issues with your config. Although it's interesting that
the us-east node didn't elect itself master also (maybe you have
master disabled on that one).

Feels more like a security group issue. Let's try turning on TRACE
logging for the us-east node. In logging.yml uncomment the line that
looks like

#discovery: TRACE

and restart the node. That will give us some output about what it's
trying to access. Look for lines like:

using host_type...
filtering out instance...

-Drew


(Derry O' Sullivan) #11

Both machines are running Ubuntu 12.04.1 LTS, ES 0.19.9 and the latest
cloud plugin (1.9.0). I've done no specific configurations on indexes etc.



(Derry O' Sullivan) #12

For anyone who is interested, the problem was to do with my
network.publish_host setting. It was not being set correctly, so the nodes
could not contact each other. Once that was set correctly, everything
worked perfectly.

Special thanks to Drew for the on/offline help!

Derry



(Drew Raines) #13

Derry O' Sullivan wrote:

For anyone who is interested, the problem was to do with my
network.publish_host setting. It was not being set correctly so
both nodes could not contact each other. Once that was set
correctly, all worked perfectly.

Here's a minimal 2-node zen config similar to what worked for him:

cluster.name: foo
node.name: alice
network.publish_host: 23.20.100.100
discovery.zen.ping_timeout: 60s
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["54.247.55.55"]

cluster.name: foo
node.name: bob
network.publish_host: 54.247.55.55
discovery.zen.ping_timeout: 60s
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["23.20.100.100"]
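
The two configs mirror each other: each node publishes its own public IP and lists the other node's public IP as its unicast host. As a rough illustration only, here is a small Python sketch that renders that pattern from a name-to-IP mapping. The function name render_config and the idea of generating the settings this way are assumptions made for this illustration; the setting names and values are the ones from the example above.

```python
import json


def render_config(cluster, nodes, name, ping_timeout="60s"):
    """Render the settings for node `name`.

    `nodes` maps node name -> public IP. Every other node's public IP
    goes into the unicast host list; this node's own IP is published.
    """
    peers = [ip for other, ip in nodes.items() if other != name]
    lines = [
        f"cluster.name: {cluster}",
        f"node.name: {name}",
        f"network.publish_host: {nodes[name]}",
        f"discovery.zen.ping_timeout: {ping_timeout}",
        "discovery.zen.ping.multicast.enabled: false",
        # json.dumps gives the ["a", "b"] list syntax used in the example
        f"discovery.zen.ping.unicast.hosts: {json.dumps(peers)}",
    ]
    return "\n".join(lines)


nodes = {"alice": "23.20.100.100", "bob": "54.247.55.55"}
print(render_config("foo", nodes, "alice"))
```

Calling it with "bob" instead gives the second config, with the unicast list pointing back at alice.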

If alice is in us-east and bob in eu-west, they won't be able to
communicate by 10/8 addresses[1]. Since an individual ec2 node knows
nothing about its public IP, its local interface is identified by a
10/8 address.

When alice looks through the unicast list to try and join bob, she
will connect to bob's internal interface (due to his auto-resolved
network.bind_host[2]) through the AWS perimeter by his public IP.
During the transport handshake, bob needs to tell her how to get
back to him to communicate further, and that's what the
network.publish_host is for. alice also tells bob about her
publish_host, and they're off to the races.[3]
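
The missing publish_host is visible in the TRACE output earlier in the thread: each ping_response target carries an inet[/10.x.x.x:9300] address, which is the address that node is advertising to its peers. A minimal Python sketch for spotting that in a log, assuming the log format quoted in this thread; the helper name private_publish_addresses is hypothetical.

```python
import ipaddress
import re

# Matches the bare address in "inet[/10.214.27.146:9300]" as printed in the
# ping_response lines in this thread (format assumed from those logs).
INET_RE = re.compile(r"inet\[/(\d{1,3}(?:\.\d{1,3}){3}):(\d+)\]")


def private_publish_addresses(log_text):
    """Return the sorted, de-duplicated private IPs found as inet[/ip:port]."""
    found = set()
    for ip, _port in INET_RE.findall(log_text):
        if ipaddress.ip_address(ip).is_private:
            found.add(ip)
    return sorted(found)


sample = (
    "ping_response{target [[Gauntlet][iuj9R7MOTrK_Rsq_Vc7V7w]"
    "[inet[/10.214.27.146:9300]]], master [null], cluster_name[esdemohs]}, "
    "ping_response{target [[Helio][NEv9k_O4Q0SoUUKWXn0CXg]"
    "[inet[/10.239.74.227:9300]]], master "
    "[[Helio][NEv9k_O4Q0SoUUKWXn0CXg][inet[/10.239.74.227:9300]]], "
    "cluster_name[esdemohs]}"
)
print(private_publish_addresses(sample))
```

If a node in another region shows up in this list, peers outside its region have no route to the address it is handing out, which matches the connect-then-disconnect behaviour reported above.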

Derry originally was trying to get this working with ec2 discovery
and reported that correctly supplying publish_host and his security
group caused it to work as well. So if you want a simpler
alternative to zen for this particular case, you should try that.

-Drew

Footnotes:

[1] They have to talk over public nets.

[2] In order to open a socket, ES has to bind to an interface. ES
can use the network.bind_host setting if you don't want it to
resolve it for itself. And if you supply simply network.host
without the other two, it will use that IP to bind to an
interface and talk to peers.

[3] Note that this technique should also work across generic NATs if
you supplied appropriate host:port combinations in the unicast
list and then told your router to redirect those ports to
appropriate nodes.
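
Footnote [2] can be summarised as a small decision rule. The Python sketch below is only a model of that description, not Elasticsearch code: the function name effective_hosts and the placeholder string for the auto-resolved interface are made up for illustration.

```python
AUTO = "<auto-resolved local interface>"  # placeholder, not a real ES value


def effective_hosts(settings):
    """Return (bind_host, publish_host) as described in footnote [2].

    network.host, if given alone, is used for both. An explicit
    network.bind_host or network.publish_host overrides it for that role.
    With nothing set, both fall back to the auto-resolved interface.
    """
    host = settings.get("network.host", AUTO)
    bind = settings.get("network.bind_host", host)
    publish = settings.get("network.publish_host", host)
    return bind, publish


print(effective_hosts({}))
print(effective_hosts({"network.publish_host": "54.247.55.55"}))
print(effective_hosts({"network.host": "10.0.0.5"}))
```

With only network.publish_host set, the node still binds to whatever local interface it resolves on its own but advertises the public IP, which is the combination used in the configs above.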

--


(Derry O' Sullivan) #14

To add to this, the EC2 plugin also works OK. It seems to work fine
without security groups too.

On the timeouts, I suspect 60s is slightly high given that cross-region
AWS comms are actually very fast, but to allow for 'uncontrollable'
latency I've left it at a minute.

I've done some Wikipedia river testing on the setup and it works perfectly
with thousands of docs.



(system) #15