EC2 Configuration


(James Cook) #1

Can someone share a working EC2 configuration?

I haven't been able to get discovery to work, although the S3 gateway seems
to be working.

Thanks,
jim


(Paul Loy) #2

this is what we used to have (that worked):

cloud:
  aws:
    access_key: XXX
    secret_key: XXX

network:
  host: <private_ipv4>

discovery:
  ec2:
    tag.deployment: dev

Then you need a tag of deployment=dev on each EC2 instance you provision
that you want to be part of this cluster.
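To check which instances actually carry the tag, something like the following sketch could be run from any machine with AWS credentials. It assumes the boto3 library is installed and that the nodes live in us-east-1; both are assumptions, not details from this thread. The helper names are hypothetical, and this is not the code the EC2 discovery plugin itself runs. It only asks the EC2 API the same kind of question (instances filtered by tag).

```python
def build_tag_filters(tags):
    """Turn {"deployment": "dev"} into EC2 DescribeInstances tag filters."""
    return [
        {"Name": f"tag:{key}", "Values": [value]}
        for key, value in sorted(tags.items())
    ]


def list_tagged_private_ips(tags, region_name="us-east-1"):
    """Return private IPs of running instances that carry all the given tags.

    Needs boto3 and AWS credentials; region_name is an assumed default.
    """
    import boto3  # imported lazily so build_tag_filters works without boto3

    ec2 = boto3.client("ec2", region_name=region_name)
    filters = build_tag_filters(tags) + [
        {"Name": "instance-state-name", "Values": ["running"]}
    ]
    ips = []
    paginator = ec2.get_paginator("describe_instances")
    for page in paginator.paginate(Filters=filters):
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                if "PrivateIpAddress" in instance:
                    ips.append(instance["PrivateIpAddress"])
    return ips


if __name__ == "__main__":
    # Same tag as in the config above.
    print(list_tagged_private_ips({"deployment": "dev"}))
```

If the list that comes back is missing a node, the tag is probably not set on that instance.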

Although I now just use Zen as I have to gather a list of seed hosts for
Cassandra, GridGain and HornetQ anyway.

--

Paul Loy
paul@keteracel.com
http://uk.linkedin.com/in/paulloy


(James Cook-2) #3

Thanks Paul,

Did you need to use the tags for discovery, or were you just using them for
the function they provide?

It's interesting how many clustering technologies are built to rely on
multicast or IP lists for discovery. We are using Hazelcast as a memcache
layer, and we let Elasticsearch perform its discovery process, then use the
cluster state to identify the other nodes in the cluster. Next I use this
list of IPs to bootstrap Hazelcast. It would be nice if these other products
picked up on ES's EC2 discovery process.

Although, it certainly seems like something in my environment or
configuration is causing the current EC2 discovery to fail.

Here is my config:

Starting the Elastic Search server node with these settings:
cloud.aws.access_key : AKIAJD5OZUDKRQ3DVURA
cloud.aws.secret_key :
cluster.name : elasticsearch-prod
discovery.type : ec2
gateway.s3.bucket : ppkc-es-gateway-prod
gateway.type : s3
http.enabled : true
http.port : 9311
index.mapping._id.indexed : true
index.store.fs.memory.direct : true
index.store.type : niofs
name : Supernalia
network.host : eth0:ipv4
node.data : true
path.conf : /usr/share/tomcat6/webapps/ROOT//WEB-INF/config
path.data : /var/local/es/data
path.data_with_cluster : /var/local/es/data/elasticsearch-prod
path.home : /var/local/es
path.logs : /var/local/es/logs
path.work : /var/local/es/work
path.work_with_cluster : /var/local/es/work/elasticsearch-prod
transport.tcp.port : 9310


(Shay Banon) #4

You can set discovery.ec2 to TRACE level logging and maybe that can shed some light as to why they can't find each other.


(James Cook) #5

Here is the full log dump:
https://gist.github.com/984657

You can see that the EC2 discovery finds the other instance, but not much
indication as to why it doesn't join the cluster, at least that I can see.

[Hela] {elasticsearch/0.16.0}[1332]: starting ...
Using the autodetected NIO constraint level: 0
[Hela] Bound to address [/10.86.241.201:9310]
[Hela] bound_address {inet[/10.86.241.201:9310]}, publish_address {inet[/10.86.241.201:9310]}
Checking for connections, idleTimeout: 1305995206846
HttpConnectionManager.getConnection: config = HostConfiguration[host=http://ec2.amazonaws.com], timeout = 0
Allocating new connection, hostConfig=HostConfiguration[host=http://ec2.amazonaws.com]
Open connection to ec2.amazonaws.com:80
Adding Host request header
Request body sent
Resorting to protocol version default close connection policy
Should NOT close connection, using HTTP/1.1
Releasing connection back to connection manager.
Freeing connection, hostConfig=HostConfiguration[host=http://ec2.amazonaws.com]
Adding connection at: 1305995236962
Notifying no-one, there are no waiting threads
[Hela] using dynamic discovery nodes [[#cloud-i-7b5da315-0][inet[/10.86.241.201:9310]], [#cloud-i-795da317-0][inet[/10.86.253.155:9310]]]
[Hela] Connected to node [[Hela][Y9RtLp-iQjuQMgriI3Af_w][inet[/10.86.241.201:9310]]]
Checking for connections, idleTimeout: 1305995209972
HttpConnectionManager.getConnection: config = HostConfiguration[host=http://ec2.amazonaws.com], timeout = 0
Getting free connection, hostConfig=HostConfiguration[host=http://ec2.amazonaws.com]
Adding Host request header
Request body sent
Resorting to protocol version default close connection policy
Should NOT close connection, using HTTP/1.1
Releasing connection back to connection manager.
Freeing connection, hostConfig=HostConfiguration[host=http://ec2.amazonaws.com]
Adding connection at: 1305995240050
Notifying no-one, there are no waiting threads
[Hela] using dynamic discovery nodes [[#cloud-i-7b5da315-0][inet[/10.86.241.201:9310]], [#cloud-i-795da317-0][inet[/10.86.253.155:9310]]]
[Hela] ping responses: {none}
[Hela] processing [zen-disco-join (elected_as_master)]: execute
[Hela] cluster state updated, version [1], source [zen-disco-join (elected_as_master)]
[Hela] new_master [Hela][Y9RtLp-iQjuQMgriI3Af_w][inet[/10.86.241.201:9310]], reason: zen-disco-join (elected_as_master)
[Hela] processing [reroute_rivers_node_changed]: execute
[Hela] processing [reroute_rivers_node_changed]: no change in cluster_state

The two instances are spun up using Amazon Elastic Beanstalk, so the ES
servers embedded in each VM have the exact same configuration, so cluster
name is identical.


(Shay Banon) #6

I don't see a log where it connected to the other node. Are you sure there isn't a firewall or something between them?

Can you try increasing the ping timeout? Set discovery.zen.ping_timeout to something like 5m (5 minutes, just to see what happens).


(James Cook) #7

Does the ping timeout still have meaning when performing EC2 discovery? I
set it to 5m and it still did not find the other node.

01:19:44,065 DEBUG thread-2 search.discovery.ec2: 70 -
[Victor von Doom] using dynamic discovery nodes [[#cloud-i-c79142a9-0][inet[/10.193.135.143:9310]], [#cloud-i-c59142ab-0][inet[/10.207.53.60:9310]]]
01:19:59,067 DEBUG thread-1 search.discovery.ec2: 70 -
[Victor von Doom] ping responses: {none}
01:19:59,083 DEBUG thread-1 arch.cluster.service: 70 -
[Victor von Doom] processing [zen-disco-join (elected_as_master)]: execute
01:19:59,083 DEBUG thread-1 arch.cluster.service: 70 -
[Victor von Doom] cluster state updated, version [1], source [zen-disco-join (elected_as_master)]
01:19:59,085 INFO thread-1 arch.cluster.service: 78 -
[Victor von Doom] new_master [Victor von Doom][X-nwlRCRSKufTPT7wMQdGw][inet[/10.193.135.143:9310]], reason: zen-disco-join (elected_as_master)
01:19:59,088 DEBUG thread-1 search.river.cluster: 70 -
[Victor von Doom] processing [reroute_rivers_node_changed]: execute
01:19:59,088 INFO main sticsearch.discovery: 78 -
[Victor von Doom] elasticsearch-prod/X-nwlRCRSKufTPT7wMQdGw
01:19:59,088 DEBUG thread-1 search.river.cluster: 70 -
[Victor von Doom] processing [reroute_rivers_node_changed]: no change in cluster_state
01:19:59,090 DEBUG thread-1 ticsearch.gateway.s3: 70 -
[Victor von Doom] reading state from gateway org.elasticsearch.gateway.shared.SharedStorageGateway$1@21a722ef ...
01:19:59,090 DEBUG thread-1 arch.cluster.service: 70 -
[Victor von Doom] processing [zen-disco-join (elected_as_master)]: done applying updated cluster_state

You can see the EC2 discovery did its job and found the IP address of the
other node, and there was a 15 sec delay before ES reported no ping
responses.
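The delay can be read straight off the two log timestamps. The sketch below does the arithmetic; it assumes both lines were logged on the same day, and the helper name is hypothetical. The result is about 15 seconds rather than the 5 minutes that was configured, which might suggest the discovery.zen.ping_timeout value was not picked up, although the log alone does not prove that.

```python
from datetime import datetime

# Timestamp layout used in the log lines above: HH:MM:SS,mmm
LOG_TIME_FORMAT = "%H:%M:%S,%f"


def seconds_between(first, second):
    """Seconds elapsed between two log timestamps from the same day."""
    start = datetime.strptime(first, LOG_TIME_FORMAT)
    end = datetime.strptime(second, LOG_TIME_FORMAT)
    return (end - start).total_seconds()


if __name__ == "__main__":
    # "using dynamic discovery nodes" -> "ping responses: {none}"
    delay = seconds_between("01:19:44,065", "01:19:59,067")
    print(f"{delay:.3f} s")  # prints "15.002 s"
```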

I have the security group for my EC2 nodes opening all ports between 9310
and 9360. (ES nodes are using 9310 - 9312, and Hazelcast uses port 9350.)

If I ssh into one of my nodes, is there anything I can do (telnet?) to make
sure my other node is reachable? It is getting a bit frustrating at this
point that I can't get these nodes to see each other.
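One way to test reachability from an ssh session is a plain TCP connect to the other node's transport port, which is roughly what "telnet 10.207.53.60 9310" would do. Below is a minimal sketch using only the Python standard library. The address and port are the ones from the log above, and the function name is hypothetical. It checks TCP only, so it can succeed even where an ICMP ping fails, since EC2 security groups filter the two separately.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Address of the other node as reported by EC2 discovery in the log.
    other_node = "10.207.53.60"
    for port in (9310, 9311, 9312):
        state = "open" if can_connect(other_node, port) else "unreachable"
        print(f"{other_node}:{port} {state}")
```

If the transport port shows as unreachable from the other node but works locally, the security group or the bind address would be the first things to look at.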

Jim Cook
jcook@tracermedia.com

tracermedia interactive http://www.tracermedia.com/
780 King Ave. #106
Columbus, OH 43212

phone: (614) 298-0774
fax: (614) 298-0776
cell: (234) 738-2492


(Shay Banon) #8

Is this the log of one node? The first node will say that it received no ping responses, and then when starting the second node you should see the ping messages (assuming it managed to connect properly). Here is a gist of a simple two-node cluster I started with unicast discovery: https://gist.github.com/990298. The gist has logging for discovery set to TRACE.

Can you gist your settings again? I set discovery.zen.ping_timeout and I see it being honored (tested on 0.16).

-shay.banon


(James Cook) #9

Thanks for the gist, but I'm using EC2 discovery and you are using zen
discovery. Does the EC2 discovery use Zen once the IP addresses of other
nodes in the EC2 cluster are identified?

I changed my logging to TRACE (which I didn't think I could do using log4j
without jumping thru hoops, but slf4j/log4j seems to handle it fine), and
finally saw an exception. I would think that any exception should be
logged with ERROR level or at least WARN level for those exceptions which
don't represent a failure condition.

Here is a gist of the two nodes:

https://gist.github.com/991171
https://gist.github.com/991179

Now, the ConnectException is visible. Is there anything I can do to verify
connectivity between the nodes? I'm not sure if Amazon disables ICMP on
their VMs, but I cannot ping one node from another.
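
One way to check reachability without relying on ping is to attempt a plain TCP connection to the other node's transport port. The snippet below is only a sketch: the function name `can_connect` and the address `10.0.0.2` are made up for illustration, and 9310 is the `transport.tcp.port` from the settings posted earlier in the thread.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection completes the TCP handshake, which is the same
        # thing the node-to-node transport needs in order to succeed.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and unreachable hosts all end up here.
        return False
```

Running `can_connect("10.0.0.2", 9310)` from one node, with the other node's private IP substituted, should return `True` if the security group lets the traffic through. A `False` result while the remote node is running would point at a firewall or security group rule rather than at the discovery settings.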


(James Cook) #10

I think I finally got it. It looks like my change to the security groups in
EC2 didn't "take" when I made it. I probably forgot to hit "Apply
Changes".

The other node is now being discovered, and a _cluster/state call shows the
two nodes in a cluster together.
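
For anyone who wants to script the same check, here is a rough sketch. It assumes the `_cluster/state` response carries a top-level `nodes` object keyed by node id, with `name` and `transport_address` fields on each entry; the helper names `node_summary` and `fetch_cluster_state` are made up for this example.

```python
import json
from urllib.request import urlopen


def node_summary(cluster_state: dict) -> list:
    """Return sorted (name, transport_address) pairs from a cluster state document."""
    nodes = cluster_state.get("nodes", {})
    return sorted(
        (info.get("name", ""), info.get("transport_address", ""))
        for info in nodes.values()
    )


def fetch_cluster_state(base_url: str) -> dict:
    """GET <base_url>/_cluster/state and decode the JSON body."""
    with urlopen(base_url + "/_cluster/state", timeout=10) as resp:
        return json.load(resp)
```

With the `http.port` of 9311 from the settings posted earlier, `node_summary(fetch_cluster_state("http://localhost:9311"))` should list two entries once both nodes have joined.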

Thanks for the help


(Shay Banon) #11

The ec2 discovery basically uses the same zen unicast discovery, but simply builds the list of IPs using the ec2 APIs.

The reason why the connect exception is not logged as an error is that this, in most cases, can and will happen. Depending on the order in which nodes come up, a node will not be able to connect to others because they are simply not there yet (even with ec2 discovery of machines).
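
To make the idea concrete, here is a rough sketch of how a unicast host list could be derived from instance descriptions. This is not the plugin's actual code. The input shape (a list of reservations with `Instances`, `State`, `Tags` and `PrivateIpAddress`) is an assumption modelled on the EC2 DescribeInstances response, and the function name `unicast_hosts` is hypothetical.

```python
def unicast_hosts(reservations: list, tag_key: str, tag_value: str,
                  port: int = 9300) -> list:
    """Build "ip:port" entries for running instances carrying the given tag."""
    hosts = []
    for reservation in reservations:
        for inst in reservation.get("Instances", []):
            # Stopped or terminating machines cannot answer pings, so skip them.
            if inst.get("State", {}).get("Name") != "running":
                continue
            # Flatten the [{"Key": ..., "Value": ...}] list into a dict for lookup.
            tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
            if tags.get(tag_key) != tag_value:
                continue
            ip = inst.get("PrivateIpAddress")
            if ip:
                hosts.append(f"{ip}:{port}")
    return hosts
```

Called as `unicast_hosts(reservations, "deployment", "dev", port=9310)`, this would yield the kind of list that zen unicast pings. A host on that list that is not up yet simply produces a failed connection, which is why such failures are expected during startup.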

On Wednesday, May 25, 2011 at 7:06 PM, James Cook wrote:

Thanks for the gist, but I'm using EC2 discovery and you are using zen discovery. Does the EC2 discovery use Zen once the IP addresses of other nodes in the EC2 cluster are identified?

I changed my logging to TRACE (which I didn't think I could do using log4j without jumping thru hoops, but slf4j/log4j seems to handle it fine), and finally saw an exception. I would think that any exception should be logged with ERROR level or at least WARN level for those exceptions which don't represent a failure condition.

Here is a gist of the two nodes:

https://gist.github.com/991171
https://gist.github.com/991179

Now, the ConnectException is visible. Is there anything I can do to verify connectivity between the nodes? I'm not sure if Amazon disables ICMP on their VMs, but I cannot ping one node from another.


(Clinton Gormley) #12

Given that setting up ES on EC2 is a FAQ and tricky to get right, it'd be
great if someone would write a tutorial to add to the website.

Ta

Clint
On May 25, 2011 7:34 PM, "Shay Banon" shay.banon@elasticsearch.com wrote:

The ec2 discovery basically uses the same zen unicast discovery, but
simply builder the list of IPs using the ec2 APIs.

The reason why the connect exception is not logged as error is because
this, in most cases, can and will happen. Depending on the order that nodes
come up, a node will not be able to connect to others since they are simply
not still there (even with ec2 disco of machines).
On Wednesday, May 25, 2011 at 7:06 PM, James Cook wrote:
Thanks for the gist, but I'm using EC2 discovery and you are using zen
discovery. Does the EC2 discovery use Zen once the IP addresses of other
nodes in the EC2 cluster are identified?

I changed my logging to TRACE (which I didn't think I could do using
log4j without jumping thru hoops, but slf4j/log4j seems to handle it fine),
and finally saw an exception. I would think that any exception should be
logged with ERROR level or at least WARN level for those exceptions which
don't represent a failure condition.

Here is a gist of the two nodes:

https://gist.github.com/991171
https://gist.github.com/991179

Now, the ConnectException is visible. Is there anything I can do to
verify connectivity between the nodes. I'm not sure if Amazon disables icmp
on their VMs, but I cannot ping one node from another.

On Tue, May 24, 2011 at 11:57 PM, Shay Banon <
shay.banon@elasticsearch.com> wrote:

Is this is the log of one node. The first node will say that it
received no ping responses, and then when starting the second node you
should see the ping messages (assuming it managed to connect properly). Here
is a gist of a simple two node cluster I started with unicast discovery:
https://gist.github.com/990298. The gist has logging for discovery set to
TRACE.

Can you gist your settings again? I set discovery.zen.ping_timeout and
I see it being honored (tested on 0.16).

-shay.banon
On Wednesday, May 25, 2011 at 4:42 AM, James Cook wrote:

Does the ping timeout still have meaning when performing EC2
discovery? I set it to 5m and it still did not find the other node.

01:19:44,065 DEBUG thread-2 search.discovery.ec2: 70 - [Victor von Doom] using dynamic discovery nodes [[#cloud-i-c79142a9-0][inet[/10.193.135.143:9310]], [#cloud-i-c59142ab-0][inet[/10.207.53.60:9310]]]
01:19:59,067 DEBUG thread-1 search.discovery.ec2: 70 - [Victor von Doom] ping responses: {none}
01:19:59,083 DEBUG thread-1 arch.cluster.service: 70 - [Victor von Doom] processing [zen-disco-join (elected_as_master)]: execute
01:19:59,083 DEBUG thread-1 arch.cluster.service: 70 - [Victor von Doom] cluster state updated, version [1], source [zen-disco-join (elected_as_master)]
01:19:59,085 INFO thread-1 arch.cluster.service: 78 - [Victor von Doom] new_master [Victor von Doom][X-nwlRCRSKufTPT7wMQdGw][inet[/10.193.135.143:9310]], reason: zen-disco-join (elected_as_master)
01:19:59,088 DEBUG thread-1 search.river.cluster: 70 - [Victor von Doom] processing [reroute_rivers_node_changed]: execute
01:19:59,088 INFO main sticsearch.discovery: 78 - [Victor von Doom] elasticsearch-prod/X-nwlRCRSKufTPT7wMQdGw
01:19:59,088 DEBUG thread-1 search.river.cluster: 70 - [Victor von Doom] processing [reroute_rivers_node_changed]: no change in cluster_state
01:19:59,090 DEBUG thread-1 ticsearch.gateway.s3: 70 - [Victor von Doom] reading state from gateway org.elasticsearch.gateway.shared.SharedStorageGateway$1@21a722ef ...
01:19:59,090 DEBUG thread-1 arch.cluster.service: 70 - [Victor von Doom] processing [zen-disco-join (elected_as_master)]: done applying updated cluster_state

You can see the EC2 discovery did its job and found the IP address of
the other node, and there was a 15 sec delay before ES reported no ping
responses.
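
The addresses that EC2 discovery reports can be pulled out of the "using dynamic discovery nodes" log line and then probed individually. This is a rough sketch, assuming the log format shown above stays as it is; `parse_discovery_nodes` and the regular expression are illustrative names, not part of Elasticsearch.

```python
import re

# Matches entries such as [#cloud-i-c79142a9-0][inet[/10.193.135.143:9310]].
# Whitespace between the two bracketed parts is allowed because the mail
# client wrapped the log line.
NODE_RE = re.compile(
    r"\[#(cloud-i-[0-9a-f]+)-\d+\]\s*\[inet\[/([\d.]+):(\d+)\]\]"
)


def parse_discovery_nodes(log_text):
    """Return (node_label, ip, port) tuples found in a discovery log excerpt."""
    return [
        (m.group(1), m.group(2), int(m.group(3)))
        for m in NODE_RE.finditer(log_text)
    ]
```

Each returned tuple gives the address and transport port that the node will try to ping, which is the address that has to be reachable from the other instance.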

I have the security group for my EC2 nodes opening all ports between
9310 and 9360. (ES nodes are using 9310 - 9312, and Hazelcast uses port

If I ssh into one of my nodes, is there anything I can do (telnet?)
to make sure my other node is reachable? It is getting a bit frustrating at
this point that I can't get these nodes to see each other.

Jim Cook
jcook@tracermedia.com

On Mon, May 23, 2011 at 6:57 PM, Shay Banon <
shay.banon@elasticsearch.com> wrote:

I don't see a log where it connected to the other node. Are you
sure there isn't a firewall or something between them?

Can you try increasing the ping timeout? Set
discovery.zen.ping_timeout to something like 5m (5 minutes, just to see what
happens).

On Saturday, May 21, 2011 at 7:38 PM, James Cook wrote:

Here is the full log dump:
https://gist.github.com/984657

You can see that the EC2 discovery finds the other instance, but
there is not much indication of why it doesn't join the cluster, at least
that I can see.

[Hela] {elasticsearch/0.16.0}[1332]: starting ...
Using the autodetected NIO constraint level: 0
[Hela] Bound to address [/10.86.241.201:9310]
[Hela] bound_address {inet[/10.86.241.201:9310]}, publish_address {inet[/10.86.241.201:9310]}
Checking for connections, idleTimeout: 1305995206846
HttpConnectionManager.getConnection: config = HostConfiguration[host=http://ec2.amazonaws.com], timeout = 0
Allocating new connection, hostConfig=HostConfiguration[host=http://ec2.amazonaws.com]
Open connection to ec2.amazonaws.com:80
Adding Host request header
Request body sent
Resorting to protocol version default close connection policy
Should NOT close connection, using HTTP/1.1
Releasing connection back to connection manager.
Freeing connection, hostConfig=HostConfiguration[host=http://ec2.amazonaws.com]
Adding connection at: 1305995236962
Notifying no-one, there are no waiting threads
[Hela] using dynamic discovery nodes [[#cloud-i-7b5da315-0][inet[/10.86.241.201:9310]], [#cloud-i-795da317-0][inet[/10.86.253.155:9310]]]
[Hela] Connected to node [[Hela][Y9RtLp-iQjuQMgriI3Af_w][inet[/10.86.241.201:9310]]]
Checking for connections, idleTimeout: 1305995209972
HttpConnectionManager.getConnection: config = HostConfiguration[host=http://ec2.amazonaws.com], timeout = 0
Getting free connection, hostConfig=HostConfiguration[host=http://ec2.amazonaws.com]
Adding Host request header
Request body sent
Resorting to protocol version default close connection policy
Should NOT close connection, using HTTP/1.1
Releasing connection back to connection manager.
Freeing connection, hostConfig=HostConfiguration[host=http://ec2.amazonaws.com]
Adding connection at: 1305995240050
Notifying no-one, there are no waiting threads
[Hela] using dynamic discovery nodes [[#cloud-i-7b5da315-0][inet[/10.86.241.201:9310]], [#cloud-i-795da317-0][inet[/10.86.253.155:9310]]]
[Hela] ping responses: {none}
[Hela] processing [zen-disco-join (elected_as_master)]: execute
[Hela] cluster state updated, version [1], source [zen-disco-join (elected_as_master)]
[Hela] new_master [Hela][Y9RtLp-iQjuQMgriI3Af_w][inet[/10.86.241.201:9310]], reason: zen-disco-join (elected_as_master)
[Hela] processing [reroute_rivers_node_changed]: execute
[Hela] processing [reroute_rivers_node_changed]: no change in cluster_state

The two instances are spun up using Amazon Elastic Beanstalk, so
the ES servers embedded in each VM have exactly the same configuration,
including an identical cluster name.

On Sat, May 21, 2011 at 10:10 AM, Shay Banon <
shay.banon@elasticsearch.com> wrote:

You can set discovery.ec2 to TRACE level logging and maybe that
can shed some light as to why they can't find each other.

On Saturday, May 21, 2011 at 5:31 AM, James Cook wrote:

Thanks Paul,

Did you need to use the tags for discovery, or were you just
using them for the function they provide?

It's interesting how many clustering technologies are built
which rely on multicast or IP lists for discovery. We are using Hazelcast as
a memcache layer and we let Elastic Search perform its discovery process,
then use the cluster state to identify the other nodes in the cluster. Next
I use this list of IPs to bootstrap Hazelcast. It would be nice if these
other products picked up on ES's EC2 discovery process.

Although, it certainly seems like something in my environment
or configuration is causing the current EC2 discovery to fail.

Here is my config:

Starting the Elastic Search server node with these settings:
cloud.aws.access_key : AKIAJD5OZUDKRQ3DVURA
cloud.aws.secret_key :
cluster.name : elasticsearch-prod
discovery.type : ec2
gateway.s3.bucket : ppkc-es-gateway-prod
gateway.type : s3
http.enabled : true
http.port : 9311
index.mapping._id.indexed : true
index.store.fs.memory.direct : true
index.store.type : niofs
name : Supernalia
network.host : eth0:ipv4
node.data : true
path.conf : /usr/share/tomcat6/webapps/ROOT//WEB-INF/config
path.data : /var/local/es/data
path.data_with_cluster : /var/local/es/data/elasticsearch-prod
path.home : /var/local/es
path.logs : /var/local/es/logs
path.work : /var/local/es/work
path.work_with_cluster : /var/local/es/work/elasticsearch-prod
transport.tcp.port : 9310


(system) #13