EC2 discovery leads to two masters


(Pavel Penchev) #1

Hi,

I'm having some trouble configuring ES in the cloud. Most of the time
everything works, but sometimes discovery fails and I end up with two
masters using the same cluster name.
This happens on roughly 1 out of 10 startups.

I'm using 0.17.4 (embedded), and the configuration looks like this:

cluster:
    name: default-cluster-name

index:
    number_of_shards: 2
    number_of_replicas: 1

discovery:
    type: ec2
    zen:
        minimum_master_nodes: 1

cloud:
    aws:
        access_key: XXXXXXXXXX
        secret_key: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Trace logs can be found here: https://gist.github.com/1134288. Any ideas
what I am missing?

Thanks in advance,
Pavel


(Shay Banon) #2

It seems the two nodes ended up not seeing each other properly, so
each elected itself as master. If you increase the ping timeout (it
defaults to 3s), the problem should go away. Set discovery.zen.ping.timeout to
something like 10s or 20s.
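For reference, a minimal sketch of the config from the first post with the longer timeout added. It assumes the nested YAML form maps to the dotted key `discovery.zen.ping.timeout` (ES flattens nested YAML keys into dotted setting names), and 10s is simply the low end of the suggested range rather than a tuned value:

```yaml
discovery:
    type: ec2
    zen:
        minimum_master_nodes: 1
        ping:
            # Wait longer for other nodes to answer pings before electing a master.
            # The default is 3s; 10s is the low end of the suggested 10s-20s range.
            timeout: 10s
```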

(In reply to Pavel Penchev, Tue, Aug 9, 2011 at 6:19 PM)


(jjasinek) #3

Shay,

If two nodes end up on opposite sides of a network partition (even on a local
network) and each promotes itself to master, what happens when they see
each other again?

Jason

(In reply to Shay Banon, Aug 9, 1:22 pm)


(Shay Banon) #4

Nothing: they will remain partitioned, and you will need to decide which one
to restart. The minimum_master_nodes setting is there to reduce the chances of
this happening (though on a 2-node cluster, this setting does not mean much).
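A sketch of the usual sizing rule, which is an assumption here and not something stated in this thread: set minimum_master_nodes to a majority of the master-eligible nodes, i.e. floor(N / 2) + 1. For a hypothetical 3-node cluster that is 2, so a single isolated node cannot elect itself master:

```yaml
# Hypothetical cluster with 3 master-eligible nodes: floor(3 / 2) + 1 = 2
discovery:
    type: ec2
    zen:
        # A node must see at least 2 master-eligible nodes
        # (itself included) before a master can be elected.
        minimum_master_nodes: 2
```

On 2 nodes the same rule gives 2, meaning neither node can become master while the other is down, whereas 1 leaves the split-brain risk from this thread in place. That trade-off is why the setting helps little with only two nodes.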

(In reply to jjasinek, Tue, Aug 9, 2011 at 11:06 PM)


(Pavel Penchev) #5

Hi,

Sorry to bump an old thread. For completeness, I just want to confirm
that setting discovery.zen.ping_timeout to 15s works like a charm in my
case.

Many thanks for the quick response,
Pavel

(In reply to Shay Banon, 9.08.2011 21:22)

(James Cook) #6

Hi Pavel, in my experience the timeout value you need tends to grow with the
number of non-cluster instances you have under EC2 management.

A trick to keep it working well is to make sure that all the nodes in your
ElasticSearch cluster are part of the same EC2 security group. Then use the ES
groups setting (http://www.elasticsearch.org/guide/reference/modules/discovery/ec2.html)
to limit the nodes that ES looks at when establishing membership.
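A sketch of that groups filter, assuming the cloud-aws setting name `discovery.ec2.groups` from the EC2 discovery guide linked above; `es-cluster` is a hypothetical security group name:

```yaml
discovery:
    type: ec2
    ec2:
        # Only consider instances in this (hypothetical) security group
        # when looking for other cluster members.
        groups: es-cluster
```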


(Shay Banon) #7

Two notes on that: you can also use EC2 tags to filter down the list of
instances that need to be pinged, and in 0.17, unicast discovery is
considerably more lightweight than in previous versions.
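Filtering by tags could look like this, assuming the `discovery.ec2.tag.<name>` form described in the EC2 discovery guide; the `cluster` tag key and `search-prod` value are hypothetical and would have to match tags you set on your own instances:

```yaml
discovery:
    type: ec2
    ec2:
        tag:
            # Only ping instances carrying the (hypothetical) tag cluster=search-prod
            cluster: search-prod
```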

(In reply to James Cook, Tue, Aug 16, 2011 at 10:17 PM)


(Pavel Penchev) #8

Thanks James, we'll make use of the setting. Our production EC2
environment is indeed quite heterogeneous.

Pavel

(In reply to James Cook, 16.08.2011 22:17)


(Clinton Gormley) #9

Hi James (or anybody else with similar experience)

(In reply to James Cook, Tue, 2011-08-16 at 12:17 -0700)

Given that getting ES to work well under EC2 seems to present a bit of a
challenge, how would you feel about writing a tutorial for
elasticsearch.org?

It would be an invaluable resource.

clint


(James Cook) #10

I think that would be useful as well. I'll try to carve out some time to get
something started.


(James Cook) #11

And Clinton, a cookbook of search recipes would be awesome to see on a web
page. :slight_smile:

You have solved many gotchas for people over the past months.


(Clinton Gormley) #12

(In reply to James Cook, Fri, 2011-08-19 at 06:01 -0700)

touché :wink:

