Elasticsearch - EC2 Region/Availability zone problem

Hi there,

Cross-posted this as a question on the Apacheserver.net forums - not sure
what the best way is to get a little assistance with this issue.

We've got a bit of a problem with our elasticsearch environment on
AWS in the US-West-1 region. As far as we can tell, what's happening
is that the cloud-aws plugin is treating availability zones as distinct
regions, instead of including the AZs we are specifying.

elasticsearch.yml contains the following specific settings:
network.bind_host: eth0:ipv4
#cloud.aws.region: us-west-1
cloud.aws.access_key:
cloud.aws.secret_key:
cloud.node.auto_attributes: true
discovery.type: ec2
discovery.any_group: true
discovery.ec2.tag.stage: production
discovery.ec2.availability_zones: us-west-1a,us-west-1c
gateway.type: s3
gateway.s3.bucket:
gateway.s3.concurrent_streams: 5
gateway.expected_nodes: 3
gateway.recover_after_time: 1m
gateway.recover_after_nodes: 1
indices.recovery.max_size_per_sec: 0
indices.recovery.concurrent_streams: 5
cluster.routing.allocation.node_concurrent_recoveries: 2
discovery.zen.ping_timeout: 30s

cloud.aws.region is commented out because when it isn't commented out,
starting elasticsearch gives an error as follows and then crashes out:
[2012-01-03 18:42:24,333][WARN ][com.amazonaws.http.AmazonHttpClient]
Unable to execute HTTP request: null
[2012-01-03 18:42:24,344][WARN ][com.amazonaws.http.AmazonHttpClient]
Unable to execute HTTP request: null
{0.18.6}: Initialization Failed ...

  1. AmazonClientException[Unable to execute HTTP request: null]
    ProtocolException[Received redirect response HTTP/1.1 301 Moved Permanently
    but no location header]

With the above custom config, when starting up either node (both in
different AZs) we get this in the logs on each:

[2012-01-03 19:06:14,454][INFO ][cluster.service ] [Bulldozer] new_master
[Bulldozer][4QTO0FKfQkGiUOgGHncu_w][inet[/10.169.49.245:9300]]{aws_availability_zone=us-west-1a},
reason: zen-disco-join (elected_as_master)

[2012-01-03 19:06:23,301][INFO ][cluster.service ] [Mad Dog Rassitano]
new_master [Mad Dog
Rassitano][vPKkND4UTJ6o245oHL8efQ][inet[/10.171.49.112:9300]]{aws_availability_zone=us-west-1c},
reason: zen-disco-join (elected_as_master)

We only want one to be a master for the cluster - they simply won't cluster
at all.

Can anyone point out what we are doing wrong here?
Thanks in advance,
James.

Can you set discovery: TRACE in the logging config? That will show the
list of machines the AWS API gave back and what happens during discovery.

Regarding setting the cloud.aws.region, that's strange... Can you set
bootstrap: TRACE in the logging, start it up, and gist the logging file
output (not console)?
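
For anyone following along, the two logger levels requested above would look roughly like this in config/logging.yml. This is a minimal sketch that assumes the stock logging.yml layout shipped with Elasticsearch 0.18.x, where per-component levels sit under a top-level logger: key; the surrounding lines in your own file may differ.

```yaml
# config/logging.yml (fragment) - assumed stock layout, not copied from the poster's file
rootLogger: INFO, console, file
logger:
  # show the instances the EC2 API returned and how they were filtered
  discovery: TRACE
  # more detail on startup failures such as the cloud.aws.region error
  bootstrap: TRACE
```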

Hi Shay, Thanks for your help. I'm unfamiliar with gists, but I've
attempted to set one up at https://gist.github.com/1562422

It has logs from two ec2 instances after changing the discovery
logging to trace, and a log from what happens after I uncomment the
cloud.aws.region setting. Cheers.
James.

Just trying to provide as much information as possible. Will add the
full log to the gist.
Have been working on this for most of the day. I have made one change
to the elasticsearch.yml file - changed the security group to the name
of the group, not the group id - that then gave some more information.
However in the logs I'm seeing "filtering out reservation r-9cf0faf7
based on groups [default], does not include all of [Elasticsearch]"
but this reservation ID does not belong to any of the instances in our
AWS account. There's a second entry which also doesn't match.

Both servers still try to elect themselves as master, and don't
cluster.

Also, based on the times that you are posting, I guess you're in a
timezone around 12hrs behind the one I'm in - I'll try to get online
around 11pm tonight and see if I can get somewhere with it.

Cheers,
James.
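
The filters discussed so far (security group, tag, availability zone) correspond to settings like the ones below. This is a sketch based on the cloud-aws plugin's discovery.ec2.* keys, not the poster's actual file: the group name Elasticsearch is taken from the log message quoted above, and the discovery.ec2. prefix on any_group is an assumption worth checking, since the config in the first post spells it discovery.any_group.

```yaml
discovery.type: ec2
# only consider instances in this security group (name taken from the log message above)
discovery.ec2.groups: Elasticsearch
# assumption: when true, matching any one listed group is enough; when false, all must match
discovery.ec2.any_group: true
discovery.ec2.tag.stage: production
discovery.ec2.availability_zones: us-west-1a,us-west-1c
```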

It seems like some nodes are filtered out because of security groups set,
resulting in no nodes to be used when trying to do the discovery...

Hi Shay,

As mentioned in GTalk, all the security groups are set correctly, and
in the Elasticsearch security group in US-West, all instances can
communicate on a variety of ports between themselves.

Thanks for your suggestion of using tags, it's given me enough further
information to say that I think I've worked out what's happening.
Adding the tag "stage: production" to the instances changed the
logging to display the instance ids, which I then discovered were the
instance ids for two unused instances in US East.

Remember how we're getting an error when we specify the
cloud.aws.region setting? Because of this HTTP 301 "Moved Permanently"
error when cloud.aws.region is set to us-west-1, Elasticsearch is
defaulting to US-East-1 as its region. Elasticsearch is then doing
its discovery and excluding the two instances in our account in
US-East-1 (which are not even running) in the default security group.
It's not even getting to the US-West security groups we have
specified. Even though the AZs are specified by the config as
us-west-1a and 1c, it's only doing discovery in the US-East region and
not finding anything. This only seems to be a problem when specifying
us-west-1 - as a test, I changed the cloud.aws.region value to us-east-1
and it didn't get an HTTP 301 error. I set it to us-west-2 and received
the 301 error. I haven't tried the other regions.

Further information - our instances for this project are all in
US-West-1 (across the 1a and 1c availability zones). Each of these
instance types has its own security group specified and access control
is configured through each of these.

We need to work out what the HTTP call being made to AWS is when
cloud.aws.region is set to us-west-1 and why it's failing (could be
something amazon has changed), fix it and then it should be able to
cluster in the US-West region. Sadly I'm just a lowly sysadmin - so
forking code to fix it is not exactly my speciality. Happy to help in
any way that I can to work out why this is occurring.

Unfortunately the timezone difference is going to mess with our heads
somewhat - I'll try to have a nap this afternoon and hopefully be able
to stay up late tonight. Hopefully then we'll be able to get this
cluster going!

Cheers!
James

As discussed with Shay, this is now resolved.

The issue was to do with the cloud.aws.region setting failing for the
us-west-1 region. We are unsure as to why this was occurring, but by
adding the following settings we were able to overcome the problem and
resolve our clustering issue. I would be interested to know if anyone
else has this problem in future.

Added settings to config:
cloud.aws.s3.endpoint: s3.amazonaws.com
cloud.aws.ec2.endpoint: ec2.us-west-1.amazonaws.com
gateway.s3.region: us-west-1

Thanks again Shay :)
Cheers,
James.
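
Putting the fix together with the settings from the first post, the AWS-related part of elasticsearch.yml would look something like the sketch below. The two endpoint lines and gateway.s3.region are the settings reported above; the rest is copied from the original config. Whether cloud.aws.region stayed commented out is not stated in the thread, so that line is an assumption.

```yaml
# cloud.aws.region: us-west-1   # assumption: left commented out, as in the first post
cloud.aws.access_key:           # left blank, as in the original post
cloud.aws.secret_key:
# explicit endpoints, presumably so the EC2 calls go to US-West-1 rather than the default region
cloud.aws.ec2.endpoint: ec2.us-west-1.amazonaws.com
cloud.aws.s3.endpoint: s3.amazonaws.com
discovery.type: ec2
discovery.ec2.tag.stage: production
discovery.ec2.availability_zones: us-west-1a,us-west-1c
gateway.type: s3
gateway.s3.bucket:
gateway.s3.region: us-west-1
```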
