EC2 discovery stopped working!

Since yesterday, new client-only nodes I bring up can no longer discover
existing nodes on different machines ("waited for 30s and no initial state
was set by the discovery").

What's odd is that I haven't changed anything in my setup (elasticsearch
0.90.2, elasticsearch-cloud-aws 1.12.0, S3 gateway), and the machines are
in the same security group and can see each other (i.e. I can connect to
port 9200).

I've seen this problem previously, but only sporadically.

If I bring up a 2nd data node, rather than just a client node, the 1st data
node appears to get stuck in an "auto expanded replicas" loop and eventually
runs out of memory...

2013-08-01 10:47:41,620 [INFO] cluster.metadata - [Blindside] updating number_of_replicas to [1] for indices [xyz]
2013-08-01 10:47:41,640 [INFO] cluster.metadata - [Blindside] [xyz] auto expanded replicas to [1]
...

I tried setting "discovery.ec2.ping_timeout" to "15s", but that doesn't
make any difference. Other ideas?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Should add that I can telnet from the client machine to port 9300 on the
machine running elasticsearch.

What else could I check?
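
For anyone wanting to script the telnet check described above, here is a minimal sketch in Python. It only tests whether a TCP connection can be opened to the HTTP port (9200) and the transport port (9300) mentioned in this thread; the function names and the address in the usage note are made up for illustration. A successful connect shows the port is reachable, not that discovery itself works.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_ports(host, ports=(9200, 9300), timeout=3.0):
    """Map each port to whether it accepted a TCP connection."""
    return {port: can_connect(host, port, timeout) for port in ports}
```

Calling `check_ports("10.0.0.12")` (a hypothetical address) returns a dict keyed by port. It may be worth running it in both directions, from the new node to an existing node and back, in case connectivity is only open one way.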

--

Reposted here:

--

Well, if your nodes are not in the same class C network (for example, master:
10.1.1.1, node: 10.1.2.1), you need to specify a parameter like this:
discovery.zen.ping.unicast.hosts: ["10.19.1.1"]

--

On Mon, Aug 12, 2013 at 1:27 AM, 姚仁捷 baniu.yao@gmail.com wrote:

Well, if your nodes are not in the same class C network (for example, master:
10.1.1.1, node: 10.1.2.1), you need to specify a parameter like this:
discovery.zen.ping.unicast.hosts: ["10.19.1.1"]

Right now I have a healthy cluster with nodes in different A networks
(e.g. 23.20.43.x and 54.221.47.x), so I don't think the
'discovery.zen.ping.unicast.hosts' parameter is required when using
the cloud-aws plugin. Other ideas?

--

Just ran into this again (elasticsearch 0.90.3, elasticsearch-cloud-aws
1.14.0): New nodes ignore existing nodes. The only solution appears to be
to shut down the entire cluster first :(

Looking at the (TRACE-level) logs, the old nodes do seem to be discovered
at first, but then the new node elects itself as master,
http://localhost:9200/_cluster/health reports a single data node, and
shards are restored from S3!

Any ideas (other than don't use the ec2 discovery mechanism)?

--

On Thursday, October 3, 2013 1:00:37 AM UTC-7, Eric Jain wrote:

Just ran into this again (elasticsearch 0.90.3, elasticsearch-cloud-aws
1.14.0): New nodes ignore existing nodes. The only solution appears to be
to shut down the entire cluster first :(

For the record, this problem resurfaced again after a few weeks (now using
elasticsearch 0.19.7 and elasticsearch-cloud-aws 1.16.0)--I still don't
know what the cause is.

--

According to the docs, AWS plugin 1.16.0 is for elasticsearch 0.90.4
and higher. I would not expect it to run with 0.19.7.

Greets
Andrej

--

I think Eric is using 0.90.7 and not 0.19.7…

:)

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr

--

On Wed, Dec 4, 2013 at 6:11 AM, David Pilato david@pilato.fr wrote:

I think Eric is using 0.90.7 and not 0.19.7…

Yes, sorry for the confusion; I wish the problem was that simple :)

--

On Tuesday, December 3, 2013 6:56:38 PM UTC-8, Eric Jain wrote:

For the record, this problem resurfaced again after a few weeks (now using
elasticsearch 0.19.7 and elasticsearch-cloud-aws 1.16.0)--I still don't
know what the cause is.

Here is the log file (leading up to the
"org.elasticsearch.discovery.MasterNotDiscoveredException: waited for
[30s]" exception):

application.log · GitHub

There are two other machines running, both with a client-only and a data
node, and there's nothing obviously wrong with the cluster:

curl -XGET 'http://10.10.209.204:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "prod-39",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 4,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 226,
  "active_shards" : 452,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}
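
The health output above is the view from a single node. Since a new node was earlier seen electing itself master and reporting a single data node, it might help to compare the health response from every node side by side. The following is only a sketch: it takes the raw JSON text already fetched with curl, keyed by host names that are placeholders, and the function names are made up for illustration.

```python
import json


def node_counts(health_by_host):
    """Map each host to the (number_of_nodes, number_of_data_nodes) it reports."""
    counts = {}
    for host, body in health_by_host.items():
        health = json.loads(body)
        counts[host] = (health["number_of_nodes"], health["number_of_data_nodes"])
    return counts


def views_disagree(health_by_host):
    """Return True if the hosts report different node counts."""
    return len(set(node_counts(health_by_host).values())) > 1
```

If `views_disagree` returns True, the nodes do not share one view of the cluster, which would be consistent with the new node having formed its own cluster rather than joining the existing one.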

But as mentioned previously, the only way to recover from this situation is
to shut down all nodes (or copy the data and start a new cluster).

--

It might be worth upgrading to 0.90.X; from what I have seen, there were
some major improvements in discovery.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

--

On Wed, Dec 4, 2013 at 8:09 PM, Mark Walkom markw@campaignmonitor.com wrote:

It might be worth upgrading to 0.90.X; from what I have seen, there were
some major improvements in discovery.

As mentioned above, I am in fact using the latest (production) version
of both elasticsearch (0.90.7) and the elasticsearch-cloud-aws plugin
(1.16.0).

--

Ah, the quote with "now using elasticsearch 0.19.7" threw me.

Regards,
Mark Walkom


--

On Wed, Dec 4, 2013 at 8:16 PM, Mark Walkom markw@campaignmonitor.com wrote:

Ah, the quote with "now using elasticsearch 0.19.7" threw me.

Yeah, I shouldn't have quoted my own typo again :)

--