Yeah... I was actually thinking about this way back when the health API was
introduced, and considered treating it as a RED status, but I think it's a
different level of "status".
On Tuesday, June 14, 2011 at 9:47 PM, James Cook wrote:
So the best practice is to treat MasterNotFoundException as the same thing
as a RED status and continue looping (with some delay) until I get at least
a YELLOW status?
On Tue, Jun 14, 2011 at 2:21 PM, Shay Banon email@example.com:
The health API needs to find the master of the cluster in order to get
it. If it's not available, then you will get the exception. You should treat
it, based on your logic, similar to a RED status.
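The advice above (treat MasterNotFoundException like RED and keep polling with a delay) can be sketched as a small helper. This is an illustrative Python sketch, not the thread's actual code (which is Java against the node client); the real health call is injected as a callable so the retry logic stands alone:

```python
import time

def wait_for_cluster(get_status, timeout=60.0, interval=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it reports 'yellow' or 'green'.

    get_status is assumed to call the cluster health API and return the
    status string. Any exception it raises (e.g. the Java client's
    MasterNotFoundException) is treated the same as a 'red' status, as
    suggested in the thread. Raises TimeoutError if the deadline passes.
    """
    deadline = clock() + timeout
    while True:
        try:
            status = get_status()
        except Exception:
            # No master elected yet -> same as red, keep waiting.
            status = "red"
        if status in ("yellow", "green"):
            return status
        if clock() >= deadline:
            raise TimeoutError(
                "cluster still %s after %ss" % (status, timeout))
        sleep(interval)
```

Injecting `clock` and `sleep` is just to keep the sketch testable; in real code the defaults are fine.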
On Tuesday, June 14, 2011 at 9:01 PM, James Cook wrote:
When I bootstrap ES (my code https://gist.github.com/977580), I perform
1. Retrieve the current health status
2. If the status is RED, I log this fact and wait for at least YELLOW
3. Check the current health status
4. Log the "final" status
5. If it is still RED, I throw an exception
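The bootstrap steps above can be sketched as a small function. This is a hypothetical Python sketch (the actual gist is Java), with the health call injected as a callable; `wait_for` stands in for the health API's `wait_for_status` request parameter, which blocks until the cluster reaches at least that status or the request times out:

```python
def bootstrap_check(fetch_status, log=print):
    """Sketch of the five bootstrap steps from the thread.

    fetch_status(wait_for=None) is assumed to call the cluster health
    API (passing ?wait_for_status=... when wait_for is given) and
    return the reported status string. All names here are hypothetical.
    """
    status = fetch_status()                       # 1. current health
    if status == "red":                           # 2. RED: log and wait
        log("cluster RED, waiting for yellow")
        status = fetch_status(wait_for="yellow")  # 3. re-check health
    log("final cluster status: %s" % status)      # 4. log final status
    if status == "red":                           # 5. still red: fail fast
        raise RuntimeError("cluster did not reach yellow")
    return status
```

Note that with `wait_for_status`, the check and the wait collapse into a single blocking health request.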
Since I do not see my log statement (#3) indicating the RED status, I
assume that it is the call in #2 which is the operation to which you allude.
What best practice should I be following to ensure the cluster is up and
ready to receive calls?
On Tue, Jun 14, 2011 at 1:30 PM, Shay Banon firstname.lastname@example.org:
What operation do you do when you create the node? It seems like it tries
to do an operation, and because the ping_timeout is longer, it will not be
able to perform it because the discovery is not done yet.
On Monday, June 13, 2011 at 10:12 PM, James Cook wrote:
I had tried that earlier. Both nodes throw a MasterNotFoundException. Here
are the gists in that case:
On Mon, Jun 13, 2011 at 2:28 PM, Shay Banon email@example.com:
It seems like the second node, which detected the first node on the first
round of ping and identified correctly that it should become master, and
then went into a second round of pings to wait for it (the first node) to
become master, failed to get a proper response from it on the second round.
My guess is that the concurrency problems in making a connection to a
node play a part in that (there are non-elasticsearch nodes in the pool as
well). This was fixed in master, but you can work around it by specifying a
longer ping timeout. This should work. I see that the logs indicate 3 second
ping timeout, can you try and increase it by setting
discovery.ec2.ping_timeout to something like 30s?
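As an elasticsearch.yml fragment, the suggested workaround would look like this (note the key is scoped under discovery.ec2, not discovery.zen):

```yaml
# Give EC2 discovery more time to ping other nodes during
# concurrent startup (the default ping timeout is 3s).
discovery.type: ec2
discovery.ec2.ping_timeout: 30s
```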
On Monday, June 13, 2011 at 8:23 PM, James Cook wrote:
Sure, thanks for taking a look.
Node 1: https://gist.github.com/1023225
Node 2: https://gist.github.com/1023232
About two pages into each gist, you will see the ES configuration
parameters I am using. Both servers are deploying the exact same WAR file
with an Embedded ES server.
On Sun, Jun 12, 2011 at 8:07 PM, Shay Banon firstname.lastname@example.org:
Can you set the discovery logging level to TRACE and gist the logs of both nodes?
On Monday, June 13, 2011 at 2:56 AM, James Cook wrote:
When I increase the ping timeout, I end up with both nodes reporting
MasterNotFoundExceptions. If I start the nodes up sequentially, I have no
problems. Perhaps there is a race condition at play?
This is 0.16.2.
Sent from my iPad
On Jun 12, 2011, at 3:21 AM, Shay Banon email@example.com:
The option that you have here is to increase the ping timeout in this
case. It's ok not to get a response from another node while it's starting up,
and there is a window where they wait for nodes to start up.
You can set discovery.ec2.ping_timeout to a higher value (defaults to 3s).
On Friday, June 10, 2011 at 9:03 PM, James Cook wrote:
My problem may be caused because both nodes boot up approximately at the
same time. If I start one node, then wait a few minutes before starting my
other node, they cluster.
Unfortunately, I do not have a lot of control over how Elastic Beanstalk
decides to start my servers.
Any ideas what I can do to work around this?
On Fri, Jun 10, 2011 at 1:02 PM, James Cook <firstname.lastname@example.org>
A little background, if needed.
[Sikorsky] Connected to node
[Sikorsky]  connecting to [#cloud-i-59503637-0][inet[/10.86.201.157:9310]],
[Sikorsky]  received response from
[Sikorsky] Disconnected from
[Sikorsky] received ping response with no matching id 
The EC2 discovery process correctly finds my other EC2 node (10.86.201.157)
using the EC2 DescribeInstances API. My primary node then:
- Connects to 10.86.201.157
- Receives a response from 10.86.201.157
- Promptly disconnects from 10.86.201.157
Is that message "received ping response with no matching id " the
reason for the disconnect? If so, what does "id" refer to?
On Fri, Jun 10, 2011 at 11:34 AM, James Cook <email@example.com>
I'm experiencing a new problem with EC2 clustering using 0.16.0.
I am starting two nodes on EC2, and during the discovery process I see this
My two nodes do not discover each other and fail to cluster. Ports are
open, and the configuration is identical between the nodes:
cloud.aws.access_key : AKIAJWQRTNTMFXIMX3WA
cloud.aws.secret_key : <HIDDEN>
cluster.name : elasticsearch-dev
discovery.type : ec2
discovery.zen.ping_timeout : 30s
gateway.s3.bucket : ppkc-es-gateway-dev
gateway.type : s3
http.enabled : true
http.port : 9311
index.mapping._id.indexed : true
index.store.type : niofs
name : Sikorsky
network.host : 0.0.0.0
node.data : true
path.data : /var/local/es/data
transport.tcp.port : 9310