Dealing with RED status


(James Cook) #1

I have embedded ES in a web application. Each instance of my web application
that I bring online (EC2) has one data node, which is intended to cluster
with the data nodes in the other web instances. I am using the S3 gateway for
disaster recovery in the face of total cluster failure, or when pushing out a
new version of the software. Note that I don't control when my instances are
killed or instantiated. I am using local storage (niofs), but any local
storage is purely ephemeral: a new instance gets a new EBS device, and an old
instance's EBS volume is discarded when that instance shuts down.

When my web app initializes, I need to make sure the ES cluster is in GREEN
or YELLOW state before accepting web requests. I am trying now to understand
what the process should be if my ES node is in RED status. I currently use
the blocking cluster health check to wait until a non-RED status is
returned:

getClient().admin().cluster().prepareHealth()
    .setWaitForYellowStatus()
    .setTimeout(_recoveryWait)
    .execute().actionGet();

I have a few questions:

  1. Are there any benchmarks for how long I might be waiting for a node to
    join the cluster and achieve at least a YELLOW status? Are we talking
    seconds, minutes, or hours? Assume 1 GB of cluster metadata and indexes.
  2. How can I tell whether my status will always be RED, or if ES is
    actively trying to rectify the problem?
  3. What options do I have if my status is still RED after waiting for a
    period of time?

-- jim


(Shay Banon) #2

On Wednesday, May 18, 2011 at 10:19 PM, James Cook wrote:
Are there any benchmarks for how long I might be waiting for a node to join the cluster and achieve at least a YELLOW status? Are we talking seconds, minutes, or hours? Assume 1 GB of cluster metadata and indexes.

It really depends. When using the S3 gateway, ES will try its best to allocate shards to nodes whose index files are most "similar" to what you have on S3. If it's a full recovery, then it's the time it takes to download the index files from S3.

How can I tell whether my status will always be RED, or if ES is actively trying to rectify the problem?

You can try the indices stats API; it gives a bit of information on the progress of any recovery that is under way. In general, you can also time how long it takes to download the expected amount of data from S3, and then use that (multiplied by a safety factor) as an expected upper limit.
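
To make the timing estimate concrete, here is a minimal Java sketch. It is not part of ES: the class and method names (RecoveryTimeout, upperLimitSeconds) and the safety factor are assumptions made for illustration only. The throughput is something you would measure yourself, for example by timing the download of an object of known size from your bucket.

```java
// Hypothetical helper, not an Elasticsearch class.
final class RecoveryTimeout {
    private RecoveryTimeout() {}

    /**
     * Estimates an upper limit, in seconds, for a full recovery from the gateway.
     *
     * @param bytesToRecover         expected size of the index data to download
     * @param measuredBytesPerSecond download throughput you measured against S3
     * @param safetyFactor           multiplier (>= 1) to allow for slow periods
     */
    static long upperLimitSeconds(long bytesToRecover,
                                  long measuredBytesPerSecond,
                                  double safetyFactor) {
        if (bytesToRecover < 0 || measuredBytesPerSecond <= 0 || safetyFactor < 1.0) {
            throw new IllegalArgumentException("invalid input");
        }
        // Raw download time at the measured rate.
        double seconds = (double) bytesToRecover / measuredBytesPerSecond;
        // Scale by the safety factor and round up to whole seconds.
        return (long) Math.ceil(seconds * safetyFactor);
    }
}
```

For example, 1 GiB at a measured 10 MiB/s with a factor of 3 gives 308 seconds, so a wait of roughly five minutes would be a plausible value to pass as the health check timeout under those assumptions.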

What options do I have if my status is still RED after waiting for a period of time?

Not much. The idea is that this will not happen. If it does, then something went wrong and (some) indices could not recover. With the S3 gateway, it means something went wrong while getting the data from S3; with the local gateway, it means that for some shards, not enough instances could be found to choose from.


(James Cook) #3

At some point, I should be able to detect that ES can do nothing to recover
on its own. At this point, I would like to programmatically tell the nodes
to start up "clean", and I will kick off a process which re-indexes data
from a secondary storage location that I am using for a data failsafe
(Amazon SimpleDB). How might I programmatically tell the nodes that I want
to reset the S3 gateway and have them clear out or reinitialize their local
storage?

-- jim


(Shay Banon) #4

You can delete the relevant indices and then reindex them.
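
A minimal sketch of that fallback in Java, under stated assumptions: ClusterOps is a hypothetical wrapper interface (not an ES type) that you would implement on top of the cluster health and delete index calls plus your own reindexing job, and RedFallback, the deadline, and the poll interval are likewise illustrative names and values.

```java
// Hypothetical sketch; none of these types come from Elasticsearch.
enum Health { GREEN, YELLOW, RED }

interface ClusterOps {
    Health health();                        // e.g. backed by the cluster health API
    java.util.List<String> redIndices();    // indices that failed to recover
    void deleteIndex(String name);          // e.g. backed by the delete index API
    void reindex(String name);              // rebuild from your secondary store
}

final class RedFallback {
    private RedFallback() {}

    /**
     * Polls until the cluster leaves RED or the deadline passes. If it is
     * still RED at the deadline, deletes and reindexes the failed indices.
     *
     * @return true if the delete-and-reindex fallback was triggered
     */
    static boolean waitOrReset(ClusterOps ops, long deadlineMillis, long pollMillis) {
        long end = System.currentTimeMillis() + deadlineMillis;
        while (true) {
            if (ops.health() != Health.RED) {
                return false;               // recovered on its own
            }
            if (System.currentTimeMillis() >= end) {
                break;                      // give up waiting
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        for (String index : ops.redIndices()) {
            ops.deleteIndex(index);         // drop the unrecoverable index
            ops.reindex(index);             // then rebuild it from source data
        }
        return true;
    }
}
```

The deadline here would come from whatever upper limit you decide on for recovery; the sketch only shows the order of operations (wait, then delete, then reindex), not how each call maps onto the client.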

