I have embedded ES in a web application. Each instance of the web application
that I bring online (on EC2) runs one data node, which is intended to cluster
with the data nodes in the other web instances. I am using the S3 gateway for
disaster recovery in the event of total cluster failure, or when pushing out a
new version of the software. Note that I don't control when my instances are
killed or instantiated. I am using local storage (niofs), but any local storage
is purely ephemeral: each new instance gets a new EBS device, and an old
instance's EBS volume is discarded when that instance shuts down.

When my web app initializes, I need to make sure the ES cluster is in GREEN
or YELLOW state before accepting web requests. I am now trying to understand
what the process should be if the cluster is in RED status. I currently use
the blocking cluster health check to wait until a non-RED status is
returned:

    getClient().admin().cluster().prepareHealth()
        .setWaitForYellowStatus()
        .setTimeout(_recoveryWait)
        .execute().actionGet();

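To make the "still RED after waiting" case concrete, one way to bound the wait is sketched below. This is only an illustration under assumptions: `StartupGate`, `Status`, and `awaitNonRed` are hypothetical names, not part of the ES API, and the `probe` argument stands in for whatever wraps the blocking health call above and maps its result to a status.

```java
import java.util.function.Supplier;

// Hypothetical helper, not an ES class: retries a health probe a bounded
// number of times so the web app can decide what to do if RED persists.
class StartupGate {
    /** Cluster status values as seen by the web app. */
    enum Status { RED, YELLOW, GREEN }

    /**
     * Calls the probe until it reports a non-RED status or the attempts
     * run out. Returns the last status seen, so a RED result means the
     * caller gave up waiting.
     */
    static Status awaitNonRed(Supplier<Status> probe, int maxAttempts, long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Status last = Status.RED;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            last = probe.get();
            if (last != Status.RED) {
                return last; // YELLOW or GREEN: safe to accept web requests
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis); // pause before the next probe
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt
                    return last;
                }
            }
        }
        return last; // still RED after all attempts
    }
}
```

In the web app, the probe would wrap the `prepareHealth()` call shown above, and a RED return value would be the point at which the app has to choose between refusing traffic, retrying, or alerting.
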
I have a few questions:
- Are there any benchmarks for how long I might be waiting for a node to
  join the cluster and achieve at least a YELLOW status? Are we talking
  seconds, minutes, or hours? Assume 1 GB of cluster metadata and indexes.
- How can I tell whether my status will always be RED, or if ES is
  actively trying to rectify the problem?
- What options do I have if my status is still RED after waiting for a
  period of time?

-- jim