Best timeout value for setWaitForYellowStatus

Pradeep_Gowda · April 5, 2016, 6:54pm

Hey Everyone,

After I upgraded Elasticsearch from 2.2.0 to 2.3.0, I started seeing slow shard allocation, so I added the code below to my client to wait until the cluster turns yellow:

client.admin().cluster().prepareHealth().setWaitForYellowStatus().setTimeout(TimeValue.timeValueMinutes(1)).execute().actionGet();

I have two questions.

1. What is the best way to handle this scenario? In 2.3 it took about 5 minutes to redistribute the shards, versus 30 seconds in 2.2, and by default setWaitForYellowStatus waits only 30s.
2. How can I increase the timeout? It looks like setTimeout has no effect and the call falls back to the default of 30s.

Regards,

danielmitterdorfer · April 12, 2016, 9:16am

Hi,

I suggest you don't rely on the timeout but rather poll in a loop. Consider a huge index that would take hours to recover: if you rely on the timeout, you'd block one of your program's threads for hours. Polling gives you the option to apply different strategies (e.g. increasing the waiting period between checks, asking for user input, or giving up after an overall timeout). As a rough sketch, I'd do something like this:

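// Needs: org.elasticsearch.action.admin.cluster.health.ClusterHealthResponse
//        org.elasticsearch.common.unit.TimeValue
// Thread.sleep throws InterruptedException; the enclosing method must handle or declare it.
while (true) {
  ClusterHealthResponse clusterHealthResponse = client.admin().cluster()
    .prepareHealth()
    .setWaitForYellowStatus()
    .setTimeout(TimeValue.timeValueMillis(500))
    .execute()
    .actionGet();
  if (clusterHealthResponse.isTimedOut()) {
    // Not yellow yet: wait 20 seconds, then retry. You can do lots of fancy
    // things here, just as I've described above.
    Thread.sleep(20 * 1000);
  } else {
    // We've reached yellow status, go on.
    break;
  }
}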
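The strategies mentioned above (growing the wait between checks, giving up after an overall deadline) can be factored into a small generic helper. This is a minimal sketch under my own assumptions, not an Elasticsearch API: the class and method names `BackoffPoller.pollWithBackoff` and all delay values are hypothetical, and the condition is a plain `BooleanSupplier` so the example runs without a cluster.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical polling helper: checks a condition, sleeps with exponential
// backoff between checks, and gives up once an overall deadline has passed.
final class BackoffPoller {
    private BackoffPoller() {}

    // Returns true if the condition held before the deadline, false if we gave up.
    static boolean pollWithBackoff(BooleanSupplier condition,
                                   Duration initialDelay,
                                   Duration maxDelay,
                                   Duration overallTimeout) throws InterruptedException {
        long deadline = System.nanoTime() + overallTimeout.toNanos();
        long delayMillis = Math.max(1L, initialDelay.toMillis());
        long maxDelayMillis = Math.max(delayMillis, maxDelay.toMillis());
        while (true) {
            if (condition.getAsBoolean()) {
                return true; // e.g. the cluster reached yellow
            }
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                return false; // overall deadline reached: stop polling
            }
            // Sleep for the current backoff, but never past the deadline.
            Thread.sleep(Math.min(delayMillis, remainingMillis));
            // Double the wait for the next round, capped at maxDelay.
            delayMillis = Math.min(delayMillis * 2, maxDelayMillis);
        }
    }
}
```

With the client from the loop above, the condition could be a lambda that issues the 500 ms health request and returns `!response.isTimedOut()`. Keeping each request short while letting the overall deadline cover the 5-minute redistribution means no single call blocks for long, and the caller decides when to stop waiting.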

To your second point: I set the timeout to one minute locally and measured the time. The request timed out after one minute, just as expected. But, as suggested above, I'd rather set a timeout shorter than the default and poll.
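To check the timeout behaviour yourself, time the blocking call with `System.nanoTime()`. The sketch below does that with a `CountDownLatch` that is never released, which stands in (as an assumption, so the example runs without a cluster) for a health request whose target status is never reached; the class name `TimeoutCheck` is hypothetical. With the real client you would time the `actionGet()` call instead and then inspect `isTimedOut()` on the response.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: measures how long a bounded wait blocks before giving up.
final class TimeoutCheck {
    private TimeoutCheck() {}

    // Returns the elapsed milliseconds of a wait that is never satisfied.
    static long measureTimedOutWait(long timeoutMillis) throws InterruptedException {
        // Stand-in for "the cluster never turns yellow": nobody counts this down.
        CountDownLatch neverReleased = new CountDownLatch(1);
        long start = System.nanoTime();
        boolean released = neverReleased.await(timeoutMillis, TimeUnit.MILLISECONDS);
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        if (released) {
            throw new IllegalStateException("latch was unexpectedly released");
        }
        return elapsedMillis;
    }
}
```

If the measured time for a real health call with a one-minute timeout comes out at 30 seconds, that would point to the timeout not reaching the request rather than to the server ignoring it.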

Daniel

Pradeep_Gowda · April 13, 2016, 8:31pm

Hi Dani,
Thank you for your response. When I try to get the cluster health using the code above, I get the following exception:

Exception in init thread : java.lang.IllegalStateException: ClusterService was close during health call
        at org.elasticsearch.action.admin.cluster.health.TransportClusterHealthAction$3.onClusterServiceClose(TransportClusterHealthAction.java:155) [elasticsearch-2.2.0.jar:2.2.0]
        at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onClose(ClusterStateObserver.java:225) [elasticsearch-2.2.0.jar:2.2.0]
        at org.elasticsearch.cluster.service.InternalClusterService.doStop(InternalClusterService.java:208) [elasticsearch-2.2.0.jar:2.2.0]
        at org.elasticsearch.common.component.AbstractLifecycleComponent.stop(AbstractLifecycleComponent.java:88) [elasticsearch-2.2.0.jar:2.2.0]
        at org.elasticsearch.node.Node.stop(Node.java:300) [elasticsearch-2.2.0.jar:2.2.0]
        at org.elasticsearch.node.Node.close(Node.java:325) [elasticsearch-2.2.0.jar:2.2.0]
        at org.elasticsearch.bootstrap.Bootstrap$4.run(Bootstrap.java:157) [elasticsearch-2.2.0.jar:2.2.0]

But when I checked the Elasticsearch log, the node was up and shard allocation was in progress.

danielmitterdorfer · April 14, 2016, 4:30am

Hi,

Interesting. How do you connect to Elasticsearch? Do you use the node client or the transport client?

Daniel

Pradeep_Gowda · April 14, 2016, 4:44am

I am using the Transport Client. This doesn't happen every time, though.

danielmitterdorfer · April 15, 2016, 8:00am

Does this also happen with the default timeout? The only things that have changed are the timeout and the fact that you now repeat the call when it times out.

Pradeep_Gowda · April 15, 2016, 8:16pm

Yes, it happens with the default timeout as well. What might be the reason? Even though the health check is in a loop, I don't even get a status back from the first call. Rather than handling this kind of issue on the client side, I feel it would be better to handle it on the service side.

danielmitterdorfer · April 18, 2016, 7:22am

Hi,

The exception trace indicates that one of your cluster nodes is shutting down (while others may still be up). Do you have multiple nodes in your cluster? Did you check the logs on all of them?

If by "service side" you mean that Elasticsearch should handle this: it is perfectly normal for nodes to leave and join the cluster (for example, you may want to take down a node for maintenance).
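Since nodes may come and go, one client-side option is to treat a failed health call like a timed-out one and retry a bounded number of times. This is a sketch under my own assumptions, not something Elasticsearch provides: the name `TolerantRetry.callWithRetries`, the attempt limit and the pause are hypothetical, and catching every `RuntimeException` is deliberately broad; in real code you would narrow it, for example to the `IllegalStateException` from the trace above.

```java
import java.util.function.Supplier;

// Hypothetical helper: retries a call that may fail transiently (e.g. while
// a node is shutting down) and rethrows the last error if every attempt fails.
final class TolerantRetry {
    private TolerantRetry() {}

    static <T> T callWithRetries(Supplier<T> call, int maxAttempts, long pauseMillis)
            throws InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e; // treated as transient (broad on purpose for this sketch)
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis); // pause before the next attempt
                }
            }
        }
        throw last; // all attempts failed: surface the most recent error
    }
}
```

Here the `Supplier` would wrap the health request from the polling loop, so a node leaving mid-call costs one retry instead of aborting the whole wait.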

Daniel

Pradeep_Gowda · April 18, 2016, 7:53am

Yes, I have three nodes in the cluster, and it looks like all nodes were up.

danielmitterdorfer · April 18, 2016, 8:47am

Is there any exception trace in the log that indicates a problem? Can you correlate the log events on the cluster's nodes with the problem on client side?
