Time Taken by a Cluster State Update Task


Last week we rolled out an update to our application, which uses the node client to communicate with the ES cluster.
We have 15 node clients, and we updated them one by one. The application update finished on all 15 machines within ten minutes, but after the restart, none of them joined the cluster for up to 40 minutes.

Looking into the master's log, all the cluster state update tasks took more than 3 minutes.

  • Example:

cluster update task [zen-disco-node_failed([esClient_MachineIp] [NodeName] ... ,reason transport disconnected] took 3.7m above the warn threshold of 30s

I have the following settings in elasticsearch.yml:
discovery.zen.fd.ping_timeout: 60s
discovery.zen.fd.ping_retries: 5

Can you explain a bit more what is happening here, and what you expected? It's just not really clear to me :slight_smile:

Thanks for the reply.

We have 3 groups of machines in our application:

  1. Console grid - used by end users to search logs (3 machines with node client)
  2. Indexer grid - used to index logs into ES (6 machines with node client)
  3. Monitor grid - used to trigger scheduled searches; based on the results we alert users (number of 500 statuses in the last 15 mins, number of exceptions in the last 15 mins, etc.) (6 machines with node client)

All 3 kinds of grids refer to MySQL. We had a schema change in our application, so we had to restart all 3 grids
(restarts happen one by one). Once the restarts completed, it took nearly 40 minutes for each node to join the ES cluster. As a result, end users were unable to search for 40 minutes, we were unable to trigger alerts on time, and indexing was delayed by 40 minutes.

That seems unusual. Are they all in the same DC? Were there networking issues?
Did you check the logs of the cluster master?

Hope this clears things up.

The zen-disco-node_failed / node-join cluster state update tasks combined took 40 minutes for the 15 node clients.

Yes, all are in the same DC. Looking at the master logs, all the pending tasks took more than 3 minutes. There were no network issues.

Will the following properties affect node disconnection (60s × 5 = 5 mins)?

discovery.zen.fd.ping_timeout: 60s
discovery.zen.fd.ping_retries: 5
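As a rough sketch of the arithmetic, assuming the master declares a node failed after `ping_retries` consecutive timeouts, each waiting `ping_timeout` plus the 1s `ping_interval` between pings (an assumed model, but it matches the 93-second default figure quoted later in this thread):

```python
def fd_detection_time(ping_timeout_s, ping_retries, ping_interval_s=1):
    """Rough worst-case seconds before the master marks a node as failed:
    each retry waits ping_timeout, plus the ping_interval between pings.
    (Assumed model; reproduces the 93s default quoted in this thread.)"""
    return ping_retries * (ping_timeout_s + ping_interval_s)

# Defaults: ping_timeout 30s, 3 retries, 1s interval
print(fd_detection_time(30, 3))   # 93 seconds

# The settings above: ping_timeout 60s, 5 retries
print(fd_detection_time(60, 5))   # 305 seconds, i.e. just over 5 minutes
```

So with these settings a hung or disconnected node can linger for roughly 5 minutes before the master reacts, which compounds across many node clients restarting.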

Yep they will.

Is there any reason why you increased those from the defaults?

Yes, a while back we faced long GC run issues on a few data nodes.

If I remember correctly, a few nodes suffered long GC runs (2+ minutes), so we decided to increase the node GC survival time from the default of 93 seconds (ping_timeout 30s, interval 1s, and 3 retries) to 5 minutes.

If I understand correctly, the master marks a data node as dead during long GC runs, i.e. those longer than 93 seconds (am I right?).
In that case unnecessary shard movement is triggered, which in turn affects regular indexing/searching activity.

That's why we increased those values. Is it possible to have different values for data nodes and client nodes?

That makes some sense then. Are you still getting long GCs?

Not recently, but it could happen any time, since our JVM options are as follows: -Xmx and -Xms are 31GB and -Xmn is 14GB.

I'd suggest you drop the heap to 30.5GB or even 30GB. Above 30.5GB you hit the compressed pointers issue. Also, don't touch any other JVM settings.
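A minimal sketch of that heap change (the flags below are illustrative; the thread only confirms -Xms/-Xmx/-Xmn are currently set):

```
-Xms30g
-Xmx30g
```

To verify compressed ordinary object pointers are actually in use at a given heap size, you can check the JVM's resolved flags with `java -Xmx30g -XX:+PrintFlagsFinal -version | grep UseCompressedOops` and confirm it reports `true`.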

You really want some kind of monitoring here of what is happening to your nodes and the cluster, to give you better correlation.

I went through the following link and set the heap to 31G.


Thanks for the suggestion, Mark. I will reduce the JVM heap to 30G. I forgot to mention the following property:

**-XX:CMSInitiatingOccupancyFraction=78** (Java 1.7.0_55). I hope this won't be a problem for long GCs.
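One hedged note on that flag: on its own, HotSpot only honors `CMSInitiatingOccupancyFraction` for the first CMS cycle and then falls back to its own heuristics, unless it is paired with `-XX:+UseCMSInitiatingOccupancyOnly`. A sketch of the combination, assuming CMS is the collector in use (implied by the flag, though the thread does not show the full option list):

```
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=78
-XX:+UseCMSInitiatingOccupancyOnly
```

Without the `Only` flag, CMS cycles may start later than the 78% occupancy you configured, which makes concurrent-mode failures (and the long stop-the-world collections that follow) more likely.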