Last week we rolled out an update to our application, which uses node clients to communicate with the ES cluster.
We have 15 node clients and we update them one by one. The application update finished on all 15 machines within ten minutes, but after the restart none of them joined the cluster for up to 40 minutes.
Looking at the master's log, all the cluster state update tasks took more than 3 minutes.
Example:
cluster update task [zen-disco-node_failed([esClient_MachineIp] [NodeName] ... ,reason transport disconnected] took 3.7m above the warn threshold of 30s
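To find every slow task in the master's log rather than spotting them by eye, something like the following could help. It is only a sketch: it assumes the warnings always end with the "took ... above the warn threshold of ..." phrase seen in the example, and the function name and unit table are made up for illustration.

```python
import re

# Duration phrase at the end of a slow cluster-state-task warning.
# The pattern is inferred from the single example line, so it is an assumption.
SLOW_TASK = re.compile(
    r"took (\d+(?:\.\d+)?)(ms|s|m|h) above the warn threshold of "
    r"(\d+(?:\.\d+)?)(ms|s|m|h)"
)

# Seconds per time-unit suffix used in the log message.
UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def slow_task_seconds(line):
    """Return (took, threshold) in seconds, or None if the line is not a slow-task warning."""
    match = SLOW_TASK.search(line)
    if match is None:
        return None
    took = float(match.group(1)) * UNIT_SECONDS[match.group(2)]
    threshold = float(match.group(3)) * UNIT_SECONDS[match.group(4)]
    return took, threshold


if __name__ == "__main__":
    example = (
        "cluster update task [zen-disco-node_failed([esClient_MachineIp] [NodeName] ... ,"
        "reason transport disconnected] took 3.7m above the warn threshold of 30s"
    )
    took, threshold = slow_task_seconds(example)
    print(f"took about {took:.0f}s, threshold {threshold:.0f}s")
```

Running it over the whole log and sorting by the first value would show which tasks held up the master the longest.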
I have the following settings in elasticsearch.yml:
discovery.zen.fd.ping_timeout: 60s
discovery.zen.fd.ping_retries: 5
Console grid - end users use it to search logs (3 machines with node clients)
Indexer grid - used to index logs into ES (6 machines with node clients)
Monitor grid - used to trigger scheduled searches and alert users based on the results (number of 500 statuses in the last 15 mins, number of exceptions in the last 15 mins, etc.) (6 machines with node clients)
All 3 kinds of grid refer to MySQL. Our application had a schema change, so we had to restart all 3 grids (the restart happens one by one).
Once the restart completed, it took nearly 40 minutes for each node to join the ES cluster. As a result, end users were unable to search for 40 minutes, we were unable to trigger alerts at the right time, and indexing was delayed by 40 minutes.
Yes, a long time ago we faced long GC run issues on a few data nodes.
If I remember correctly, a few nodes suffered from long GC runs (2+ minutes), so we decided to increase how long a node can survive a GC pause from the default of 93 secs (ping_timeout 30 secs, interval 1s and 3 retries) to about 5 minutes.
If I understand correctly, the master marks a data node as dead when a GC run lasts longer than 93 secs (am I right?).
In that case unnecessary shard movement is triggered, which in turn affects regular indexing and searching activity.
That's why we increased those values. Is it possible to have different values for data nodes and client nodes?
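For reference, the 93 secs and 5 minutes figures come from the following arithmetic. This is only my own estimate of the detection window, (ping_timeout + interval) x retries, and not necessarily how zen fault detection actually times things; the function name is made up for illustration.

```python
def detection_window_seconds(ping_timeout_s, ping_retries, ping_interval_s=1):
    """Rough estimate of how long a node can stay unresponsive before it is dropped.

    Assumption: each retry costs one full ping timeout plus one ping interval.
    """
    return (ping_timeout_s + ping_interval_s) * ping_retries


# Defaults quoted above: ping_timeout 30s, interval 1s, 3 retries.
default_window = detection_window_seconds(30, 3)

# Our elasticsearch.yml: ping_timeout 60s, 5 retries, interval left at 1s.
tuned_window = detection_window_seconds(60, 5)

print(default_window)  # 93
print(tuned_window)    # 305, i.e. roughly 5 minutes
```

By the same estimate, a client node that disconnects without a clean shutdown could take about as long to be noticed, which is why separate values for client nodes would be useful.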