Optimal discovery.zen.ping_timeout for on-prem cluster over 10gb network?

tmccall · March 26, 2020, 5:29pm

We have a 7-node cluster. 3 dedicated master nodes, 4 data/ingest nodes. We had an issue where a host that was housing our VMs failed, which caused us to lose one master node and two data nodes. Our cluster was still "available," but we decided to flip over to our rescue cluster.

However, it took a long time to resolve this issue (around 5 minutes of search downtime on our site) because our ping timeout was too long. We had it set to the default 30s, but that caused us to not identify this issue for a while. We have since lowered it to 10s, but we're wondering if (given our network capacity and cluster size) we could go even lower?

Thanks!

system · April 23, 2020, 5:29pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Cluster failures Elasticsearch	1	320	July 19, 2013
Discovery.ec2.ping_timeout Elasticsearch	3	361	January 6, 2013
Data node left the cluster due to `master_left - failed to ping, tried [3] times, each with maximum [30s] timeout]` Elasticsearch	5	5669	January 7, 2019
Long period of querying failure during node timeout Elasticsearch	3	1177	April 17, 2020
Any negative impact of a long discovery.zen.ping_timeout setting Elasticsearch	0	436	June 5, 2014

Optimal discovery.zen.ping_timeout for on-prem cluster over 10gb network?

Related topics