Received response for a request that has timed out Issue & [o.e.c.c.C.CoordinatorPublication] [Master Node Name] after [30s] publication of cluster state version [number] is still waiting for {Data Node Name}

ankitag79 · July 8, 2020, 1:31pm

Hi,

These errors usually indicate an overloaded cluster, but they have become more frequent since we added more data nodes to the cluster and separated out the master-only nodes. The throughput of the cluster has also dropped.

There are hardly any search requests on this cluster at the moment.

Here are the configuration details.

ES version - 7.5.2
Data Nodes - 75+
Master Only Nodes - 3
Data/Ingest Nodes - 3
Total Nodes - 81+ (75+ data, 3 master-only, 3 data/ingest)

Data Details.
Logstash - 25+ Instances
Index Count - 1
Shards - 80
Replicas - 1 (1 primary + 1 replica)
ILM rolls the index over at 1 TB.
About 12-14 rollovers a day.
So 12-14 indices of 1 TB each are created daily.
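
For a rough sense of scale, the figures above can be worked through as follows. This is only a back-of-the-envelope sketch: it assumes the 1 TB rollover threshold applies to total primary size, that 1 TB is taken as 1000 GB, and that shards are evenly sized. The function names are made up for illustration.

```python
# Back-of-the-envelope sizing from the figures in this post.
PRIMARY_SHARDS = 80           # "Shards - 80"
REPLICAS = 1                  # one replica per primary
ROLLOVER_SIZE_GB = 1000       # 1 TB rollover threshold, taken as 1000 GB (assumption)
ROLLOVERS_PER_DAY = (12, 14)  # "About 12-14 rollovers a day"


def daily_shard_copies(rollovers):
    """Shard copies (primaries + replicas) created per day."""
    return PRIMARY_SHARDS * (1 + REPLICAS) * rollovers


def primary_shard_size_gb():
    """Approximate primary shard size at rollover, assuming even distribution."""
    return ROLLOVER_SIZE_GB / PRIMARY_SHARDS


low, high = (daily_shard_copies(r) for r in ROLLOVERS_PER_DAY)
print(f"new shard copies per day: {low}-{high}")
print(f"approx. size per primary shard: {primary_shard_size_gb():.1f} GB")
```

Under those assumptions this comes to roughly 1920-2240 new shard copies per day, each primary holding about 12.5 GB at rollover. Every rollover also produces cluster state updates that the master has to publish to all nodes.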

There is no firewall between the nodes of the cluster and no noticeable network latency. They are in the same DC and physically almost side by side.

The master node's log is full of the errors below.

Received response for a request that has timed out, sent [11608ms] ago, timed out [1603ms] ago, action [internal:coordination/fault_detection/follower_check], node ............................
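
The two durations in this message can be read together: "sent" is how long ago the request went out, and "timed out" is how long ago the timeout fired, so their difference is the timeout that was in effect. A minimal sketch of pulling these out of a log line is below; the regular expression and the `parse_timeout_line` helper are my own, written against the message format shown above, and are not part of Elasticsearch.

```python
import re

# Sample taken from the log line above, with the elided node details left out.
LINE = (
    "Received response for a request that has timed out, "
    "sent [11608ms] ago, timed out [1603ms] ago, "
    "action [internal:coordination/fault_detection/follower_check]"
)

PATTERN = re.compile(
    r"sent \[(?P<sent>\d+)ms\] ago, timed out \[(?P<timed_out>\d+)ms\] ago, "
    r"action \[(?P<action>[^\]]+)\]"
)


def parse_timeout_line(line):
    """Return the action, response time and implied timeout, or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    sent = int(m.group("sent"))
    timed_out = int(m.group("timed_out"))
    return {
        "action": m.group("action"),
        "response_ms": sent,               # how long the response actually took
        "timeout_ms": sent - timed_out,    # timeout that was in effect
    }


info = parse_timeout_line(LINE)
print(info)
```

For the line above, the response took about 11.6 s against an implied timeout of about 10 s, which is consistent with what I understand to be the default follower check timeout (`cluster.fault_detection.follower_check.timeout`).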

[o.e.c.c.C.CoordinatorPublication] [Master Node Name] after [30s] publication of cluster state version [number] is still waiting for {Data Node Name}..........................., xpack.installed=true} [SENT_APPLY_COMMIT], {Data Node name}{............... xpack.installed=true} [SENT_APPLY_COMMIT]

As a result of the above errors, nodes are sometimes removed and added back when 3 consecutive follower or leader checks fail.
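
The removal rule described here can be illustrated with a small simulation. This is only a sketch of the "3 consecutive failures" behaviour as I understand it, not Elasticsearch's actual implementation; `removed_at` is a hypothetical helper, and the threshold of 3 is assumed to correspond to the `cluster.fault_detection.follower_check.retry_count` setting.

```python
RETRY_COUNT = 3  # consecutive failed checks before the node is removed (assumed default)


def removed_at(check_results, retry_count=RETRY_COUNT):
    """Return the index of the check at which the node would be removed, or None.

    check_results is a sequence of booleans: True for a successful check,
    False for a failed or timed-out one. A success resets the failure counter.
    """
    consecutive = 0
    for i, ok in enumerate(check_results):
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= retry_count:
            return i
    return None


# Two failures, a success, then three failures in a row: removed at the last check.
print(removed_at([True, False, False, True, False, False, False]))
# Intermittent failures that never reach three in a row: node stays in the cluster.
print(removed_at([False, True, False, True, False]))
```

This is why occasional slow responses only produce warnings, while a node that stays slow for three checks in a row drops out of the cluster and then rejoins.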

Please let me know if any other information is needed to look into this issue.

Thanks,
Ankit.

