Nodes constantly losing connection

borna_talebi · January 27, 2021, 7:42am

I have a 2-node cluster (archive and main), and today the nodes started losing connection to each other.
These are the two logs that keep appearing:
[2021-01-27T10:26:55,816][WARN ][o.e.g.PersistedClusterStateService] [XXXarchive.XXX.local] writing cluster state took [69043ms] which is above the warn threshold of [10s]; wrote global metadata [false] and metadata for [124] indices and skipped [127] unchanged indices
[2021-01-27T10:28:11,586][WARN ][o.e.c.c.C.CoordinatorPublication] [XXX-main.XXX.local] after [30s] publication of cluster state version [95968] is still waiting for {XXXarchive.XXX.local}{cPA9OW7KQhKGQ-_xtXGnhg}{y-JIm7z-QWO8NhP2fdwTBQ}{XXX-archive.XXX.local}{X.X.X.103:9300}{dilmrt}{ml.machine_memory=8201244672, ml.max_open_jobs=20, xpack.installed=true, box_type=warm, transform.node=true} [SENT_PUBLISH_REQUEST]
I have around 140 indices and 160 shards.

Christian_Dahlqvist · January 27, 2021, 7:46am

It looks like your cluster is overloaded and/or has far too slow storage. What is the specification of your cluster? What type of storage are you using?
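
To get a feel for how slow the cluster state writes are over time, one option is to pull the durations out of the `PersistedClusterStateService` warnings. This is only a minimal sketch, assuming the log lines follow the same wording as the warning quoted in the first post; the function name `parse_slow_write` is made up for illustration and other Elasticsearch versions may format the message differently.

```python
import re

# Pattern for the "writing cluster state took [...ms]" warning as it appears
# in the log excerpt quoted in this thread (assumed format, not a stable API).
SLOW_WRITE = re.compile(
    r"writing cluster state took \[(?P<took_ms>\d+)ms\] "
    r"which is above the warn threshold of \[(?P<threshold_s>\d+)s\]"
)


def parse_slow_write(line):
    """Return (took_seconds, threshold_seconds), or None if the line does not match."""
    match = SLOW_WRITE.search(line)
    if match is None:
        return None
    took_seconds = int(match.group("took_ms")) / 1000.0
    threshold_seconds = float(match.group("threshold_s"))
    return took_seconds, threshold_seconds


# The relevant part of the warning from the first post.
sample = (
    "writing cluster state took [69043ms] which is above "
    "the warn threshold of [10s]; wrote global metadata [false]"
)
result = parse_slow_write(sample)
```

For the warning above this gives roughly 69 seconds against a 10 second threshold. Running it over the whole log file would show whether the slow writes are constant or come in bursts.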

borna_talebi · January 27, 2021, 9:05am

I'm using a SAN HDD on the main node and an NFS disk on the second (archive) node. I know they're pretty slow, but I've never had this problem in the past 3 months.

Here is the output of iostat on both nodes:
main:

archive:

Could the high iowait percentage be a problem?

Christian_Dahlqvist · January 27, 2021, 10:30am

That does not mean that it is not a problem now.

Yes. It looks like this node is struggling, at least periodically.
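
If you want to track iowait over time rather than from a single iostat run, a rough sketch like the following can help. It assumes a Linux host, where the aggregate `cpu` line in `/proc/stat` lists cumulative counters in the order user, nice, system, idle, iowait, irq, softirq, steal; the function names are hypothetical.

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line from /proc/stat into its first 8 counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # user, nice, system, idle, iowait, irq, softirq, steal
    return [int(value) for value in fields[1:9]]


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two snapshots, in percent."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def sample_iowait(interval_seconds=5.0, path="/proc/stat"):
    """Take two snapshots of /proc/stat and return the iowait percentage (Linux only)."""
    with open(path) as stat_file:
        before = stat_file.readline()
    time.sleep(interval_seconds)
    with open(path) as stat_file:
        after = stat_file.readline()
    return iowait_percent(before, after)


# Synthetic snapshots for illustration, not measurements from this cluster.
snapshot_before = "cpu  100 0 50 800 50 0 0 0 0 0"
snapshot_after = "cpu  150 0 70 1000 280 0 0 0 0 0"
example = iowait_percent(snapshot_before, snapshot_after)
```

Calling `sample_iowait()` periodically on each node and logging the result would show whether the spikes line up with the cluster state warnings.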

system · February 24, 2021, 10:30am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
