Publication of cluster state fails - followers check retry count exceeded

No, that doesn't explain the data node restarting. Normally a departed node would just try and rejoin the cluster again.

Ok, thanks. Currently, I don't have any relevant logs from the restarting data nodes, but I'll try to get them.

Meanwhile (shooting in the dark here) - I noticed that every index gets "_tier_preference": "data_content" in its settings (without me defining it). If I don't use a hot/warm/cold architecture (these are not time-based indices) - should I set it to hot upon index creation? Does it make an impact (good or bad)? Should I define the data-wrk nodes as hot? Would they get different configuration values?

data_content is appropriate if you don't have a hot/warm/cold architecture. I also think this doesn't explain why a node might restart. The tier preference just defines how the data is allocated to nodes in the cluster.
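
For reference, setting the tier preference explicitly at index creation would look something like this (just a sketch - the index name, endpoint and credentials are placeholders, and data_content is already the default for regular indices):

# Create an index with an explicit tier preference
curl -k -u elastic:$ELASTIC_PASSWORD -X PUT "https://localhost:9200/my-index" \
  -H 'Content-Type: application/json' -d '
{
  "settings": {
    "index.routing.allocation.include._tier_preference": "data_content"
  }
}'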

I see nothing interesting in the data node logs, just the startup of a fresh instance.
I'll try to change the log level to DEBUG and see if I can catch something interesting.
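
For example, something like this should raise the log level at runtime (a sketch for our ECK setup - the endpoint and credentials are placeholders):

# Raise the root Elasticsearch logger to DEBUG dynamically (no restart required)
curl -k -u elastic:$ELASTIC_PASSWORD -X PUT "https://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' -d '
{ "persistent": { "logger.org.elasticsearch": "DEBUG" } }'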

That suggests it's being killed with extreme prejudice, most likely a SIGKILL. Essentially every other way a node can shut down will leave ample evidence in the logs. Could anything be sending it a SIGKILL? The OOM killer perhaps?

I looked at the Kubernetes events, which record both OOMs and SIGKILLs, and there's nothing there either.
I will share additional info when I find something.
Thanks again!

Does that include the kernel logs (i.e. dmesg)? That's where you get reports from the OOM killer.
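
If it helps, something along these lines should show OOM-killer activity and how the container last exited (a sketch; the namespace is a placeholder):

# On the Kubernetes node: look for OOM-killer activity in the kernel log
dmesg -T | grep -iE 'out of memory|oom-killer|killed process'

# For the pod: check the last termination state of the Elasticsearch container
# (exit code 137 generally means it received a SIGKILL)
kubectl describe pod elasticsearch-aud-es-data-wrk-0 -n <namespace> | grep -A 10 'Last State'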

I don't have info from dmesg yet, but after changing the log level to DEBUG I see tons (15 per second) of logs like:

{"@timestamp":"2023-05-06T19:15:58.333Z", "log.level":"DEBUG", "message":"no retention leases are expired from current retention leases [RetentionLeases{primaryTerm=1, ver
sion=8, leases={peer_recovery/XdOe8IG6RiGritEnRkqnhg=RetentionLease{id='peer_recovery/XdOe8IG6RiGritEnRkqnhg', retainingSequenceNumber=25, timestamp=1683384058087, source=
'peer recovery'}, peer_recovery/l6kLyFgFQnehymKgDCT7xA=RetentionLease{id='peer_recovery/l6kLyFgFQnehymKgDCT7xA', retainingSequenceNumber=25, timestamp=1683384058087, sourc
e='peer recovery'}}}]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch-aud-es-dat
a-srv-0][management][T#4]","log.logger":"org.elasticsearch.index.seqno.ReplicationTracker","elasticsearch.cluster.uuid":"2xZgtNgpTAmv_kfkn-jxmw","elasticsearch.node.id":"X
dOe8IG6RiGritEnRkqnhg","elasticsearch.node.name":"elasticsearch-aud-es-data-srv-0","elasticsearch.cluster.name":"elasticsearch-aud","tags":[" [8774046_23-05-06_08-39-27][0
]"]}
{"@timestamp":"2023-05-06T19:15:58.362Z", "log.level":"DEBUG", "message":"no retention leases are expired from current retention leases [RetentionLeases{primaryTerm=2, ver
sion=14, leases={peer_recovery/fCccvICST9mxGsyoZvzAuQ=RetentionLease{id='peer_recovery/fCccvICST9mxGsyoZvzAuQ', retainingSequenceNumber=1, timestamp=1683395188289, source=
'peer recovery'}, peer_recovery/XdOe8IG6RiGritEnRkqnhg=RetentionLease{id='peer_recovery/XdOe8IG6RiGritEnRkqnhg', retainingSequenceNumber=1, timestamp=1683395188289, source
='peer recovery'}}}]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch-aud-es-data
-srv-0][management][T#2]","log.logger":"org.elasticsearch.index.seqno.ReplicationTracker","elasticsearch.cluster.uuid":"2xZgtNgpTAmv_kfkn-jxmw","elasticsearch.node.id":"Xd
Oe8IG6RiGritEnRkqnhg","elasticsearch.node.name":"elasticsearch-aud-es-data-srv-0","elasticsearch.cluster.name":"elasticsearch-aud","tags":[" [8781830_23-05-05_23-44-20][0]
"]}

But that's on my data-srv nodes (the ones that hold the baked indices, ready for searching). Those nodes are more stable (compared to the data-wrk nodes where the indexing happens).
If that's related to ILM or something else that can be disabled (we don't use it) - I would love to know.

Regarding the data-wrk nodes, I'm still waiting for a crash/restart; meanwhile I see a lot of:

{"@timestamp":"2023-05-06T19:22:46.950Z", "log.level":"DEBUG", "message":"Index [.security-tokens] is not available - no metadata", "ecs.version": "1.2.0","service.name":"
ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch-aud-es-data-wrk-0][clusterApplierService#updateTask][T#1]","log.logger":"
org.elasticsearch.xpack.security.support.SecurityIndexManager","elasticsearch.cluster.uuid":"2xZgtNgpTAmv_kfkn-jxmw","elasticsearch.node.id":"AmP0IcBYRYysT9ejx7CDNA","elas
ticsearch.node.name":"elasticsearch-aud-es-data-wrk-0","elasticsearch.cluster.name":"elasticsearch-aud"}
{"@timestamp":"2023-05-06T19:22:46.950Z", "log.level":"DEBUG", "message":"processing [ApplyCommitRequest{term=33, version=1235230, sourceNode={elasticsearch-aud-es-master-
0}{BnxNI5oEQd2Z0x6VGhwVQQ}{nBxKRCfLSICTjaIZlD_NNQ}{elasticsearch-aud-es-master-0}{10.2.7.62}{10.2.7.62:9300}{m}{8.7.0}{xpack.installed=true, k8s_node_name=ip-10-2-8-171.ec
2.internal}}]: took [0s] done applying updated cluster state (version: 1235230, uuid: eMrMcgrRQHCIEyF9XiukEw)", "ecs.version": "1.2.0","service.name":"ES_ECS","event.datas
et":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch-aud-es-data-wrk-0][clusterApplierService#updateTask][T#1]","log.logger":"org.elasticsearch.cl
uster.service.ClusterApplierService","elasticsearch.cluster.uuid":"2xZgtNgpTAmv_kfkn-jxmw","elasticsearch.node.id":"AmP0IcBYRYysT9ejx7CDNA","elasticsearch.node.name":"elas
ticsearch-aud-es-data-wrk-0","elasticsearch.cluster.name":"elasticsearch-aud"}
{"@timestamp":"2023-05-06T19:22:49.079Z", "log.level":"DEBUG", "message":"pre-authorizing child action [indices:data/read/search[phase/query]] of parent action [indices:da
ta/read/search]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch-aud-es-data-wrk-
0][transport_worker][T#4]","log.logger":"org.elasticsearch.xpack.security.authz.PreAuthorizationUtils","trace.id":"31cc6718b33de06a74102da89a685089","elasticsearch.cluster
.uuid":"2xZgtNgpTAmv_kfkn-jxmw","elasticsearch.node.id":"AmP0IcBYRYysT9ejx7CDNA","elasticsearch.node.name":"elasticsearch-aud-es-data-wrk-0","elasticsearch.cluster.name":"
elasticsearch-aud"}
{"@timestamp":"2023-05-06T19:22:50.368Z", "log.level":"DEBUG", "message":"no retention leases are expired from current retention leases [RetentionLeases{primaryTerm=3, ver
sion=4, leases={peer_recovery/AmP0IcBYRYysT9ejx7CDNA=RetentionLease{id='peer_recovery/AmP0IcBYRYysT9ejx7CDNA', retainingSequenceNumber=0, timestamp=1683390200681, source='
peer recovery'}}}]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch-aud-es-data-w
rk-0][management][T#5]","log.logger":"org.elasticsearch.index.seqno.ReplicationTracker","elasticsearch.cluster.uuid":"2xZgtNgpTAmv_kfkn-jxmw","elasticsearch.node.id":"AmP0
IcBYRYysT9ejx7CDNA","elasticsearch.node.name":"elasticsearch-aud-es-data-wrk-0","elasticsearch.cluster.name":"elasticsearch-aud","tags":[" [8776193_23-05-06_09-53-50][5]"]
}
{"@timestamp":"2023-05-06T19:22:51.191Z", "log.level":"DEBUG", "message":"realm [file1] authenticated user [elastic-internal-probe], with roles [[elastic_internal_probe_us
er]] (cached)", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch-aud-es-data-wrk-0]
[transport_worker][T#6]","log.logger":"org.elasticsearch.xpack.security.authc.file.FileRealm","elasticsearch.cluster.uuid":"2xZgtNgpTAmv_kfkn-jxmw","elasticsearch.node.id"
:"AmP0IcBYRYysT9ejx7CDNA","elasticsearch.node.name":"elasticsearch-aud-es-data-wrk-0","elasticsearch.cluster.name":"elasticsearch-aud"}
{"@timestamp":"2023-05-06T19:22:51.191Z", "log.level":"DEBUG", "message":"Authentication of [elastic-internal-probe] using realm [file/file1] with token [UsernamePasswordT
oken] was [AuthenticationResult{status=SUCCESS, value=User[username=elastic-internal-probe,roles=[elastic_internal_probe_user],fullName=null,email=null,metadata={}], messa
ge=null, exception=null}]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch-aud-es
-data-wrk-0][transport_worker][T#6]","log.logger":"org.elasticsearch.xpack.security.authc.RealmsAuthenticator","elasticsearch.cluster.uuid":"2xZgtNgpTAmv_kfkn-jxmw","elast
icsearch.node.id":"AmP0IcBYRYysT9ejx7CDNA","elasticsearch.node.name":"elasticsearch-aud-es-data-wrk-0","elasticsearch.cluster.name":"elasticsearch-aud"}
{"@timestamp":"2023-05-06T19:22:55.925Z", "log.level":"DEBUG", "message":"no retention leases are expired from current retention leases [RetentionLeases{primaryTerm=8, ver
sion=3727, leases={peer_recovery/66-z5KH4RtGhRBtADMORfA=RetentionLease{id='peer_recovery/66-z5KH4RtGhRBtADMORfA', retainingSequenceNumber=10852, timestamp=1683394944689, s
ource='peer recovery'}, peer_recovery/AmP0IcBYRYysT9ejx7CDNA=RetentionLease{id='peer_recovery/AmP0IcBYRYysT9ejx7CDNA', retainingSequenceNumber=10852, timestamp=16834004376
81, source='peer recovery'}}}]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch-a
ud-es-data-wrk-0][management][T#2]","log.logger":"org.elasticsearch.index.seqno.ReplicationTracker","elasticsearch.cluster.uuid":"2xZgtNgpTAmv_kfkn-jxmw","elasticsearch.no
de.id":"AmP0IcBYRYysT9ejx7CDNA","elasticsearch.node.name":"elasticsearch-aud-es-data-wrk-0","elasticsearch.cluster.name":"elasticsearch-aud","tags":[" [.monitoring-kibana-
7-2023.05.04][0]"]}
{"@timestamp":"2023-05-06T19:22:56.297Z", "log.level":"DEBUG", "message":"realm [file1], successful authentication [AuthenticationResult{status=SUCCESS, value=User[usernam
e=elastic-internal-probe,roles=[elastic_internal_probe_user],fullName=null,email=null,metadata={}], message=null, exception=null}] for [elastic-internal-probe]", "ecs.vers
ion": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch-aud-es-data-wrk-0][transport_worker][T#1]",
"log.logger":"org.elasticsearch.xpack.security.authc.file.FileRealm","elasticsearch.cluster.uuid":"2xZgtNgpTAmv_kfkn-jxmw","elasticsearch.node.id":"AmP0IcBYRYysT9ejx7CDNA"
,"elasticsearch.node.name":"elasticsearch-aud-es-data-wrk-0","elasticsearch.cluster.name":"elasticsearch-aud"}
{"@timestamp":"2023-05-06T19:22:56.298Z", "log.level":"DEBUG", "message":"Authentication of [elastic-internal-probe] using realm [file/file1] with token [UsernamePasswordT
oken] was [AuthenticationResult{status=SUCCESS, value=User[username=elastic-internal-probe,roles=[elastic_internal_probe_user],fullName=null,email=null,metadata={}], messa
ge=null, exception=null}]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch-aud-es
-data-wrk-0][transport_worker][T#1]","log.logger":"org.elasticsearch.xpack.security.authc.RealmsAuthenticator","elasticsearch.cluster.uuid":"2xZgtNgpTAmv_kfkn-jxmw","elast
icsearch.node.id":"AmP0IcBYRYysT9ejx7CDNA","elasticsearch.node.name":"elasticsearch-aud-es-data-wrk-0","elasticsearch.cluster.name":"elasticsearch-aud"}
{"@timestamp":"2023-05-06T19:23:31.813Z", "log.level":"DEBUG", "message":"no retention leases are expired from current retention leases [RetentionLeases{primaryTerm=1, ver
sion=3, leases={peer_recovery/AmP0IcBYRYysT9ejx7CDNA=RetentionLease{id='peer_recovery/AmP0IcBYRYysT9ejx7CDNA', retainingSequenceNumber=0, timestamp=1683390211700, source='
peer recovery'}}}]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch-aud-es-data-w
rk-0][management][T#1]","log.logger":"org.elasticsearch.index.seqno.ReplicationTracker","elasticsearch.cluster.uuid":"2xZgtNgpTAmv_kfkn-jxmw","elasticsearch.node.id":"AmP0
IcBYRYysT9ejx7CDNA","elasticsearch.node.name":"elasticsearch-aud-es-data-wrk-0","elasticsearch.cluster.name":"elasticsearch-aud","tags":[" [8783249_23-05-06_12-40-24][0]"]
}

Here too, we are not using any of the security features - is it safe to disable them?
Can we change xpack.security.enabled to false?
I see that a lot of features are enabled by default (loaded on start) - like ML and so on - would it help us to disable all that unused stuff?

Probably best not to do that. DEBUG logs only really make sense for developers while debugging, and only then for specific components. The ReplicationTracker one could reasonably be a TRACE log (I opened #95899).
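
If the DEBUG level was applied via the cluster settings API, it can be reverted by resetting the logger setting (a sketch; endpoint and credentials are placeholders):

# Reset any dynamically-applied logger override back to its default level
curl -k -u elastic:$ELASTIC_PASSWORD -X PUT "https://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' -d '
{ "persistent": { "logger.org.elasticsearch": null } }'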

I strongly recommend working out what the problem is before just randomly changing stuff.

Ok, we are still waiting to catch the next restart.

Here is what we see during "heavy" indexing times:

data-wrk logs:

{"@timestamp":"2023-05-07T11:40:44.980Z", "log.level": "WARN", "message":"writing cluster state took [16206ms] which is above the warn threshold of [10s]; [skipped writing] global metadata, wrote [0] new mapp
ings, removed [0] mappings and skipped [1483] unchanged mappings, wrote metadata for [0] new indices and [1] existing indices, removed metadata for [0] indices and skipped [1512] unchanged indices", "ecs.vers
ion": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch-aud-es-data-wrk-7][AsyncLucenePersistedState#updateTask][T#1]","log.logger":"org
.elasticsearch.gateway.PersistedClusterStateService","elasticsearch.cluster.uuid":"2xZgtNgpTAmv_kfkn-jxmw","elasticsearch.node.id":"JbSKpmy3RDeAYVsHRKyv7w","elasticsearch.node.name":"elasticsearch-aud-es-data
-wrk-7","elasticsearch.cluster.name":"elasticsearch-aud"}                                                                                                                                                       
{"@timestamp":"2023-05-07T11:41:03.918Z", "log.level": "WARN", "message":"writing cluster state took [10204ms] which is above the warn threshold of [10s]; [skipped writing] global metadata, wrote [1] new mapp
ings, removed [1] mappings and skipped [1482] unchanged mappings, wrote metadata for [0] new indices and [3] existing indices, removed metadata for [0] indices and skipped [1510] unchanged indices", "ecs.vers
ion": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch-aud-es-data-wrk-7][AsyncLucenePersistedState#updateTask][T#1]","log.logger":"org
.elasticsearch.gateway.PersistedClusterStateService","elasticsearch.cluster.uuid":"2xZgtNgpTAmv_kfkn-jxmw","elasticsearch.node.id":"JbSKpmy3RDeAYVsHRKyv7w","elasticsearch.node.name":"elasticsearch-aud-es-data
-wrk-7","elasticsearch.cluster.name":"elasticsearch-aud"}                                                                                                                                                       
{"@timestamp":"2023-05-07T11:41:18.940Z", "log.level": "WARN", "message":"writing cluster state took [15006ms] which is above the warn threshold of [10s]; [wrote] global metadata, wrote [0] new mappings, remo
ved [1] mappings and skipped [1482] unchanged mappings, wrote metadata for [0] new indices and [2] existing indices, removed metadata for [1] indices and skipped [1510] unchanged indices", "ecs.version": "1.2
.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch-aud-es-data-wrk-7][AsyncLucenePersistedState#updateTask][T#1]","log.logger":"org.elasticse
arch.gateway.PersistedClusterStateService","elasticsearch.cluster.uuid":"2xZgtNgpTAmv_kfkn-jxmw","elasticsearch.node.id":"JbSKpmy3RDeAYVsHRKyv7w","elasticsearch.node.name":"elasticsearch-aud-es-data-wrk-7","e
lasticsearch.cluster.name":"elasticsearch-aud"}                                                                                                                                                                 
{"@timestamp":"2023-05-07T11:42:46.119Z", "log.level": "WARN", "message":"writing cluster state took [11890ms] which is above the warn threshold of [10s]; [skipped writing] global metadata, wrote [1] new mapp
ings, removed [1] mappings and skipped [1481] unchanged mappings, wrote metadata for [0] new indices and [1] existing indices, removed metadata for [0] indices and skipped [1511] unchanged indices", "ecs.vers
ion": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch-aud-es-data-wrk-7][AsyncLucenePersistedState#updateTask][T#1]","log.logger":"org
.elasticsearch.gateway.PersistedClusterStateService","elasticsearch.cluster.uuid":"2xZgtNgpTAmv_kfkn-jxmw","elasticsearch.node.id":"JbSKpmy3RDeAYVsHRKyv7w","elasticsearch.node.name":"elasticsearch-aud-es-data
-wrk-7","elasticsearch.cluster.name":"elasticsearch-aud"}                                                                                                                                                       
{"@timestamp":"2023-05-07T11:42:59.227Z", "log.level": "WARN", "message":"writing cluster state took [13005ms] which is above the warn threshold of [10s]; [skipped writing] global metadata, wrote [0] new mapp
ings, removed [0] mappings and skipped [1482] unchanged mappings, wrote metadata for [0] new indices and [1] existing indices, removed metadata for [0] indices and skipped [1511] unchanged indices", "ecs.vers
ion": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch-aud-es-data-wrk-7][AsyncLucenePersistedState#updateTask][T#1]","log.logger":"org
.elasticsearch.gateway.PersistedClusterStateService","elasticsearch.cluster.uuid":"2xZgtNgpTAmv_kfkn-jxmw","elasticsearch.node.id":"JbSKpmy3RDeAYVsHRKyv7w","elasticsearch.node.name":"elasticsearch-aud-es-data
-wrk-7","elasticsearch.cluster.name":"elasticsearch-aud"}

active master:

{"@timestamp":"2023-05-07T11:41:33.647Z", "log.level":"ERROR", "message":"collector [index-stats] timed out when collecting data: node [JbSKpmy3RDeAYVsHRKyv7w] did not respond within [10s]", "ecs.version": "1
.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch-aud-es-master-0][generic][T#18]","log.logger":"org.elasticsearch.xpack.monitoring.collec
tor.indices.IndexStatsCollector","elasticsearch.cluster.uuid":"2xZgtNgpTAmv_kfkn-jxmw","elasticsearch.node.id":"BnxNI5oEQd2Z0x6VGhwVQQ","elasticsearch.node.name":"elasticsearch-aud-es-master-0","elasticsearch
.cluster.name":"elasticsearch-aud"}                                                                                                                                                                             
{"@timestamp":"2023-05-07T11:41:37.825Z", "log.level": "WARN", "message":"Received response for a request that has timed out, sent [14.2s/14206ms] ago, timed out [4.2s/4202ms] ago, action [indices:monitor/sta
ts[n]], node [{elasticsearch-aud-es-data-wrk-7}{JbSKpmy3RDeAYVsHRKyv7w}{AcExajE7ShaPEorO-oz89g}{elasticsearch-aud-es-data-wrk-7}{10.2.15.80}{10.2.15.80:9300}{d}{8.7.0}{k8s_node_name=ip-10-2-9-9.ec2.internal, 
zone=WORKER, xpack.installed=true}], id [6896232]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch-aud-es-master-0][tr
ansport_worker][T#1]","log.logger":"org.elasticsearch.transport.TransportService","elasticsearch.cluster.uuid":"2xZgtNgpTAmv_kfkn-jxmw","elasticsearch.node.id":"BnxNI5oEQd2Z0x6VGhwVQQ","elasticsearch.node.nam
e":"elasticsearch-aud-es-master-0","elasticsearch.cluster.name":"elasticsearch-aud"}

It's me again!

I tried to run dmesg --level=err,warn -Tx (to print only err/warn messages with timestamps) on a random pod:

elasticsearch@elasticsearch-aud-es-data-wrk-7:~$ dmesg --level=err,warn -Tx
kern  :warn  : [Sat May 13 09:33:58 2023]   #2
kern  :warn  : [Sat May 13 09:33:58 2023]   #3
kern  :warn  : [Sat May 13 09:33:58 2023]   #4
kern  :warn  : [Sat May 13 09:33:58 2023]   #5
kern  :warn  : [Sat May 13 09:33:58 2023]   #6
kern  :warn  : [Sat May 13 09:33:58 2023]   #7
kern  :warn  : [Sat May 13 09:33:58 2023]   #8
kern  :warn  : [Sat May 13 09:33:58 2023]   #9
kern  :warn  : [Sat May 13 09:33:58 2023]  #10
kern  :warn  : [Sat May 13 09:33:58 2023]  #11
kern  :warn  : [Sat May 13 09:33:58 2023]  #12
kern  :warn  : [Sat May 13 09:33:58 2023]  #13
kern  :warn  : [Sat May 13 09:33:58 2023]  #14
kern  :warn  : [Sat May 13 09:33:58 2023]  #15
kern  :warn  : [Sat May 13 09:33:58 2023]  #16
kern  :warn  : [Sat May 13 09:33:58 2023] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
kern  :warn  : [Sat May 13 09:33:58 2023] MMIO Stale Data CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/processor_mmio_stale_data.html for more details.
kern  :warn  : [Sat May 13 09:33:58 2023]  #17
kern  :warn  : [Sat May 13 09:33:58 2023]  #18
kern  :warn  : [Sat May 13 09:33:58 2023]  #19
kern  :warn  : [Sat May 13 09:33:58 2023]  #20
kern  :warn  : [Sat May 13 09:33:58 2023]  #21
kern  :warn  : [Sat May 13 09:33:58 2023]  #22
kern  :warn  : [Sat May 13 09:33:58 2023]  #23
kern  :warn  : [Sat May 13 09:33:58 2023]  #24
kern  :warn  : [Sat May 13 09:33:58 2023]  #25
kern  :warn  : [Sat May 13 09:33:58 2023]  #26
kern  :warn  : [Sat May 13 09:33:58 2023]  #27
kern  :warn  : [Sat May 13 09:33:58 2023]  #28
kern  :warn  : [Sat May 13 09:33:58 2023]  #29
kern  :warn  : [Sat May 13 09:33:58 2023]  #30
kern  :warn  : [Sat May 13 09:33:58 2023]  #31
kern  :warn  : [Sat May 13 09:33:59 2023] acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
kern  :warn  : [Sat May 13 09:34:00 2023] i8042: Warning: Keylock active
kern  :warn  : [Sat May 13 09:34:00 2023] GPT:Primary header thinks Alt. header is not at the end of the disk.
kern  :warn  : [Sat May 13 09:34:00 2023] GPT:8388607 != 209715199
kern  :warn  : [Sat May 13 09:34:00 2023] GPT:Alternate GPT header not at the end of the disk.
kern  :warn  : [Sat May 13 09:34:00 2023] GPT:8388607 != 209715199
kern  :warn  : [Sat May 13 09:34:00 2023] GPT: Use GNU Parted to correct GPT errors.
kern  :warn  : [Sat May 13 09:34:03 2023] printk: systemd: 15 output lines suppressed due to ratelimiting
kern  :warn  : [Sat May 13 09:34:05 2023] xfs filesystem being remounted at / supports timestamps until 2038 (0x7fffffff)
kern  :warn  : [Sat May 13 09:34:07 2023] xfs filesystem being remounted at /tmp supports timestamps until 2038 (0x7fffffff)
kern  :warn  : [Sat May 13 09:34:07 2023] xfs filesystem being remounted at /var/tmp supports timestamps until 2038 (0x7fffffff)
kern  :warn  : [Sat May 13 09:34:11 2023] xfs filesystem being remounted at /tmp supports timestamps until 2038 (0x7fffffff)
kern  :warn  : [Sat May 13 09:34:11 2023] xfs filesystem being remounted at /var/tmp supports timestamps until 2038 (0x7fffffff)

Note that the event times are from yesterday (when the underlying node started) and not from the pod's startup time (5h ago, last restart).

Is something wrong with the attached EBS (via Kubernetes PV/PVC)?

elasticsearch@elasticsearch-aud-es-data-wrk-7:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
overlay         100G  4.8G   96G   5% /
tmpfs            64M     0   64M   0% /dev
tmpfs           125G     0  125G   0% /sys/fs/cgroup
/dev/nvme0n1p1  100G  4.8G   96G   5% /etc/hosts
shm              64M     0   64M   0% /dev/shm
tmpfs            60G  4.0K   60G   1% /mnt/elastic-internal/elasticsearch-config
tmpfs            60G  4.0K   60G   1% /mnt/elastic-internal/downward-api
tmpfs            60G  4.0K   60G   1% /mnt/elastic-internal/probe-user
tmpfs            60G   28K   60G   1% /mnt/elastic-internal/xpack-file-realm
/dev/nvme1n1    492G   29G  464G   6% /usr/share/elasticsearch/data
tmpfs            60G   12K   60G   1% /usr/share/elasticsearch/config/http-certs
tmpfs            60G   68K   60G   1% /usr/share/elasticsearch/config/transport-certs
tmpfs            60G  4.0K   60G   1% /usr/share/elasticsearch/config/operator
tmpfs            60G  4.0K   60G   1% /usr/share/elasticsearch/config/transport-remote-certs
tmpfs           125G     0  125G   0% /proc/acpi
tmpfs           125G     0  125G   0% /sys/firmware

Does that look fine?
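
In case it helps, a crude way to see how hard the data volume is actually working during indexing, from inside the pod and without extra tools (just a sketch - the nvme1n1 device name is taken from the df output above):

# Sample the block-device counters twice, 10s apart, and compare the deltas
# (/proc/diskstats shows host block devices even from inside the container)
grep nvme1n1 /proc/diskstats
sleep 10
grep nvme1n1 /proc/diskstats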

I know this chain has gone on for a bit, and there have been some discussions, but would you be able to provide the Elasticsearch CRD file that you're using with the ECK operator to deploy this Elasticsearch cluster?

Hi @BenB196,
Thanks for your reply. Basically we copied from https://download.elastic.co/downloads/eck/2.7.0/crds.yaml

But we split it into chunks:

crd-agent.yaml
crd-apm.yaml
crd-autoscaling.yaml
crd-beat.yaml
crd-elasticsearch.yaml
crd-enterprisesearch.yaml
crd-kibana.yaml
crd-map.yaml
crd-stackconfigpolicies.yaml

I'm adding a link to a google-drive folder with the crd-elasticsearch.yaml and the elasticsearch-crds-describe.txt (kubectl describe crd):
https://drive.google.com/drive/folders/1GUtBNjMlJ7Fj5KS6qVI5t8VVdBe-Jrq9?usp=sharing

I will add all others if needed.

Apologies, I used the wrong term here, I meant the resource file (the file that describes/deploys the actual Elasticsearch cluster), not the custom resource definition.

@BenB196 I uploaded a file called data-wrk-describe.txt to the same shared drive folder

Thanks @Itay_Bittan, one thing that sticks out to me:

Containers:
   elasticsearch:
    Limits:
      cpu:     6900m
      memory:  59Gi
    Requests:
      cpu:      6900m
      memory:   59Gi
Volume Claims:
  Name:          elasticsearch-data
  StorageClass:  gp3-elasticsearch-aud
  Capacity:      500Gi

Those are quite heavy compute specs (especially the RAM) for ~500GB of data. While this isn't inherently bad, it just kind of stands out to me.

Would you be able to describe your use case a bit more in detail, and elaborate a bit more on:

We are heavily indexing data (bulks)

I see you're using gp3 storage; could you provide the EC2 instance type you're using?


Also, would you be able to provide an example of your dedicated master node configuration (or provide the entire config you're using to deploy the whole cluster)?

(There are some Kubernetes specific things you can do when deploying Elasticsearch to Kubernetes to help improve performance. I've covered them here in the past: Slower Perfomance with Elaticsearch cluster in kubernetes compared to Docker - #9 by BenB196)

Hi @BenB196,

Our use-case:
We are indexing around 7TB on a daily basis.
All the indices are replaced once a day (during different hours) with fresh data.
Each index represents one customer (business).
The variety of index sizes is big, from a few-kilobyte index up to a 500-gigabyte index.
After being indexed, there are no more writes to an index. It is used for search only until a new index is ready (on the following day) and replaces it.
We are using an alias per index (when a new index is ready, we flip the alias and delete the old one).
While we are indexing a lot of data, our search rate is really low - searches are only used for analytics dashboards.
We divided our cluster into two node sets: data-wrk and data-srv.
When a new index is created, it is created on one (or more) of the data-wrk nodes/pods.
The number of shards depends on its "expected" size, somewhere between 1 shard (for the small indices) and 20 shards (for the exceptional 500GB index). We start with 0 replicas.
After the indexing phase is done, we relocate the index's shards to the data-srv nodes.
Then we force-merge, add one replica, flip the alias to the new index, and delete the older index (roughly the workflow sketched after this description).
We are running with 8 data-wrk nodes to allow faster indexing, but most of the time each of those nodes holds only a few shards (~0-10). We use 500GB gp3 EBS volumes (with provisioned IOPS) to make sure we can handle those huge indices whenever we need to.
On the other data-srv node set, we use 4 instances with a 2.5TB EBS volume each, to be able to store all the "serving" indices (~1,500 indices, ~6,000 shards). Those nodes are able to publish their state to the master with no issue.
We are using spot instances for the data-wrk nodes - because we are willing to "pay the price" of losing a node while indexing and re-indexing - but in the master logs we see many more followers check retry count exceeded events (and the logs showing that the worker didn't respond after 30s) than disconnected events (which would mean the spot node was replaced by a new one). Obviously, we are trying to prevent the first kind.
We are using Karpenter (if that makes any difference) for the spot optimization, and the Amazon EBS CSI driver for the gp3 volumes.
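
For concreteness, here is a rough sketch of those post-indexing steps (the index/alias names, node-name pattern, endpoint and credentials are all placeholders, not our exact values):

# 1. Relocate the finished index to the data-srv nodes
#    (allocation filtering by node name; wildcards are allowed in the value)
curl -k -u elastic:$ELASTIC_PASSWORD -X PUT "https://localhost:9200/customer-123-2023-05-07/_settings" \
  -H 'Content-Type: application/json' -d '
{ "index.routing.allocation.require._name": "elasticsearch-aud-es-data-srv-*" }'

# 2. Force-merge once the shards have finished relocating
curl -k -u elastic:$ELASTIC_PASSWORD -X POST "https://localhost:9200/customer-123-2023-05-07/_forcemerge?max_num_segments=1"

# 3. Add one replica
curl -k -u elastic:$ELASTIC_PASSWORD -X PUT "https://localhost:9200/customer-123-2023-05-07/_settings" \
  -H 'Content-Type: application/json' -d '
{ "index.number_of_replicas": 1 }'

# 4. Flip the alias to the new index in a single atomic call...
curl -k -u elastic:$ELASTIC_PASSWORD -X POST "https://localhost:9200/_aliases" \
  -H 'Content-Type: application/json' -d '
{
  "actions": [
    { "add":    { "index": "customer-123-2023-05-07", "alias": "customer-123" } },
    { "remove": { "index": "customer-123-2023-05-06", "alias": "customer-123" } }
  ]
}'

# 5. ...then delete the old index
curl -k -u elastic:$ELASTIC_PASSWORD -X DELETE "https://localhost:9200/customer-123-2023-05-06"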

We are running on EKS 1.22 (AWS)
The problematic data(-wrk) nodes are running on m5d.8xlarge and r5.8xlarge instance types.

The master node details are in the Google Drive folder, in a file named master-stateful-set-describe.txt.

Again, our main concern is the data node restarts, where they don't respond to the master within 30s. Once this stops, I'll try to tweak other stuff to optimize indexing speed.

Thanks!

Ohh. Spot nodes don't really work here. A spot node removal would lead to followers check retry count exceeded indeed.
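
For reference, the ~30s in those messages matches the follower-check defaults (3 consecutive failed checks with a 10s timeout each); you can see them with something like this - shown only for context, not a suggestion to change them:

# Defaults behind "followers check retry count exceeded" / not responding after ~30s
curl -k -u elastic:$ELASTIC_PASSWORD "https://localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty" | grep fault_detection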

Thanks @DavidTurner - but why? I don't see any evidence that the node was taken down due to a spot termination.
When a spot instance is about to terminate, we see that disconnected was the reason (in the node-left log).
I will double-check that again - but I'm almost sure I've seen multiple times that when we get the followers check retry count exceeded, the spot instance was not terminated (under the hood).