I shut down all the nodes, upgraded the packages, and restarted all of them. It was a simple upgrade. There is not much information in the logs.
All of my indices were on the data nodes, and all three data nodes had this issue. So what I did was assign the data role to one of the master nodes as well; that let me bring Kibana up (since the Kibana index started on it). I then executed the following (I was not able to use curl because none of the indices were available there):
PUT /_cluster/settings?flat_settings=true
{
  "transient" : {
    "indices.recovery.max_bytes_per_sec" : "10mb"
  }
}
I set this because another thread suggested recovery speed might be causing the problem (I am not sure, as this container had enough memory and CPU).
I then tried to restart the data node, but it kept dying, so I removed a few of the largest indices (I have a backup of them and will restore them later). But here is more of the log:
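In case it is useful to anyone reading later, removing an index from Kibana Dev Tools is a single request (my-large-index-2022.03 is a placeholder name; make sure you have a snapshot first, since this is not reversible):

DELETE /my-large-index-2022.03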
[2022-03-30T01:39:25,516][INFO ][o.e.n.Node ] [elkd01] initialized
[2022-03-30T01:39:25,517][INFO ][o.e.n.Node ] [elkd01] starting ...
[2022-03-30T01:39:25,543][INFO ][o.e.x.s.c.f.PersistentCache] [elkd01] persistent cache index loaded
[2022-03-30T01:39:25,544][INFO ][o.e.x.d.l.DeprecationIndexingComponent] [elkd01] deprecation component started
[2022-03-30T01:39:25,630][INFO ][o.e.t.TransportService ] [elkd01] publish_address {10.59.10.77:9300}, bound_addresses {10.59.10.77:9300}
[2022-03-30T01:39:26,377][INFO ][o.e.b.BootstrapChecks ] [elkd01] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2022-03-30T01:39:26,380][INFO ][o.e.c.c.Coordinator ] [elkd01] cluster UUID [fVO-T1osSbeXzNmw4ig00w]
[2022-03-30T01:39:26,877][INFO ][o.e.c.s.ClusterApplierService] [elkd01] master node changed {previous [], current [{elkm02}{OL3BNCw6Sx2lrGNGitti8g}{IGHXEiU8SkOgablMWy5hjA}{10.59.10.35}{10.59.10.35:9300}{dhimt}]}, added {{elkm03}{02E2DU5BQh6KnKeZQPKFNg}{jqGHTpnKR_u0L3xJNuEU6g}{10.59.10.37}{10.59.10.37:9300}{dhimt}, {elk01}{z7OFjxfdQnS6rrLrUrXC9A}{UYxeE7GIQci1r_NwI4b7yw}{10.59.10.80}{10.59.10.80:9300}, {elkm01}{7PDgW5xYSPmT5jIohBYz-A}{qHbMq_dOR1-NOgNI_N9-hw}{10.59.10.34}{10.59.10.34:9300}{dhimt}, {elkm02}{OL3BNCw6Sx2lrGNGitti8g}{IGHXEiU8SkOgablMWy5hjA}{10.59.10.35}{10.59.10.35:9300}{dhimt}, {elk02}{eTqjGV6ZQh-RP-p6-zvMBw}{V4oZ1ZjhSCyuQ5dq3rFIfA}{10.59.10.81}{10.59.10.81:9300}}, term: 38, version: 22198, reason: ApplyCommitRequest{term=38, version=22198, sourceNode={elkm02}{OL3BNCw6Sx2lrGNGitti8g}{IGHXEiU8SkOgablMWy5hjA}{10.59.10.35}{10.59.10.35:9300}{dhimt}{xpack.installed=true, transform.node=true}}
[2022-03-30T01:39:27,044][INFO ][o.e.c.s.ClusterSettings ] [elkd01] updating [xpack.monitoring.collection.enabled] from [false] to [true]
[2022-03-30T01:39:27,044][INFO ][o.e.i.r.RecoverySettings ] [elkd01] using rate limit [10mb] with [default=10mb, read=0b, write=0b, max=0b]
[2022-03-30T01:39:27,194][INFO ][o.e.x.s.a.TokenService ] [elkd01] refresh keys
[2022-03-30T01:39:27,361][INFO ][o.e.x.s.a.TokenService ] [elkd01] refreshed keys
[2022-03-30T01:39:27,434][INFO ][o.e.l.LicenseService ] [elkd01] license [55905ffe-33d4-4a71-be22-74c517477ae1] mode [basic] - valid
[2022-03-30T01:39:27,435][INFO ][o.e.x.s.a.Realms ] [elkd01] license mode is [basic], currently licensed security realms are [reserved/reserved,file/default_file,native/default_native]
[2022-03-30T01:39:27,436][INFO ][o.e.x.s.s.SecurityStatusChangeListener] [elkd01] Active license is now [BASIC]; Security is enabled
[2022-03-30T01:39:27,445][INFO ][o.e.h.AbstractHttpServerTransport] [elkd01] publish_address {10.59.10.77:9200}, bound_addresses {10.59.10.77:9200}
[2022-03-30T01:39:27,445][INFO ][o.e.n.Node ] [elkd01] started
[2022-03-30T01:39:58,075][INFO ][o.e.c.s.ClusterApplierService] [elkd01] added {{elkd03}{nuDxSZZmRmGORTAN5Z-6oA}{tnFDjsDDQP-N062ISceGtQ}{10.59.10.79}{10.59.10.79:9300}{dh}}, term: 38, version: 22216, reason: ApplyCommitRequest{term=38, version=22216, sourceNode={elkm02}{OL3BNCw6Sx2lrGNGitti8g}{IGHXEiU8SkOgablMWy5hjA}{10.59.10.35}{10.59.10.35:9300}{dhimt}{xpack.installed=true, transform.node=true}}
[2022-03-30T01:39:58,742][INFO ][o.e.c.s.ClusterApplierService] [elkd01] added {{elkd02}{PprttoFnS0yWaM_vu9EvyA}{L29PkS2uTBqcsgglHpWyDA}{10.59.10.78}{10.59.10.78:9300}{dh}}, term: 38, version: 22217, reason: ApplyCommitRequest{term=38, version=22217, sourceNode={elkm02}{OL3BNCw6Sx2lrGNGitti8g}{IGHXEiU8SkOgablMWy5hjA}{10.59.10.35}{10.59.10.35:9300}{dhimt}{xpack.installed=true, transform.node=true}}
[2022-03-30T02:02:07,060][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [elkd01] fatal error in thread [elasticsearch[elkd01][generic][T#3]], exiting
java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code
at org.apache.lucene.store.DataInput.readBytes(DataInput.java:88) ~[lucene-core-8.11.1.jar:8.11.1 0b002b11819df70783e83ef36b42ed1223c14b50 - janhoy - 2021-12-14 13:46:43]
at org.elasticsearch.indices.recovery.RecoverySourceHandler$3.nextChunkRequest(RecoverySourceHandler.java:1383) ~[elasticsearch-7.17.1.jar:7.17.1]
at org.elasticsearch.indices.recovery.RecoverySourceHandler$3.nextChunkRequest(RecoverySourceHandler.java:1344) ~[elasticsearch-7.17.1.jar:7.17.1]
at org.elasticsearch.indices.recovery.MultiChunkTransfer.getNextRequest(MultiChunkTransfer.java:168) ~[elasticsearch-7.17.1.jar:7.17.1]
at org.elasticsearch.indices.recovery.MultiChunkTransfer.handleItems(MultiChunkTransfer.java:131) ~[elasticsearch-7.17.1.jar:7.17.1]
at org.elasticsearch.indices.recovery.MultiChunkTransfer.access$000(MultiChunkTransfer.java:48) ~[elasticsearch-7.17.1.jar:7.17.1]
at org.elasticsearch.indices.recovery.MultiChunkTransfer$1.write(MultiChunkTransfer.java:72) ~[elasticsearch-7.17.1.jar:7.17.1]
at org.elasticsearch.common.util.concurrent.AsyncIOProcessor.processList(AsyncIOProcessor.java:97) ~[elasticsearch-7.17.1.jar:7.17.1]
at org.elasticsearch.common.util.concurrent.AsyncIOProcessor.drainAndProcessAndRelease(AsyncIOProcessor.java:85) ~[elasticsearch-7.17.1.jar:7.17.1]
at org.elasticsearch.common.util.concurrent.AsyncIOProcessor.put(AsyncIOProcessor.java:73) ~[elasticsearch-7.17.1.jar:7.17.1]
at org.elasticsearch.indices.recovery.MultiChunkTransfer.addItem(MultiChunkTransfer.java:83) ~[elasticsearch-7.17.1.jar:7.17.1]
at org.elasticsearch.indices.recovery.MultiChunkTransfer.lambda$handleItems$3(MultiChunkTransfer.java:125) ~[elasticsearch-7.17.1.jar:7.17.1]
and that "java.lang.InternalError..." is repeated many times over.
Finally, the node stayed up for a long time and then suddenly died with the same error again, so I have stopped allocation:
cluster.routing.allocation.enable: "none"
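That line is the elasticsearch.yml form; the same setting can also be applied at runtime through the standard cluster settings API (shown here for completeness):

PUT /_cluster/settings
{
  "transient" : {
    "cluster.routing.allocation.enable" : "none"
  }
}

Note that transient settings are lost on a full-cluster restart; use "persistent" instead of "transient" if the setting needs to survive one.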