Shard lock issue

Dear Community,

Would you happen to have any hints about what might be wrong with my Elasticsearch cluster (and how to resolve it)? I have a cluster with three master nodes and two data nodes.
Most of the indices are configured with a couple of shards and one replica.

Both data nodes have 8 vCPUs and 16 GB of memory.
The OS is Debian GNU/Linux 11 (bullseye).
Elasticsearch version is 7.5.0.

Everything starts OK (cluster status is green) and stays that way for a couple of hours, then one specific data node suddenly gets into trouble and cluster health changes to yellow.
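For context, I keep an eye on the status with the cluster health API; a minimal check against one of the masters (assuming the default HTTP port) looks like this:

curl -s "http://elastic-master2.mydomain.tld:9200/_cluster/health?pretty"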

On this specific node, I see the following log entries:

[2023-01-27T02:03:05,265][INFO ][o.e.i.s.TransportNodesListShardStoreMetaData] [elastic4.mydomain.tld] [myindex-2023.01][1]: failed to obtain shard lock
org.elasticsearch.env.ShardLockObtainFailedException: [myindex-2023.01][1]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [read metadata snapshot]
	at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:769) ~[elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:684) ~[elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.index.store.Store.readMetadataSnapshot(Store.java:449) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.listStoreMetaData(TransportNodesListShardStoreMetaData.java:171) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:118) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:66) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.action.support.nodes.TransportNodesAction.nodeOperation(TransportNodesAction.java:129) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:244) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:240) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:257) [x-pack-security-7.5.0.jar:7.5.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:315) [x-pack-security-7.5.0.jar:7.5.0]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:264) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:773) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.5.0.jar:7.5.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:830) [?:?]
[2023-01-27T02:03:05,327][INFO ][o.e.i.s.TransportNodesListShardStoreMetaData] [elastic4.mydomain.tld] [myindex-2023.01][0]: failed to obtain shard lock
org.elasticsearch.env.ShardLockObtainFailedException: [myindex-2023.01][0]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [read metadata snapshot]
	at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:769) ~[elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:684) ~[elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.index.store.Store.readMetadataSnapshot(Store.java:449) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.listStoreMetaData(TransportNodesListShardStoreMetaData.java:171) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:118) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:66) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.action.support.nodes.TransportNodesAction.nodeOperation(TransportNodesAction.java:129) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:244) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:240) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:257) [x-pack-security-7.5.0.jar:7.5.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:315) [x-pack-security-7.5.0.jar:7.5.0]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:264) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:773) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.5.0.jar:7.5.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:830) [?:?]
[2023-01-27T02:03:10,314][WARN ][o.e.i.c.IndicesClusterStateService] [elastic4.mydomain.tld] [myindex-2023.01][1] marking and sending shard failed due to [failed to create shard]
java.io.IOException: failed to obtain in-memory shard lock
	at org.elasticsearch.index.IndexService.createShard(IndexService.java:446) ~[elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:658) ~[elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:165) ~[elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.createShard(IndicesClusterStateService.java:610) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:586) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:266) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$5(ClusterApplierService.java:517) [elasticsearch-7.5.0.jar:7.5.0]
	at java.lang.Iterable.forEach(Iterable.java:75) [?:?]
	at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:514) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:485) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:432) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.cluster.service.ClusterApplierService.access$100(ClusterApplierService.java:73) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:176) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [elasticsearch-7.5.0.jar:7.5.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:830) [?:?]
Caused by: org.elasticsearch.env.ShardLockObtainFailedException: [myindex-2023.01][1]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]
	at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:769) ~[elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:684) ~[elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.index.IndexService.createShard(IndexService.java:366) ~[elasticsearch-7.5.0.jar:7.5.0]
	... 18 more
[2023-01-27T02:03:15,356][WARN ][o.e.i.c.IndicesClusterStateService] [elastic4.mydomain.tld] [myindex-2023.01][0] marking and sending shard failed due to [failed to create shard]
java.io.IOException: failed to obtain in-memory shard lock
	at org.elasticsearch.index.IndexService.createShard(IndexService.java:446) ~[elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:658) ~[elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:165) ~[elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.createShard(IndicesClusterStateService.java:610) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:586) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:266) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$5(ClusterApplierService.java:517) [elasticsearch-7.5.0.jar:7.5.0]
	at java.lang.Iterable.forEach(Iterable.java:75) [?:?]
	at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:514) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:485) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:432) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.cluster.service.ClusterApplierService.access$100(ClusterApplierService.java:73) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:176) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [elasticsearch-7.5.0.jar:7.5.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:830) [?:?]
Caused by: org.elasticsearch.env.ShardLockObtainFailedException: [myindex-2023.01][0]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]
	at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:769) ~[elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:684) ~[elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.index.IndexService.createShard(IndexService.java:366) ~[elasticsearch-7.5.0.jar:7.5.0]
	... 18 more

When I read the logs from the active master node, I see "node not connected" errors and shard lock errors for the data node (elastic4.mydomain.tld):

[2023-01-27T02:03:00,193][WARN ][o.e.c.r.a.AllocationService] [elastic-master2.mydomain.tld] failing shard [failed shard, shard [myindex-2023.01][1], node[YfkLcj6FS4iZlvUsen_s2A], [R], s[STARTED], a[id=zJ6424jgSJqMjzXZQdYIZg], message [failed to perform indices:data/write/bulk[s] on replica [myindex-2023.01][1], node[YfkLcj6FS4iZlvUsen_s2A
], [R], s[STARTED], a[id=zJ6424jgSJqMjzXZQdYIZg]], failure [NodeNotConnectedException[[elastic4.mydomain.tld][10.13.37.93:9300] Node not connected]], markAsStale [true]]
org.elasticsearch.transport.NodeNotConnectedException: [elastic4.mydomain.tld][10.13.37.93:9300] Node not connected
        at org.elasticsearch.transport.ConnectionManager.getConnection(ConnectionManager.java:189) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.transport.TransportService.getConnection(TransportService.java:617) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:589) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicasProxy.performOn(TransportReplicationAction.java:1035) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.support.replication.ReplicationOperation.performOnReplica(ReplicationOperation.java:173) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.support.replication.ReplicationOperation.performOnReplicas(ReplicationOperation.java:160) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.support.replication.ReplicationOperation.handlePrimaryResult(ReplicationOperation.java:135) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:285) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.bulk.TransportShardBulkAction$2.finishRequest(TransportShardBulkAction.java:188) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:170) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:193) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:118) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:79) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:917) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:108) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.runWithPrimaryShardReference(TransportReplicationAction.java:394) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.lambda$doRun$0(TransportReplicationAction.java:316) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.index.shard.IndexShard.lambda$wrapPrimaryOperationPermitListener$21(IndexShard.java:2752) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.ActionListener$3.onResponse(ActionListener.java:113) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:285) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:237) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationPermit(IndexShard.java:2726) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction.acquirePrimaryOperationPermit(TransportReplicationAction.java:858) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.doRun(TransportReplicationAction.java:312) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction.handlePrimaryRequest(TransportReplicationAction.java:275) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:257) ~[?:?]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:315) ~[?:?]
        at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:752) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:773) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.5.0.jar:7.5.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:830) [?:?]
[2023-01-27T02:03:00,199][INFO ][o.e.c.r.a.AllocationService] [elastic-master2.mydomain.tld] Cluster health status changed from [GREEN] to [YELLOW] (reason: [shards failed [[myindex-2023.01][1]]]).
[2023-01-27T02:03:00,256][WARN ][o.e.c.r.a.AllocationService] [elastic-master2.mydomain.tld] failing shard [failed shard, shard [myindex-2023.01][0], node[YfkLcj6FS4iZlvUsen_s2A], [R], s[STARTED], a[id=s7M2kw_mSP6oDNr-7NgCMA], message [failed to perform indices:data/write/bulk[s] on replica [myindex-2023.01][0], node[YfkLcj6FS4iZlvUsen_s2A
], [R], s[STARTED], a[id=s7M2kw_mSP6oDNr-7NgCMA]], failure [NodeNotConnectedException[[elastic4.mydomain.tld][10.13.37.93:9300] Node not connected]], markAsStale [true]]
org.elasticsearch.transport.NodeNotConnectedException: [elastic4.mydomain.tld][10.13.37.93:9300] Node not connected
        at org.elasticsearch.transport.ConnectionManager.getConnection(ConnectionManager.java:189) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.transport.TransportService.getConnection(TransportService.java:617) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:589) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicasProxy.performOn(TransportReplicationAction.java:1035) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.support.replication.ReplicationOperation.performOnReplica(ReplicationOperation.java:173) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.support.replication.ReplicationOperation.performOnReplicas(ReplicationOperation.java:160) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.support.replication.ReplicationOperation.handlePrimaryResult(ReplicationOperation.java:135) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:285) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.bulk.TransportShardBulkAction$2.finishRequest(TransportShardBulkAction.java:188) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:170) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:193) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:118) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:79) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:917) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:108) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.runWithPrimaryShardReference(TransportReplicationAction.java:394) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.lambda$doRun$0(TransportReplicationAction.java:316) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.index.shard.IndexShard.lambda$wrapPrimaryOperationPermitListener$21(IndexShard.java:2752) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.ActionListener$3.onResponse(ActionListener.java:113) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:285) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:237) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationPermit(IndexShard.java:2726) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction.acquirePrimaryOperationPermit(TransportReplicationAction.java:858) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.doRun(TransportReplicationAction.java:312) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction.handlePrimaryRequest(TransportReplicationAction.java:275) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:257) ~[?:?]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:315) ~[?:?]
        at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:752) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:773) ~[elasticsearch-7.5.0.jar:7.5.0]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.5.0.jar:7.5.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:830) [?:?]

When I query the cluster allocation explain API at http://elastic-master2.mydomain.tld:9200/_cluster/allocation/explain?pretty, I get the following:

{
	"index": "myindex-2023.01",
	"shard": 0,
	"primary": false,
	"current_state": "unassigned",
	"unassigned_info": {
		"reason": "ALLOCATION_FAILED",
		"at": "2023-01-27T02:03:48.572Z",
		"failed_allocation_attempts": 5,
		"details": "failed shard on node [YfkLcj6FS4iZlvUsen_s2A]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[myindex-2023.01][0]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]]; ",
		"last_allocation_status": "no_attempt"
	},
	"can_allocate": "no",
	"allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
	"node_allocation_decisions": [{
			"node_id": "-5YVQVTOQNOzVm2dC6IMhw",
			"node_name": "elastic3.mydomain.tld",
			"transport_address": "10.13.37.23:9300",
			"node_attributes": {
				"ml.machine_memory": "16760512512",
				"ml.max_open_jobs": "20",
				"xpack.installed": "true"
			},
			"node_decision": "no",
			"deciders": [{
					"decider": "max_retry",
					"decision": "NO",
					"explanation": "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2023-01-27T02:03:48.572Z], failed_attempts[5], failed_nodes[[YfkLcj6FS4iZlvUsen_s2A]], delayed=false, details[failed shard on node [YfkLcj6FS4iZlvUsen_s2A]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[myindex-2023.01][0]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]]; ], allocation_status[no_attempt]]]"
				},
				{
					"decider": "same_shard",
					"decision": "NO",
					"explanation": "the shard cannot be allocated to the same node on which a copy of the shard already exists [[myindex-2023.01][0], node[-5YVQVTOQNOzVm2dC6IMhw], [P], s[STARTED], a[id=78cNqX_VTfai09zPS6B0Tg]]"
				}
			]
		},
		{
			"node_id": "YfkLcj6FS4iZlvUsen_s2A",
			"node_name": "elastic4.mydomain.tld",
			"transport_address": "10.13.37.93:9300",
			"node_attributes": {
				"ml.machine_memory": "16761794560",
				"ml.max_open_jobs": "20",
				"xpack.installed": "true"
			},
			"node_decision": "no",
			"deciders": [{
				"decider": "max_retry",
				"decision": "NO",
				"explanation": "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2023-01-27T02:03:48.572Z], failed_attempts[5], failed_nodes[[YfkLcj6FS4iZlvUsen_s2A]], delayed=false, details[failed shard on node [YfkLcj6FS4iZlvUsen_s2A]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[myindex-2023.01][0]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]]; ], allocation_status[no_attempt]]]"
			}]
		}
	]
}
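As a side note, the max_retry decider above points at /_cluster/reroute?retry_failed=true. Once the troubled node is reachable again, retrying the failed allocations manually should look roughly like this (assuming the default HTTP port):

curl -s -X POST "http://elastic-master2.mydomain.tld:9200/_cluster/reroute?retry_failed=true&pretty"

That only retries the allocation, though; it does not explain why the shard lock was still held in the first place.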

Any ideas? Please let me know if you require more information.
Many thanks in advance!

Br,
Aki

Elasticsearch version 7.5 is EOL and no longer supported. Please upgrade ASAP.

(This is an automated response from your friendly Elastic bot. Please report this post if you have any suggestions or concerns :elasticheart: )

What type of storage are you using?

What is the full output of the cluster stats API?
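Something along these lines should return it (a minimal example, assuming you can reach any node on the default HTTP port):

curl -s "http://localhost:9200/_cluster/stats?pretty"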

Note that the version you are using is very old and EOL. I would recommend upgrading to at least the latest in the 7.17 line.

Hey Christian,
First of all, thanks for your response; much appreciated!

All nodes use local storage (SSDs) provided directly by the virtualization host.
The cluster stats look like this:

{
	"_nodes": {
		"total": 5,
		"successful": 5,
		"failed": 0
	},
	"cluster_name": "mycluster-2018",
	"cluster_uuid": "jk0W8Wz0SBmzMvb2-O1CVQ",
	"timestamp": 1674811993634,
	"status": "yellow",
	"indices": {
		"count": 122,
		"shards": {
			"total": 381,
			"primaries": 193,
			"replication": 0.9740932642487047,
			"index": {
				"shards": {
					"min": 2,
					"max": 10,
					"avg": 3.122950819672131
				},
				"primaries": {
					"min": 1,
					"max": 5,
					"avg": 1.5819672131147542
				},
				"replication": {
					"min": 0,
					"max": 1,
					"avg": 0.9877049180327869
				}
			}
		},
		"docs": {
			"count": 808055906,
			"deleted": 12052885
		},
		"store": {
			"size_in_bytes": 883266203149
		},
		"fielddata": {
			"memory_size_in_bytes": 0,
			"evictions": 0
		},
		"query_cache": {
			"memory_size_in_bytes": 9239967,
			"total_count": 240064,
			"hit_count": 3389,
			"miss_count": 236675,
			"cache_size": 447,
			"cache_count": 447,
			"evictions": 0
		},
		"completion": {
			"size_in_bytes": 0
		},
		"segments": {
			"count": 5112,
			"memory_in_bytes": 789364946,
			"terms_memory_in_bytes": 420509570,
			"stored_fields_memory_in_bytes": 299113552,
			"term_vectors_memory_in_bytes": 0,
			"norms_memory_in_bytes": 3877696,
			"points_memory_in_bytes": 61217870,
			"doc_values_memory_in_bytes": 4646258,
			"index_writer_memory_in_bytes": 281553042,
			"version_map_memory_in_bytes": 31277150,
			"fixed_bit_set_memory_in_bytes": 6928,
			"max_unsafe_auto_id_timestamp": 1674808965479,
			"file_sizes": {}
		}
	},
	"nodes": {
		"count": {
			"total": 5,
			"coordinating_only": 0,
			"data": 2,
			"ingest": 2,
			"master": 3,
			"ml": 2,
			"voting_only": 0
		},
		"versions": [
			"7.5.0"
		],
		"os": {
			"available_processors": 36,
			"allocated_processors": 36,
			"names": [{
				"name": "Linux",
				"count": 5
			}],
			"pretty_names": [{
					"pretty_name": "Debian GNU/Linux 11 (bullseye)",
					"count": 2
				},
				{
					"pretty_name": "Debian GNU/Linux 9 (stretch)",
					"count": 3
				}
			],
			"mem": {
				"total_in_bytes": 67060211712,
				"free_in_bytes": 15824715776,
				"used_in_bytes": 51235495936,
				"free_percent": 24,
				"used_percent": 76
			}
		},
		"process": {
			"cpu": {
				"percent": 3
			},
			"open_file_descriptors": {
				"min": 338,
				"max": 7801,
				"avg": 2977
			}
		},
		"jvm": {
			"max_uptime_in_millis": 113766446,
			"versions": [{
				"version": "13.0.1",
				"vm_name": "OpenJDK 64-Bit Server VM",
				"vm_version": "13.0.1+9",
				"vm_vendor": "AdoptOpenJDK",
				"bundled_jdk": true,
				"using_bundled_jdk": true,
				"count": 5
			}],
			"mem": {
				"heap_used_in_bytes": 8374540600,
				"heap_max_in_bytes": 29768351744
			},
			"threads": 450
		},
		"fs": {
			"total_in_bytes": 6435828801536,
			"free_in_bytes": 5541321633792,
			"available_in_bytes": 5214115545088
		},
		"plugins": [],
		"network_types": {
			"transport_types": {
				"security4": 5
			},
			"http_types": {
				"security4": 5
			}
		},
		"discovery_types": {
			"zen": 5
		},
		"packaging_types": [{
			"flavor": "default",
			"type": "deb",
			"count": 5
		}]
	}
}

I've noticed the same thing regarding my cluster version, but before rushing to upgrade I was hoping to get some kind of understanding of this issue.

Br,
Aki

Your open file descriptor count is very low. It should be set to 65,536 or higher.

That's not correct. You're looking at the maximum number of actually open FDs, which is not something the user can control. If the limit were wrong, Elasticsearch wouldn't even start.

This is the wrong way round, IMO. We've definitely made newer versions more robust, but we've also made them easier to troubleshoot. Upgrade first, and then if you're still having problems we will find it much easier to help.

Thanks David,

I will follow your and Christian's suggestion to upgrade.
Should I follow the Rolling upgrades | Elasticsearch Guide [7.17] | Elastic guide in full, or can I upgrade only the specific node that is giving me trouble?

Br,
Aki

All nodes in the cluster need to run exactly the same version.
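In case it helps with planning, the per-node loop from the rolling-upgrade guide looks roughly like the sketch below. It assumes the .deb packages, the default ports, and 7.17.8 as the target version; the guide itself is authoritative for the details and the recommended node order.

# 1. Disable replica allocation so shards are not reshuffled while the node is down
curl -s -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{"persistent": {"cluster.routing.allocation.enable": "primaries"}}'

# 2. Optionally flush so shard recovery is faster after the restart
curl -s -X POST "localhost:9200/_flush"

# 3. Stop, upgrade, and restart the node
sudo systemctl stop elasticsearch
sudo apt-get install elasticsearch=7.17.8
sudo systemctl start elasticsearch

# 4. Confirm the node has rejoined the cluster
curl -s "localhost:9200/_cat/nodes?v"

# 5. Re-enable allocation and wait for green before moving on to the next node
curl -s -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{"persistent": {"cluster.routing.allocation.enable": null}}'
curl -s "localhost:9200/_cat/health?v"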

Ah yes. My bad. I read that wrong. The correct way to find out is:

curl -X GET "localhost:9200/_nodes/stats/process?filter_path=**.max_file_descriptors&pretty"

FYI,
I've upgraded the whole cluster to version 7.17.8.

Before upgrading, I checked the open file descriptors as suggested by Sunile, and all nodes returned 65535 (the value is the same after the upgrade):

{
	"nodes": {
		"uHVsT15iTzyFO5QvY2JafA": {
			"process": {
				"max_file_descriptors": 65535
			}
		},
		"sJFbC5ZySM-ystMzKwTj_w": {
			"process": {
				"max_file_descriptors": 65535
			}
		},
		"-5YVQVTOQNOzVm2dC6IMhw": {
			"process": {
				"max_file_descriptors": 65535
			}
		},
		"ixLLmEr4QqCazPcY4CH3pQ": {
			"process": {
				"max_file_descriptors": 65535
			}
		},
		"YfkLcj6FS4iZlvUsen_s2A": {
			"process": {
				"max_file_descriptors": 65535
			}
		}
	}
}

Br,
Aki
