Dear Community,
Would you happen to have any hints about what might be wrong with my Elasticsearch cluster, and how to resolve it? I have a cluster with three master nodes and two data nodes.
Most of the indices are configured with a couple of shards and one replica.
Both data nodes have 8 vCPUs and 16 GB of memory.
OS is Debian GNU/Linux 11 (bullseye)
Elasticsearch version 7.5.0
Everything starts OK (cluster status is green) and stays like that for a couple of hours, then suddenly one specific data node gets into trouble and the cluster health changes to yellow.
In the logs of this specific node, I see the following:
[2023-01-27T02:03:05,265][INFO ][o.e.i.s.TransportNodesListShardStoreMetaData] [elastic4.mydomain.tld] [myindex-2023.01][1]: failed to obtain shard lock
org.elasticsearch.env.ShardLockObtainFailedException: [myindex-2023.01][1]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [read metadata snapshot]
at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:769) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:684) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.index.store.Store.readMetadataSnapshot(Store.java:449) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.listStoreMetaData(TransportNodesListShardStoreMetaData.java:171) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:118) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:66) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.support.nodes.TransportNodesAction.nodeOperation(TransportNodesAction.java:129) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:244) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:240) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:257) [x-pack-security-7.5.0.jar:7.5.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:315) [x-pack-security-7.5.0.jar:7.5.0]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:264) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:773) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.5.0.jar:7.5.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:830) [?:?]
[2023-01-27T02:03:05,327][INFO ][o.e.i.s.TransportNodesListShardStoreMetaData] [elastic4.mydomain.tld] [myindex-2023.01][0]: failed to obtain shard lock
org.elasticsearch.env.ShardLockObtainFailedException: [myindex-2023.01][0]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [read metadata snapshot]
at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:769) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:684) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.index.store.Store.readMetadataSnapshot(Store.java:449) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.listStoreMetaData(TransportNodesListShardStoreMetaData.java:171) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:118) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:66) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.support.nodes.TransportNodesAction.nodeOperation(TransportNodesAction.java:129) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:244) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:240) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:257) [x-pack-security-7.5.0.jar:7.5.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:315) [x-pack-security-7.5.0.jar:7.5.0]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:264) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:773) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.5.0.jar:7.5.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:830) [?:?]
[2023-01-27T02:03:10,314][WARN ][o.e.i.c.IndicesClusterStateService] [elastic4.mydomain.tld] [myindex-2023.01][1] marking and sending shard failed due to [failed to create shard]
java.io.IOException: failed to obtain in-memory shard lock
at org.elasticsearch.index.IndexService.createShard(IndexService.java:446) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:658) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:165) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.createShard(IndicesClusterStateService.java:610) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:586) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:266) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$5(ClusterApplierService.java:517) [elasticsearch-7.5.0.jar:7.5.0]
at java.lang.Iterable.forEach(Iterable.java:75) [?:?]
at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:514) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:485) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:432) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.cluster.service.ClusterApplierService.access$100(ClusterApplierService.java:73) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:176) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [elasticsearch-7.5.0.jar:7.5.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:830) [?:?]
Caused by: org.elasticsearch.env.ShardLockObtainFailedException: [myindex-2023.01][1]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]
at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:769) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:684) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.index.IndexService.createShard(IndexService.java:366) ~[elasticsearch-7.5.0.jar:7.5.0]
... 18 more
[2023-01-27T02:03:15,356][WARN ][o.e.i.c.IndicesClusterStateService] [elastic4.mydomain.tld] [myindex-2023.01][0] marking and sending shard failed due to [failed to create shard]
java.io.IOException: failed to obtain in-memory shard lock
at org.elasticsearch.index.IndexService.createShard(IndexService.java:446) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:658) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:165) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.createShard(IndicesClusterStateService.java:610) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:586) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:266) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$5(ClusterApplierService.java:517) [elasticsearch-7.5.0.jar:7.5.0]
at java.lang.Iterable.forEach(Iterable.java:75) [?:?]
at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:514) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:485) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:432) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.cluster.service.ClusterApplierService.access$100(ClusterApplierService.java:73) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:176) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [elasticsearch-7.5.0.jar:7.5.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:830) [?:?]
Caused by: org.elasticsearch.env.ShardLockObtainFailedException: [myindex-2023.01][0]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]
at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:769) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:684) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.index.IndexService.createShard(IndexService.java:366) ~[elasticsearch-7.5.0.jar:7.5.0]
... 18 more
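To make the pattern in these traces easier to see, here is a minimal sketch that counts the shard lock timeouts per shard and per lock purpose. It assumes the log format shown above; the function name `summarize_lock_failures` and the regular expression are illustrative, not part of Elasticsearch.

```python
import re
from collections import Counter

# Matches exception lines of the form seen in the data node log, e.g.
# "...ShardLockObtainFailedException: [myindex-2023.01][1]: obtaining shard lock
#  timed out after 5000ms, previous lock details: [shard creation] trying to lock
#  for [read metadata snapshot]"
LOCK_FAILURE = re.compile(
    r"ShardLockObtainFailedException: \[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\]: "
    r"obtaining shard lock timed out after (?P<timeout_ms>\d+)ms, "
    r"previous lock details: \[(?P<holder>[^\]]+)\] "
    r"trying to lock for \[(?P<wanted>[^\]]+)\]"
)


def summarize_lock_failures(lines):
    """Count lock timeouts keyed by (index, shard, lock holder, requested purpose)."""
    counts = Counter()
    for line in lines:
        match = LOCK_FAILURE.search(line)
        if match:
            key = (
                match["index"],
                int(match["shard"]),
                match["holder"],
                match["wanted"],
            )
            counts[key] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical log path): python summarize.py < elasticsearch.log
    for key, count in summarize_lock_failures(sys.stdin).items():
        print(count, key)
```

In the excerpt above, every timeout reports the previous lock holder as [shard creation], for both shard 0 and shard 1 of myindex-2023.01.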
When I read the logs from the active master node, they show "node not connected" errors for the data node (elastic4.mydomain.tld) at the time its shards are failed:
[2023-01-27T02:03:00,193][WARN ][o.e.c.r.a.AllocationService] [elastic-master2.mydomain.tld] failing shard [failed shard, shard [myindex-2023.01][1], node[YfkLcj6FS4iZlvUsen_s2A], [R], s[STARTED], a[id=zJ6424jgSJqMjzXZQdYIZg], message [failed to perform indices:data/write/bulk[s] on replica [myindex-2023.01][1], node[YfkLcj6FS4iZlvUsen_s2A], [R], s[STARTED], a[id=zJ6424jgSJqMjzXZQdYIZg]], failure [NodeNotConnectedException[[elastic4.mydomain.tld][10.13.37.93:9300] Node not connected]], markAsStale [true]]
org.elasticsearch.transport.NodeNotConnectedException: [elastic4.mydomain.tld][10.13.37.93:9300] Node not connected
at org.elasticsearch.transport.ConnectionManager.getConnection(ConnectionManager.java:189) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.transport.TransportService.getConnection(TransportService.java:617) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:589) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicasProxy.performOn(TransportReplicationAction.java:1035) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.support.replication.ReplicationOperation.performOnReplica(ReplicationOperation.java:173) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.support.replication.ReplicationOperation.performOnReplicas(ReplicationOperation.java:160) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.support.replication.ReplicationOperation.handlePrimaryResult(ReplicationOperation.java:135) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:285) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction$2.finishRequest(TransportShardBulkAction.java:188) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:170) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:193) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:118) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:79) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:917) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:108) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.runWithPrimaryShardReference(TransportReplicationAction.java:394) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.lambda$doRun$0(TransportReplicationAction.java:316) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.index.shard.IndexShard.lambda$wrapPrimaryOperationPermitListener$21(IndexShard.java:2752) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.ActionListener$3.onResponse(ActionListener.java:113) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:285) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:237) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationPermit(IndexShard.java:2726) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction.acquirePrimaryOperationPermit(TransportReplicationAction.java:858) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.doRun(TransportReplicationAction.java:312) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction.handlePrimaryRequest(TransportReplicationAction.java:275) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:257) ~[?:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:315) ~[?:?]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:752) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:773) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.5.0.jar:7.5.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:830) [?:?]
[2023-01-27T02:03:00,199][INFO ][o.e.c.r.a.AllocationService] [elastic-master2.mydomain.tld] Cluster health status changed from [GREEN] to [YELLOW] (reason: [shards failed [[myindex-2023.01][1]]]).
[2023-01-27T02:03:00,256][WARN ][o.e.c.r.a.AllocationService] [elastic-master2.mydomain.tld] failing shard [failed shard, shard [myindex-2023.01][0], node[YfkLcj6FS4iZlvUsen_s2A], [R], s[STARTED], a[id=s7M2kw_mSP6oDNr-7NgCMA], message [failed to perform indices:data/write/bulk[s] on replica [myindex-2023.01][0], node[YfkLcj6FS4iZlvUsen_s2A], [R], s[STARTED], a[id=s7M2kw_mSP6oDNr-7NgCMA]], failure [NodeNotConnectedException[[elastic4.mydomain.tld][10.13.37.93:9300] Node not connected]], markAsStale [true]]
org.elasticsearch.transport.NodeNotConnectedException: [elastic4.mydomain.tld][10.13.37.93:9300] Node not connected
at org.elasticsearch.transport.ConnectionManager.getConnection(ConnectionManager.java:189) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.transport.TransportService.getConnection(TransportService.java:617) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:589) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicasProxy.performOn(TransportReplicationAction.java:1035) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.support.replication.ReplicationOperation.performOnReplica(ReplicationOperation.java:173) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.support.replication.ReplicationOperation.performOnReplicas(ReplicationOperation.java:160) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.support.replication.ReplicationOperation.handlePrimaryResult(ReplicationOperation.java:135) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:285) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction$2.finishRequest(TransportShardBulkAction.java:188) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:170) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:193) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:118) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:79) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:917) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:108) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.runWithPrimaryShardReference(TransportReplicationAction.java:394) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.lambda$doRun$0(TransportReplicationAction.java:316) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.index.shard.IndexShard.lambda$wrapPrimaryOperationPermitListener$21(IndexShard.java:2752) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.ActionListener$3.onResponse(ActionListener.java:113) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:285) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:237) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationPermit(IndexShard.java:2726) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction.acquirePrimaryOperationPermit(TransportReplicationAction.java:858) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.doRun(TransportReplicationAction.java:312) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.action.support.replication.TransportReplicationAction.handlePrimaryRequest(TransportReplicationAction.java:275) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:257) ~[?:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:315) ~[?:?]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:752) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:773) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.5.0.jar:7.5.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:830) [?:?]
When I query the cluster allocation explain API (http://elastic-master2.mydomain.tld:9200/_cluster/allocation/explain?pretty), I get the following:
{
"index": "myindex-2023.01",
"shard": 0,
"primary": false,
"current_state": "unassigned",
"unassigned_info": {
"reason": "ALLOCATION_FAILED",
"at": "2023-01-27T02:03:48.572Z",
"failed_allocation_attempts": 5,
"details": "failed shard on node [YfkLcj6FS4iZlvUsen_s2A]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[myindex-2023.01][0]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]]; ",
"last_allocation_status": "no_attempt"
},
"can_allocate": "no",
"allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
"node_allocation_decisions": [{
"node_id": "-5YVQVTOQNOzVm2dC6IMhw",
"node_name": "elastic3.mydomain.tld",
"transport_address": "10.13.37.23:9300",
"node_attributes": {
"ml.machine_memory": "16760512512",
"ml.max_open_jobs": "20",
"xpack.installed": "true"
},
"node_decision": "no",
"deciders": [{
"decider": "max_retry",
"decision": "NO",
"explanation": "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2023-01-27T02:03:48.572Z], failed_attempts[5], failed_nodes[[YfkLcj6FS4iZlvUsen_s2A]], delayed=false, details[failed shard on node [YfkLcj6FS4iZlvUsen_s2A]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[myindex-2023.01][0]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]]; ], allocation_status[no_attempt]]]"
},
{
"decider": "same_shard",
"decision": "NO",
"explanation": "the shard cannot be allocated to the same node on which a copy of the shard already exists [[myindex-2023.01][0], node[-5YVQVTOQNOzVm2dC6IMhw], [P], s[STARTED], a[id=78cNqX_VTfai09zPS6B0Tg]]"
}
]
},
{
"node_id": "YfkLcj6FS4iZlvUsen_s2A",
"node_name": "elastic4.mydomain.tld",
"transport_address": "10.13.37.93:9300",
"node_attributes": {
"ml.machine_memory": "16761794560",
"ml.max_open_jobs": "20",
"xpack.installed": "true"
},
"node_decision": "no",
"deciders": [{
"decider": "max_retry",
"decision": "NO",
"explanation": "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2023-01-27T02:03:48.572Z], failed_attempts[5], failed_nodes[[YfkLcj6FS4iZlvUsen_s2A]], delayed=false, details[failed shard on node [YfkLcj6FS4iZlvUsen_s2A]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[myindex-2023.01][0]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]]; ], allocation_status[no_attempt]]]"
}]
}
]
}
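The max_retry explanation above names a manual retry call, /_cluster/reroute?retry_failed=true. Below is a minimal sketch of how the explain output could be summarized and that retry request prepared. The helper names `blocking_deciders` and `build_retry_request` are hypothetical, and authentication is left out even though X-Pack security appears in the stack traces, so credentials would have to be added for a real call.

```python
import urllib.request


def blocking_deciders(explain):
    """Map each node name to the deciders that answered NO in an explain response."""
    result = {}
    for node in explain.get("node_allocation_decisions", []):
        result[node["node_name"]] = [
            decider["decider"]
            for decider in node.get("deciders", [])
            if decider["decision"] == "NO"
        ]
    return result


def build_retry_request(base_url):
    """Prepare (but do not send) POST /_cluster/reroute?retry_failed=true."""
    url = base_url.rstrip("/") + "/_cluster/reroute?retry_failed=true"
    return urllib.request.Request(url, data=b"", method="POST")


if __name__ == "__main__":
    # Assumption: the master's HTTP endpoint from the explain URL above.
    request = build_retry_request("http://elastic-master2.mydomain.tld:9200")
    print(request.get_method(), request.full_url)
    # Sending it would be: urllib.request.urlopen(request)
    # (left out here; add credentials first if security is enabled)
```

For the response above, this would report max_retry and same_shard for elastic3.mydomain.tld and only max_retry for elastic4.mydomain.tld. Judging by the wording of the explanation, the retry only allows another allocation attempt after the five failed ones; by itself it would not explain why the shard lock was still held.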
Any ideas? Please let me know if you require more information.
Many thanks in advance!
Br,
Aki