Hi,
I have a cluster with 3 data nodes (31 GB each) and 3 master nodes (8 GB each). There are about 252 shards in total, and the total data size is approximately 65 GB.
I am using Elasticsearch 7.7.1.
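For context, here is a rough back-of-the-envelope calculation from the figures above. It is only a sketch: it assumes the shards and the data are spread evenly across the three data nodes, which I have not verified, and the variable names are just for illustration.

```python
# Figures taken from the cluster description above.
data_nodes = 3
total_shards = 252
total_data_gb = 65

# Assumption: shards and data are distributed evenly across the data nodes.
shards_per_data_node = total_shards / data_nodes
avg_shard_size_gb = total_data_gb / total_shards

print(f"shards per data node: {shards_per_data_node:.0f}")
print(f"average shard size: {avg_shard_size_gb:.2f} GB")
```

Under that assumption each data node holds about 84 shards, and the average shard is well under 1 GB.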
When I run a load of approximately 600 concurrent search users and 200 indexing users, my shards go into an unassigned state within 10-15 minutes, even though all of my nodes are up.
The nodes first get disconnected and then rejoin the cluster automatically, but after they rejoin, the shards become unassigned.
Searches then fail with the following exception:
Wrapped by: org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:551) ~[na:na]
at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:309) ~[na:na]
at org.elasticsearch.action.search.FetchSearchPhase.moveToNextPhase(FetchSearchPhase.java:231) ~[na:na]
at org.elasticsearch.action.search.FetchSearchPhase.lambda$innerRun$1(FetchSearchPhase.java:119) ~[na:na]
at org.elasticsearch.action.search.CountedCollector.countDown(CountedCollector.java:53) ~[na:na]
at org.elasticsearch.action.search.CountedCollector.onFailure(CountedCollector.java:76) ~[na:na]
at org.elasticsearch.action.search.FetchSearchPhase$2.onFailure(FetchSearchPhase.java:198) ~[na:na]
at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:59) ~[na:na]
at org.elasticsearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:402) ~[na:na]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1139) ~[na:na]
at org.elasticsearch.transport.TransportService$8.run(TransportService.java:1001) ~[na:na]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:633) ~[na:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:na]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:na]
at java.lang.Thread.run(Thread.java:748) ~[na:na]
I am also getting the exception below:
[2020-08-05T14:14:54,724][WARN ][o.e.i.c.IndicesClusterStateService] [test-cluster] [test-index][1] marking and sending shard failed due to [failed to create shard]
java.io.IOException: failed to obtain in-memory shard lock
at org.elasticsearch.index.IndexService.createShard(IndexService.java:481) ~[elasticsearch-7.7.1.jar:7.7.1]
at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:718) ~[elasticsearch-7.7.1.jar:7.7.1]
at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:176) ~[elasticsearch-7.7.1.jar:7.7.1]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.createShard(IndicesClusterStateService.java:592) ~[elasticsearch-7.7.1.jar:7.7.1]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:568) ~[elasticsearch-7.7.1.jar:7.7.1]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:248) ~[elasticsearch-7.7.1.jar:7.7.1]
at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$5(ClusterApplierService.java:517) ~[elasticsearch-7.7.1.jar:7.7.1]
at java.lang.Iterable.forEach(Iterable.java:75) [?:1.8.0_202]
at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:514) [elasticsearch-7.7.1.jar:7.7.1]
at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:485) [elasticsearch-7.7.1.jar:7.7.1]
at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:432) [elasticsearch-7.7.1.jar:7.7.1]
at org.elasticsearch.cluster.service.ClusterApplierService.access$100(ClusterApplierService.java:73) [elasticsearch-7.7.1.jar:7.7.1]
at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:176) [elasticsearch-7.7.1.jar:7.7.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:633) [elasticsearch-7.7.1.jar:7.7.1]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [elasticsearch-7.7.1.jar:7.7.1]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [elasticsearch-7.7.1.jar:7.7.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_202]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_202]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202]
Caused by: org.elasticsearch.env.ShardLockObtainFailedException: [test-index][1]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]
at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:771) ~[elasticsearch-7.7.1.jar:7.7.1]
at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:686) ~[elasticsearch-7.7.1.jar:7.7.1]
at org.elasticsearch.index.IndexService.createShard(IndexService.java:401) ~[elasticsearch-7.7.1.jar:7.7.1]
... 18 more
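In case it helps with diagnosis, this is roughly how the unassigned shards could be tallied by reason. It is a minimal sketch, assuming the rows come from `GET _cat/shards?format=json&h=index,shard,prirep,state,unassigned.reason`. The helper name `count_unassigned_by_reason` and the sample rows are hypothetical and are not output from my cluster.

```python
from collections import Counter


def count_unassigned_by_reason(rows):
    """Tally UNASSIGNED shards by their `unassigned.reason` value."""
    reasons = Counter()
    for row in rows:
        if row.get("state") == "UNASSIGNED":
            # Fall back to "unknown" when no reason is reported.
            reasons[row.get("unassigned.reason") or "unknown"] += 1
    return dict(reasons)


# Hypothetical sample rows, shaped like the JSON form of _cat/shards.
sample_rows = [
    {"index": "test-index", "shard": "1", "prirep": "p",
     "state": "UNASSIGNED", "unassigned.reason": "ALLOCATION_FAILED"},
    {"index": "test-index", "shard": "1", "prirep": "r",
     "state": "UNASSIGNED", "unassigned.reason": "NODE_LEFT"},
    {"index": "test-index", "shard": "0", "prirep": "p",
     "state": "STARTED", "unassigned.reason": None},
]

print(count_unassigned_by_reason(sample_rows))
```

For a single shard, `GET _cluster/allocation/explain` gives a more detailed explanation of why it is not being allocated.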
What changes should I make at the cluster level to improve performance and keep the shards assigned under this load?
Thanks