My online cluster frequently suffers from a lot of LockObtainFailedExceptions

One of my online clusters runs Elasticsearch (2.3.3) started by the Linux supervise tool. Perhaps because of a network problem, or perhaps after a full GC, a lot of LockObtainFailedExceptions occurred on some nodes and many shards failed to be created. All the logs look like the following:

ElasticsearchException[failed to create shard]; nested: LockObtainFailedException[Can't lock shard [wallet-bi-usertags-pass][40], timed out after 5000ms];
    at org.elasticsearch.index.IndexService.createShard(IndexService.java:389)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyInitializingShard(IndicesClusterStateService.java:602)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewOrUpdatedShards(IndicesClusterStateService.java:502)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:167)
    at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:616)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:778)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.lucene.store.LockObtainFailedException: Can't lock shard [wallet-bi-usertags-pass][40], timed out after 5000ms
    at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:623)
    at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:551)
    at org.elasticsearch.index.IndexService.createShard(IndexService.java:306)
    ... 10 more

I suspect that when Elasticsearch shuts down and is then immediately pulled back up by the supervise process, the Lucene write.lock has not yet been released by the old Elasticsearch JVM process.
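To illustrate what I suspect, here is a minimal sketch of the underlying Lucene behavior, written against the Lucene 5.x API that Elasticsearch 2.3.3 bundles (the /tmp/lock-demo path and class name are just placeholders for the demo, not anything from my cluster): while one writer still holds write.lock on a data directory, a second writer on the same path fails with a LockObtainFailedException, and I think the same thing happens when a half-dead node process still holds the lock while the new one starts.

```java
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;

public class WriteLockDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder path; any empty directory works for this demo.
        FSDirectory dir = FSDirectory.open(Paths.get("/tmp/lock-demo"));

        // The first IndexWriter acquires write.lock on the directory.
        IndexWriter holder = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));

        // A second writer on the same path (here in the same JVM, but the
        // situation is analogous when an old node process has not exited yet)
        // cannot obtain write.lock and fails with LockObtainFailedException.
        try (FSDirectory sameDir = FSDirectory.open(Paths.get("/tmp/lock-demo"))) {
            new IndexWriter(sameDir, new IndexWriterConfig(new StandardAnalyzer()));
        } catch (LockObtainFailedException e) {
            System.out.println("write.lock already held: " + e.getMessage());
        }

        // Only after the first writer closes is write.lock released.
        holder.close();
        dir.close();
    }
}
```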
Has anyone run into the same situation? My current workaround is to kill the Elasticsearch process and let it restart, after which everything is fine, but the problem comes back again another day.
Please help. :sob:
