LockObtainFailedException in ES


(pran) #1

Hi,
We have a cluster with 16 nodes. Each node is Data and Master Node. We have ES version 2.4. Recently, we have started getting the following error:
[2017-05-01 23:28:04,221][WARN ][cluster.action.shard ] [servername] [indexname][5] received shard failed for target shard [[indexname][5], node[aqM0l4oWS7iIXKiBJtmjkQ], [R], v[525], s[INITIALIZING], a[id=P_wPEdfkQ2OMWgZjPs6K8w], unassigned_info[[reason=NODE_LEFT], at[2017-05-01T21:24:56.380Z], details[node_left[aqM0l4oWS7iIXKiBJtmjkQ]]]], indexUUID [zIkg0_jkTJeH3vUIqEm2Zg], message [failed to create shard], failure [ElasticsearchException[failed to create shard]; nested: LockObtainFailedException[Can't lock shard [indexname][5], timed out after 5000ms]; ]
[indexname][[indexname][5]] ElasticsearchException[failed to create shard]; nested: LockObtainFailedException[Can't lock shard [indexname][5], timed out after 5000ms];
at org.elasticsearch.index.IndexService.createShard(IndexService.java:389)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyInitializingShard(IndicesClusterStateService.java:601)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewOrUpdatedShards(IndicesClusterStateService.java:501)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:166)
at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:610)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.lucene.store.LockObtainFailedException: Can't lock shard [indexname][5], timed out after 5000ms
at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:609)
at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:537)
at org.elasticsearch.index.IndexService.createShard(IndexService.java:306)
... 10 more
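
For context, the shards stuck in INITIALIZING (and the NODE_LEFT reason shown in the log) can be listed with the cat shards API. This is a diagnostic sketch only; the host/port are placeholders for any node in the cluster, and the `unassigned.reason` column is part of the standard cat API:

```shell
# List every shard that is not STARTED, with its unassigned reason.
# localhost:9200 is a placeholder; point this at any node in the cluster.
curl -s 'localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason' \
  | grep -v STARTED
```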

Please help me understand this error and what its probable cause could be. Please let me know if you require any more information.
Regards,
Pran


(Christian Dahlqvist) #2

What type of file system are you using?


(pran) #3

Hi,
Following are the server specs:
Virtual machine - Linux - Hyper-V - 16 cores, 64 GB RAM

Let me know if you require more information.
Regards,
Pran


(pran) #4

Hi Christian,
We are getting this error in production and are not sure why it is occurring. There is no issue with disk space, as per the following stats:
"path": "/mnt/sdb/elasticsearch/abc_production/nodes/0",
"mount": "/mnt/sdb (/dev/sdb)",
"type": "ext3",
"total_in_bytes": 1056894091264,
"free_in_bytes": 905138794496,
"available_in_bytes": 851451703296,
"spins": "true"

Please let me know if you have any clue regarding the aforesaid issue. Do let me know if you require more information about the cluster settings.
Regards,
Pran


(Christian Dahlqvist) #5

If I calculate correctly, your disk is just above 85% full, which means it has passed the low watermark. This will impact how Elasticsearch allocates shards, and even though I am not sure it is directly related to your issue, it seems like a strange coincidence. You may want to change the watermark settings or remove some data to see if that affects the issue.
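
The watermark behaviour mentioned above can be inspected and adjusted through the cluster settings API. A sketch, not a recommendation; the host/port are placeholders, and on 2.x the GET only echoes back settings that have been explicitly overridden (defaults such as the 85% low watermark are not shown):

```shell
# Show any explicitly configured cluster settings (watermark overrides included).
curl -s 'localhost:9200/_cluster/settings?pretty'

# Transiently raise the low watermark from its 85% default, e.g. to 90%.
curl -s -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "90%"
  }
}'
```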


(pran) #6

Hi Christian,
I think it is not 85% full; 85% of the space is available, so there is no issue with space.
Regards,
Pran
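
For reference, the percentage follows directly from the stats quoted earlier in the thread (numbers copied verbatim from the fs stats output):

```python
# Disk usage check from the stats quoted above:
#   total_in_bytes = 1056894091264, free_in_bytes = 905138794496
total_in_bytes = 1056894091264
free_in_bytes = 905138794496

free_pct = 100.0 * free_in_bytes / total_in_bytes  # ~85.6% free
used_pct = 100.0 - free_pct                        # ~14.4% used

print(f"free: {free_pct:.1f}%, used: {used_pct:.1f}%")
```

So the disk is roughly 14% used, well below the default 85% low watermark.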


(Christian Dahlqvist) #7

You are indeed correct. Not sure how I ended up getting that switched around....


(pran) #8

Hi Christian,
I wanted to share the logs from our cluster so that you can get a better understanding of what is going wrong. Please let me know your mailing address so that I can send them to you. (Sorry, I am not able to find any option to upload logs here.)
Regards,
Pran


(system) #9

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.