Elasticsearch: Failed to obtain node locks

Kibana version : 7.8.0

Elasticsearch version : 7.8.0

APM Server version : 7.8.0

Hi Team

My Elasticsearch server has been up and running for more than 2-3 months. All of a sudden it went down today. When I tried to debug, I saw the logs below.

> {"type": "server", "timestamp": "2021-01-04T15:38:19,786Z", "level": "INFO", "component": "o.e.p.PluginsService", "cluster.name": "elasticsearch", "node.name": "docker-cluster", "message": "no plugins loaded" }
> {"type": "server", "timestamp": "2021-01-04T15:38:20,296Z", "level": "ERROR", "component": "o.e.b.ElasticsearchUncaughtExceptionHandler", "cluster.name": "elasticsearch", "node.name": "docker-cluster", "message": "uncaught exception in thread [main]",
> "stacktrace": ["org.elasticsearch.bootstrap.StartupException: java.lang.IllegalStateException: failed to obtain node locks, tried [[/usr/share/elasticsearch/data]] with lock id [0]; maybe these locations are not writable or multiple nodes were started without increasing [node.max_local_storage_nodes] (was [1])?",
> "at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:174) ~[elasticsearch-7.8.0.jar:7.8.0]",
> "at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:161) ~[elasticsearch-7.8.0.jar:7.8.0]",
> "at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-7.8.0.jar:7.8.0]",
> "at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:127) ~[elasticsearch-cli-7.8.0.jar:7.8.0]",
> "at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-7.8.0.jar:7.8.0]",
> "at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:126) ~[elasticsearch-7.8.0.jar:7.8.0]",
> "at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92) ~[elasticsearch-7.8.0.jar:7.8.0]",
> "Caused by: java.lang.IllegalStateException: failed to obtain node locks, tried [[/usr/share/elasticsearch/data]] with lock id [0]; maybe these locations are not writable or multiple nodes were started without increasing [node.max_local_storage_nodes] (was [1])?",
> "at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:301) ~[elasticsearch-7.8.0.jar:7.8.0]",
> "at org.elasticsearch.node.Node.<init>(Node.java:335) ~[elasticsearch-7.8.0.jar:7.8.0]",
> "at org.elasticsearch.node.Node.<init>(Node.java:266) ~[elasticsearch-7.8.0.jar:7.8.0]",
> "at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:227) ~[elasticsearch-7.8.0.jar:7.8.0]",
> "at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:227) ~[elasticsearch-7.8.0.jar:7.8.0]",
> "at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:393) ~[elasticsearch-7.8.0.jar:7.8.0]",
> "at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:170) ~[elasticsearch-7.8.0.jar:7.8.0]",
> "... 6 more"] }

I'm not sure why this happened or what to do. Please suggest.

Thanks
Rahul

The message

failed to obtain node locks, tried [[/usr/share/elasticsearch/data]] with lock id [0]; maybe these locations are not writable or multiple nodes were started

means that you have a node already running, or that there has been a weird crash that did not remove the lock in the data dir, or that you are not starting the node in the proper way.

BTW you should upgrade to the latest version to make sure you have all the recent fixes.

@dadoonet

Sure, thank you.

Is it safe (in terms of data) to upgrade the version?

I don't think it's that - these locks are managed by the OS and are automatically cleaned up at process exit regardless of how the exit happened.

Could this happen if the machine is shut down abruptly, like with a power outage?

Yes. But always do a snapshot before the upgrade as stated in the documentation.
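
For reference, a snapshot can be requested through the snapshot REST API before upgrading. Below is a minimal Python sketch, assuming a snapshot repository has already been registered. The repository name `my_backup`, the snapshot name `pre-upgrade` and the `http://localhost:9200` address are placeholders for illustration, not values from this thread.

```python
import urllib.request


def snapshot_request(base_url: str, repository: str, snapshot: str) -> urllib.request.Request:
    """Build a PUT request asking Elasticsearch to create a snapshot."""
    # wait_for_completion=true makes the call return only once the snapshot has finished.
    url = f"{base_url.rstrip('/')}/_snapshot/{repository}/{snapshot}?wait_for_completion=true"
    return urllib.request.Request(url, method="PUT")


# Usage (placeholder names, requires a reachable cluster):
#   req = snapshot_request("http://localhost:9200", "my_backup", "pre-upgrade")
#   with urllib.request.urlopen(req) as resp:
#       print(resp.status, resp.read().decode())
```

Check the response for failed shards before starting the upgrade.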

Not if you're using local disks, no. Not sure how NFS would handle that but I think even there it works ok.

Hi @DavidTurner , @dadoonet

I've upgraded to the latest version and it still gives a similar issue. Please have a look at the logs below.

    {"type": "server", "timestamp": "2021-01-16T05:45:53,153Z", "level": "WARN", "component": "o.e.c.r.a.AllocationService", "cluster.name": "docker-cluster", "node.name": "elasticsearch-6f9bdbcb95-fgtgv", "message": "failing shard [failed shard, shard [logstash-alfresco-2020.07.17][0], node[n54V6RfvRu-hKRrNAaV9ew], [P], recovery_source[existing store recovery; bootstrap_history_uuid=false], s[INITIALIZING], a[id=v7IHeNemRU6mpPaDgytKgA], unassigned_info[[reason=ALLOCATION_FAILED], at[2021-01-16T05:45:52.220Z], failed_attempts[4], failed_nodes[[n54V6RfvRu-hKRrNAaV9ew]], delayed=false, details[failed shard on node [n54V6RfvRu-hKRrNAaV9ew]: failed recovery, failure RecoveryFailedException[[logstash-alfresco-2020.07.17][0]: Recovery failed on {elasticsearch-6f9bdbcb95-fgtgv}{n54V6RfvRu-hKRrNAaV9ew}{GjGMFfT0SkKzZEcmc53dbA}{10.42.21.96}{10.42.21.96:9300}{cdhilmrstw}{ml.machine_memory=4294967296, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}]; nested: IndexShardRecoveryException[failed to recover from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: LockObtainFailedException[Lock held by another program: /usr/share/elasticsearch/data/nodes/0/indices/cgUkK37XSZGrzz1LyxF0vg/0/index/write.lock]; ], allocation_status[no_valid_shard_copy]], message [failed recovery], failure [RecoveryFailedException[[logstash-alfresco-2020.07.17][0]: Recovery failed on {elasticsearch-6f9bdbcb95-fgtgv}{n54V6RfvRu-hKRrNAaV9ew}{GjGMFfT0SkKzZEcmc53dbA}{10.42.21.96}{10.42.21.96:9300}{cdhilmrstw}{ml.machine_memory=4294967296, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}]; nested: IndexShardRecoveryException[failed to recover from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: LockObtainFailedException[Lock held by another program: /usr/share/elasticsearch/data/nodes/0/indices/cgUkK37XSZGrzz1LyxF0vg/0/index/write.lock]; ], markAsStale [true]]", "cluster.uuid": "r-0GcXtHSy6tqhXAjLZS-g",
    "node.id": "n54V6RfvRu-hKRrNAaV9ew" ,
    "stacktrace": ["org.elasticsearch.indices.recovery.RecoveryFailedException: [logstash-alfresco-2020.07.17][0]: Recovery failed on {elasticsearch-6f9bdbcb95-fgtgv}{n54V6RfvRu-hKRrNAaV9ew}{GjGMFfT0SkKzZEcmc53dbA}{10.42.21.96}{10.42.21.96:9300}{cdhilmrstw}{ml.machine_memory=4294967296, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}",
    "at org.elasticsearch.index.shard.IndexShard.lambda$executeRecovery$21(IndexShard.java:2676) ~[elasticsearch-7.10.1.jar:7.10.1]",
    "at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:71) ~[elasticsearch-7.10.1.jar:7.10.1]",
    "at org.elasticsearch.index.shard.StoreRecovery.lambda$recoveryListener$6(StoreRecovery.java:355) ~[elasticsearch-7.10.1.jar:7.10.1]",
    "at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:71) ~[elasticsearch-7.10.1.jar:7.10.1]",
    "at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:328) ~[elasticsearch-7.10.1.jar:7.10.1]",
    "at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:96) ~[elasticsearch-7.10.1.jar:7.10.1]",
    "at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1894) ~[elasticsearch-7.10.1.jar:7.10.1]",
    "at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73) ~[elasticsearch-7.10.1.jar:7.10.1]",
    "at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:737) ~[elasticsearch-7.10.1.jar:7.10.1]",
    "at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.10.1.jar:7.10.1]",
    "at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]",
    "at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]",
    "at java.lang.Thread.run(Thread.java:832) [?:?]",
    "Caused by: org.elasticsearch.index.shard.IndexShardRecoveryException: failed to recover from gateway",
    "at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:441) ~[elasticsearch-7.10.1.jar:7.10.1]",
    "at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:98) ~[elasticsearch-7.10.1.jar:7.10.1]",
    "at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:325) ~[elasticsearch-7.10.1.jar:7.10.1]",
    "... 8 more",
    "Caused by: org.elasticsearch.index.engine.EngineCreationFailureException: failed to create engine",
    "at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:254) ~[elasticsearch-7.10.1.jar:7.10.1]",
    "at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:205) ~[elasticsearch-7.10.1.jar:7.10.1]",
    "at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:25) ~[elasticsearch-7.10.1.jar:7.10.1]",
    "at org.elasticsearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:1654) ~[elasticsearch-7.10.1.jar:7.10.1]",
    "at org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1620) ~[elasticsearch-7.10.1.jar:7.10.1]",
    "at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:436) ~[elasticsearch-7.10.1.jar:7.10.1]",
    "at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:98) ~[elasticsearch-7.10.1.jar:7.10.1]",
    "at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:325) ~[elasticsearch-7.10.1.jar:7.10.1]",
    "... 8 more",
    "Caused by: org.apache.lucene.store.LockObtainFailedException: Lock held by another program: /usr/share/elasticsearch/data/nodes/0/indices/cgUkK37XSZGrzz1LyxF0vg/0/index/write.lock",
    "at org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:130) ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:35:28]",
    "at org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:41) ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:35:28]",
    "at org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:45) ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:35:28]",
    "at org.apache.lucene.store.FilterDirectory.obtainLock(FilterDirectory.java:105) ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:35:28]",
    "at org.apache.lucene.store.FilterDirectory.obtainLock(FilterDirectory.java:105) ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:35:28]",
    "at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:923) ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:35:28]",
    "at org.elasticsearch.index.engine.InternalEngine.createWriter(InternalEngine.java:2288) ~[elasticsearch-7.10.1.jar:7.10.1]",
    "at org.elasticsearch.index.engine.InternalEngine.createWriter(InternalEngine.java:2276) ~[elasticsearch-7.10.1.jar:7.10.1]",
    "at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:247) ~[elasticsearch-7.10.1.jar:7.10.1]",
    "at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:205) ~[elasticsearch-7.10.1.jar:7.10.1]",
    "at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:25) ~[elasticsearch-7.10.1.jar:7.10.1]",
    "at org.elasticsearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:1654) ~[elasticsearch-7.10.1.jar:7.10.1]",
    "at org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1620) ~[elasticsearch-7.10.1.jar:7.10.1]",
    "at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:436) ~[elasticsearch-7.10.1.jar:7.10.1]",
    "at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:98) ~[elasticsearch-7.10.1.jar:7.10.1]",
    "at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:325) ~[elasticsearch-7.10.1.jar:7.10.1]",
    "... 8 more"] }

Response of _cat/shards api:

Response of _cat/indices api:

Please suggest

Thanks
Rahul

As it says, the lock is held by another program. Usually this means you have two Elasticsearch processes running on the same data path.
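
To see which shard directories are affected, the lock paths can be pulled out of the log text. This is a small Python sketch. The helper name `locked_paths` is made up for illustration, and the pattern assumes the exact "Lock held by another program" wording shown in the log above.

```python
import re
from typing import List

# Matches the path that follows Lucene's "Lock held by another program:" message.
LOCK_RE = re.compile(r"Lock held by another program: (\S+?/write\.lock)")


def locked_paths(log_text: str) -> List[str]:
    """Return each distinct write.lock path mentioned in the log, in first-seen order."""
    seen: List[str] = []
    for path in LOCK_RE.findall(log_text):
        if path not in seen:
            seen.append(path)
    return seen
```

The same path is usually repeated several times in one stack trace, so the helper reports each path only once.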

@DavidTurner

Is there a way to find out which running Elasticsearch process holds the lock?

We have 10-15 containers that mount the same volume, and Elasticsearch is running in only one of them.

Any suggestions?

Best
Rahul

I'd use lsof or lslocks but I've no idea how well they work with containers.
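
On Linux the kernel also lists currently held file locks in /proc/locks, which can be matched against the inode of a write.lock file in a read-only way. Here is a rough Python sketch under those assumptions. The names `LockEntry`, `parse_proc_locks` and `holders_of` are made up for illustration, and the match is on inode number only (the device is ignored). The PIDs are as seen from the PID namespace the script runs in, so inside a container it may not identify a holder in another container, and locks taken from another host over NFS will not appear at all.

```python
import os
from typing import List, NamedTuple


class LockEntry(NamedTuple):
    kind: str    # POSIX, FLOCK or OFDLCK
    access: str  # READ or WRITE
    pid: int
    inode: int


def parse_proc_locks(text: str) -> List[LockEntry]:
    """Parse the contents of /proc/locks into held-lock entries."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Blocked waiters carry an extra "->" token after the id; skip them.
        if len(fields) < 8 or fields[1] == "->":
            continue
        _id, kind, _mode, access, pid, dev_inode = fields[:6]
        # dev_inode looks like "major:minor:inode"; the inode part is decimal.
        inode = int(dev_inode.split(":")[2])
        entries.append(LockEntry(kind, access, int(pid), inode))
    return entries


def holders_of(path: str, locks_text: str) -> List[int]:
    """Return the PIDs holding a lock on the file at `path`, matched by inode only."""
    inode = os.stat(path).st_ino
    return sorted({e.pid for e in parse_proc_locks(locks_text) if e.inode == inode})


# Usage on the Elasticsearch host (the path is whatever write.lock the log reports):
#   with open("/proc/locks") as f:
#       print(holders_of("/path/to/index/write.lock", f.read()))
```

This only reads lock state and file metadata, so nothing under the data path is modified.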

Thank you @DavidTurner

I've manually deleted all the write.lock files (present when a process is reading/writing to the folder) inside each index. Now everything looks good.

Thank you for your time

Best
Rahul

Don't ever delete anything from inside the data path.