Elasticsearch: Failed to obtain node locks

rahulnama · January 4, 2021, 3:42pm

Kibana version : 7.8.0

Elasticsearch version : 7.8.0

APM Server version : 7.8.0

Hi Team

My Elasticsearch server has been up and running for more than 2-3 months. All of a sudden it went down today. When I tried to debug, I saw the logs below.

> {"type": "server", "timestamp": "2021-01-04T15:38:19,786Z", "level": "INFO", "component": "o.e.p.PluginsService", "cluster.name": "elasticsearch", "node.name": "docker-cluster", "message": "no plugins loaded" }
> {"type": "server", "timestamp": "2021-01-04T15:38:20,296Z", "level": "ERROR", "component": "o.e.b.ElasticsearchUncaughtExceptionHandler", "cluster.name": "elasticsearch", "node.name": "docker-cluster", "message": "uncaught exception in thread [main]",
> "stacktrace": ["org.elasticsearch.bootstrap.StartupException: java.lang.IllegalStateException: failed to obtain node locks, tried [[/usr/share/elasticsearch/data]] with lock id [0]; maybe these locations are not writable or multiple nodes were started without increasing [node.max_local_storage_nodes] (was [1])?",
> "at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:174) ~[elasticsearch-7.8.0.jar:7.8.0]",
> "at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:161) ~[elasticsearch-7.8.0.jar:7.8.0]",
> "at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-7.8.0.jar:7.8.0]",
> "at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:127) ~[elasticsearch-cli-7.8.0.jar:7.8.0]",
> "at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-7.8.0.jar:7.8.0]",
> "at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:126) ~[elasticsearch-7.8.0.jar:7.8.0]",
> "at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92) ~[elasticsearch-7.8.0.jar:7.8.0]",
> "Caused by: java.lang.IllegalStateException: failed to obtain node locks, tried [[/usr/share/elasticsearch/data]] with lock id [0]; maybe these locations are not writable or multiple nodes were started without increasing [node.max_local_storage_nodes] (was [1])?",
> "at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:301) ~[elasticsearch-7.8.0.jar:7.8.0]",
> "at org.elasticsearch.node.Node.<init>(Node.java:335) ~[elasticsearch-7.8.0.jar:7.8.0]",
> "at org.elasticsearch.node.Node.<init>(Node.java:266) ~[elasticsearch-7.8.0.jar:7.8.0]",
> "at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:227) ~[elasticsearch-7.8.0.jar:7.8.0]",
> "at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:227) ~[elasticsearch-7.8.0.jar:7.8.0]",
> "at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:393) ~[elasticsearch-7.8.0.jar:7.8.0]",
> "at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:170) ~[elasticsearch-7.8.0.jar:7.8.0]",
> "... 6 more"] }

I'm not sure why this happened or what to do. Please suggest.

Thanks
Rahul

dadoonet · January 4, 2021, 4:35pm

The message

failed to obtain node locks, tried [[/usr/share/elasticsearch/data]] with lock id [0]; maybe these locations are not writable or multiple nodes were started

It means that you already have a node running, or that there has been a weird crash that did not remove the lock in the data dir, or that you are not starting the node the proper way.

BTW you should upgrade to the latest version to make sure you have all the recent fixes.

rahulnama · January 4, 2021, 4:45pm

@dadoonet

Sure, thank you.

Is it safe (in terms of data) to upgrade the version?

DavidTurner · January 4, 2021, 4:54pm

I don't think it's that - these locks are managed by the OS and are automatically cleaned up at process exit regardless of how the exit happened.
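
As a rough illustration of that point, here is a minimal Python sketch for Linux. It assumes the lock in question behaves like an ordinary advisory file lock taken with `fcntl.lockf`; the file name `node.lock` and the helpers `try_lock` and `demo` are made up for the example and are not Elasticsearch code. A child process takes the lock, a second attempt fails while the child is alive, and after the child is killed with SIGKILL the lock can be taken again even though the lock file itself is still on disk.

```python
import fcntl
import os
import signal
import subprocess
import sys
import tempfile

# Hypothetical child program: take an exclusive lock, report it, then sleep.
HOLDER = """
import fcntl, sys, time
f = open(sys.argv[1], "w")
fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
print("locked", flush=True)
time.sleep(60)
"""


def try_lock(path):
    """Return an open, locked file object, or None if another process holds the lock."""
    f = open(path, "a")
    try:
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return f
    except OSError:
        f.close()
        return None


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "node.lock")
        holder = subprocess.Popen(
            [sys.executable, "-c", HOLDER, path],
            stdout=subprocess.PIPE,
            text=True,
        )
        holder.stdout.readline()  # wait until the child reports "locked"

        while_held = try_lock(path)  # expected to fail: the child holds the lock

        holder.send_signal(signal.SIGKILL)  # abrupt exit, no cleanup code runs
        holder.wait()
        holder.stdout.close()

        file_still_exists = os.path.exists(path)
        after_kill = try_lock(path)  # the kernel released the lock at exit

        result = (while_held is None, file_still_exists, after_kill is not None)
        if after_kill is not None:
            after_kill.close()
        return result


if __name__ == "__main__":
    print(demo())
```

The lock lives in the kernel, not in the file's contents, so a leftover lock file after a crash does not by itself mean the lock is still held.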

dadoonet · January 4, 2021, 5:15pm

Could this happen if the machine is shut down abruptly, for example by a power outage?

dadoonet · January 4, 2021, 5:16pm

Yes, it is safe to upgrade. But always take a snapshot before the upgrade, as stated in the documentation.

DavidTurner · January 4, 2021, 5:48pm

Not if you're using local disks, no. Not sure how NFS would handle that but I think even there it works ok.

rahulnama · January 17, 2021, 6:25am

Hi @DavidTurner , @dadoonet

I've upgraded to the latest version and it still gives a similar issue. Please have a look at the logs below.

    {"type": "server", "timestamp": "2021-01-16T05:45:53,153Z", "level": "WARN", "component": "o.e.c.r.a.AllocationService", "cluster.name": "docker-cluster", "node.name": "elasticsearch-6f9bdbcb95-fgtgv", "message": "failing shard [failed shard, shard [logstash-alfresco-2020.07.17][0], node[n54V6RfvRu-hKRrNAaV9ew], [P], recovery_source[existing store recovery; bootstrap_history_uuid=false], s[INITIALIZING], a[id=v7IHeNemRU6mpPaDgytKgA], unassigned_info[[reason=ALLOCATION_FAILED], at[2021-01-16T05:45:52.220Z], failed_attempts[4], failed_nodes[[n54V6RfvRu-hKRrNAaV9ew]], delayed=false, details[failed shard on node [n54V6RfvRu-hKRrNAaV9ew]: failed recovery, failure RecoveryFailedException[[logstash-alfresco-2020.07.17][0]: Recovery failed on {elasticsearch-6f9bdbcb95-fgtgv}{n54V6RfvRu-hKRrNAaV9ew}{GjGMFfT0SkKzZEcmc53dbA}{10.42.21.96}{10.42.21.96:9300}{cdhilmrstw}{ml.machine_memory=4294967296, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}]; nested: IndexShardRecoveryException[failed to recover from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: LockObtainFailedException[Lock held by another program: /usr/share/elasticsearch/data/nodes/0/indices/cgUkK37XSZGrzz1LyxF0vg/0/index/write.lock]; ], allocation_status[no_valid_shard_copy]], message [failed recovery], failure [RecoveryFailedException[[logstash-alfresco-2020.07.17][0]: Recovery failed on {elasticsearch-6f9bdbcb95-fgtgv}{n54V6RfvRu-hKRrNAaV9ew}{GjGMFfT0SkKzZEcmc53dbA}{10.42.21.96}{10.42.21.96:9300}{cdhilmrstw}{ml.machine_memory=4294967296, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}]; nested: IndexShardRecoveryException[failed to recover from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: LockObtainFailedException[Lock held by another program: /usr/share/elasticsearch/data/nodes/0/indices/cgUkK37XSZGrzz1LyxF0vg/0/index/write.lock]; ], markAsStale [true]]", "cluster.uuid": "r-0GcXtHSy6tqhXAjLZS-g", 
"node.id": "n54V6RfvRu-hKRrNAaV9ew" ,
"stacktrace": ["org.elasticsearch.indices.recovery.RecoveryFailedException: [logstash-alfresco-2020.07.17][0]: Recovery failed on {elasticsearch-6f9bdbcb95-fgtgv}{n54V6RfvRu-hKRrNAaV9ew}{GjGMFfT0SkKzZEcmc53dbA}{10.42.21.96}{10.42.21.96:9300}{cdhilmrstw}{ml.machine_memory=4294967296, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}",
"at org.elasticsearch.index.shard.IndexShard.lambda$executeRecovery$21(IndexShard.java:2676) ~[elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:71) ~[elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.index.shard.StoreRecovery.lambda$recoveryListener$6(StoreRecovery.java:355) ~[elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:71) ~[elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:328) ~[elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:96) ~[elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1894) ~[elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73) ~[elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:737) ~[elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.10.1.jar:7.10.1]",
"at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]",
"at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]",
"at java.lang.Thread.run(Thread.java:832) [?:?]",
"Caused by: org.elasticsearch.index.shard.IndexShardRecoveryException: failed to recover from gateway",
"at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:441) ~[elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:98) ~[elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:325) ~[elasticsearch-7.10.1.jar:7.10.1]",
"... 8 more",
"Caused by: org.elasticsearch.index.engine.EngineCreationFailureException: failed to create engine",
"at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:254) ~[elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:205) ~[elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:25) ~[elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:1654) ~[elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1620) ~[elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:436) ~[elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:98) ~[elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:325) ~[elasticsearch-7.10.1.jar:7.10.1]",
"... 8 more",
"Caused by: org.apache.lucene.store.LockObtainFailedException: Lock held by another program: /usr/share/elasticsearch/data/nodes/0/indices/cgUkK37XSZGrzz1LyxF0vg/0/index/write.lock",
"at org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:130) ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:35:28]",
"at org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:41) ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:35:28]",
"at org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:45) ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:35:28]",
"at org.apache.lucene.store.FilterDirectory.obtainLock(FilterDirectory.java:105) ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:35:28]",
"at org.apache.lucene.store.FilterDirectory.obtainLock(FilterDirectory.java:105) ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:35:28]",
"at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:923) ~[lucene-core-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:35:28]",
"at org.elasticsearch.index.engine.InternalEngine.createWriter(InternalEngine.java:2288) ~[elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.index.engine.InternalEngine.createWriter(InternalEngine.java:2276) ~[elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:247) ~[elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:205) ~[elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:25) ~[elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:1654) ~[elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1620) ~[elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:436) ~[elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:98) ~[elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:325) ~[elasticsearch-7.10.1.jar:7.10.1]",
"... 8 more"] }

Response of _cat/shards api:

Response of _cat/indices api:

Please suggest

Thanks
Rahul

DavidTurner · January 17, 2021, 9:10am

As it says, the lock is held by another program. Usually this means you have two Elasticsearch processes running on the same data path.

rahulnama · January 17, 2021, 12:33pm

@DavidTurner

Is there a way to find out which running Elasticsearch process holds the lock?

We have 10-15 containers mounting the same volume, and Elasticsearch is running in only one of them.

Any suggestions?

Best
Rahul

DavidTurner · January 17, 2021, 1:46pm

I'd use lsof or lslocks but I've no idea how well they work with containers.
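
If neither tool is available inside the container, a similar lookup can be sketched in Python. On Linux the kernel lists currently held file locks in `/proc/locks`. The helper names below (`parse_proc_locks`, `holders_of_inode`, `holders_of_path`) are hypothetical. The sketch matches on inode number only, which is unique per filesystem rather than globally, and the same caveat about containers applies: PIDs belonging to another container's PID namespace may not be visible or meaningful from where the script runs.

```python
import os


def parse_proc_locks(text):
    """Parse the text of /proc/locks into a list of dicts, one per entry."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Entries for processes waiting on a lock carry an extra "->" marker.
        blocked = "->" in fields
        if blocked:
            fields.remove("->")
        if len(fields) < 8:
            continue
        _id, kind, _mode, access, pid, dev_inode = fields[:6]
        # dev_inode looks like "08:01:123456" (major:minor:inode).
        inode = int(dev_inode.rsplit(":", 1)[1])
        entries.append(
            {"kind": kind, "access": access, "pid": int(pid),
             "inode": inode, "blocked": blocked}
        )
    return entries


def holders_of_inode(text, inode):
    """Return the sorted PIDs that hold (not merely wait for) a lock on the inode."""
    return sorted(
        {e["pid"] for e in parse_proc_locks(text)
         if e["inode"] == inode and not e["blocked"]}
    )


def holders_of_path(path):
    """Look up the lock holders of a file on the running system (Linux only)."""
    with open("/proc/locks") as f:
        text = f.read()
    return holders_of_inode(text, os.stat(path).st_ino)


if __name__ == "__main__":
    import sys
    print(holders_of_path(sys.argv[1]))
```

Run with the lock file from the error message as the argument, for example `/usr/share/elasticsearch/data/nodes/0/indices/cgUkK37XSZGrzz1LyxF0vg/0/index/write.lock`, from a place where the holder's PID is visible (such as the host).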

rahulnama · January 17, 2021, 5:55pm

Thank you @DavidTurner

I've manually deleted all the write.lock files (present when a process is reading/writing to the folder) inside each index. Now everything looks good.

Thank you for your time

Best
Rahul

DavidTurner · January 17, 2021, 6:34pm

Don't ever delete anything from inside the data path.
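
One general reason deleting a lock file is risky can be sketched in Python. This shows ordinary POSIX file-lock behavior on Linux and makes no claim about Elasticsearch or Lucene internals; the helpers `can_lock` and `demo` and the child program are hypothetical. An advisory lock is attached to the file's inode, so removing the file does not release the lock held by a live process. It only lets a second process create a fresh file under the same name and lock that one, leaving two processes that each believe they own the directory.

```python
import fcntl
import os
import subprocess
import sys
import tempfile

# Hypothetical child program: take the lock and hold it until stdin is closed.
HOLDER = """
import fcntl, sys
f = open(sys.argv[1], "w")
fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
print("locked", flush=True)
sys.stdin.read()
"""


def can_lock(path):
    """Try to take an exclusive lock on path (creating it if missing), then release it."""
    with open(path, "a") as f:
        try:
            fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True
        except OSError:
            return False


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "write.lock")
        holder = subprocess.Popen(
            [sys.executable, "-c", HOLDER, path],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )
        holder.stdout.readline()  # wait until the child reports "locked"

        before_delete = can_lock(path)  # refused: the child holds the lock

        os.remove(path)  # the child still holds its lock on the old inode
        after_delete = can_lock(path)  # succeeds on a brand-new file
        holder_still_running = holder.poll() is None

        holder.stdin.close()  # let the child exit
        holder.wait()
        holder.stdout.close()
        return before_delete, after_delete, holder_still_running


if __name__ == "__main__":
    print(demo())
```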
