Node.lock file - Underlying file changed by an external force

I have 5 data nodes running Elasticsearch 5.6.2 on Ubuntu 16.04. All of the nodes occasionally run into a problem with the "node.lock" file, or perhaps that's just a symptom of some other problem. When it happens, shards fail to allocate.


[2017-11-01T09:15:13,494][WARN ][o.e.c.s.ClusterService   ] [node01] failed to notify ClusterStateListener
org.apache.lucene.store.AlreadyClosedException: Underlying file changed by an external force at 2017-10-31T13:18:10Z, (lock=NativeFSLock(path=/opt/elasticsearch_indexes/i/nodes/0/node.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid],creationTime=2017-10-31T13:18:10.034974Z))
        at org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.ensureValid(NativeFSLockFactory.java:179) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
        at org.elasticsearch.env.NodeEnvironment.assertEnvIsLocked(NodeEnvironment.java:941) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.env.NodeEnvironment.availableIndexFolders(NodeEnvironment.java:820) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.gateway.MetaStateService.loadIndicesStates(MetaStateService.java:89) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.gateway.DanglingIndicesState.findNewDanglingIndices(DanglingIndicesState.java:131) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.gateway.DanglingIndicesState.findNewAndAddDanglingIndices(DanglingIndicesState.java:116) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.gateway.DanglingIndicesState.processDanglingIndices(DanglingIndicesState.java:81) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.gateway.DanglingIndicesState.clusterChanged(DanglingIndicesState.java:185) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.cluster.service.ClusterService.lambda$publishAndApplyChanges$7(ClusterService.java:777) ~[elasticsearch-5.6.2.jar:5.6.2]
        at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) [?:1.8.0_131]
        at java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742) [?:1.8.0_131]
        at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580) [?:1.8.0_131]
        at org.elasticsearch.cluster.service.ClusterService.publishAndApplyChanges(ClusterService.java:774) [elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.cluster.service.ClusterService.runTasks(ClusterService.java:587) [elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.run(ClusterService.java:263) [elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:247) [elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:210) [elasticsearch-5.6.2.jar:5.6.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_131]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]

When this happens, I stop the elasticsearch service on that node, delete the node.lock file, and restart the service. That gets shards allocating again for 12-24 hours, until it happens again on another node.
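
For anyone searching later, the workaround is just this (a rough sketch, assuming a systemd-managed package install where the service is named elasticsearch, and using the data path shown in the exception above):

    # Stop Elasticsearch on the affected node so nothing holds the lock
    sudo systemctl stop elasticsearch

    # Delete the stale lock file (path taken from the exception above;
    # the nodes/0 ordinal may differ on your setup)
    sudo rm /opt/elasticsearch_indexes/i/nodes/0/node.lock

    # Start the node again; shards should begin allocating shortly after
    sudo systemctl start elasticsearch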

Please help.

For those who have this problem, the solution was apparently to give the servers (and JVM) more RAM.
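
In case it helps, here is roughly what that change looks like on a Debian/Ubuntu package install (a sketch only; the 8g heap is an example value, not a recommendation for your hardware; keep -Xms equal to -Xmx and at or below about half of the machine's physical RAM):

    # After adding RAM, raise the Elasticsearch heap in /etc/elasticsearch/jvm.options
    # (example values only)
    sudo sed -i 's/^-Xms.*/-Xms8g/; s/^-Xmx.*/-Xmx8g/' /etc/elasticsearch/jvm.options

    # Restart so the new heap takes effect
    sudo systemctl restart elasticsearch

    # Confirm the heap the node actually started with
    curl -s 'localhost:9200/_cat/nodes?v&h=name,heap.max'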