I have 5 data nodes running Elasticsearch 5.6.2 on Ubuntu 16.04. All of the nodes occasionally hit a problem with the "node.lock" file, or perhaps that's just a symptom of some other problem; either way, the result is shards failing to allocate. Here is the warning from the log of one of the affected nodes:
```
[2017-11-01T09:15:13,494][WARN ][o.e.c.s.ClusterService ] [node01] failed to notify ClusterStateListener
org.apache.lucene.store.AlreadyClosedException: Underlying file changed by an external force at 2017-10-31T13:18:10Z, (lock=NativeFSLock(path=/opt/elasticsearch_indexes/i/nodes/0/node.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid],creationTime=2017-10-31T13:18:10.034974Z))
at org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.ensureValid(NativeFSLockFactory.java:179) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
at org.elasticsearch.env.NodeEnvironment.assertEnvIsLocked(NodeEnvironment.java:941) ~[elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.env.NodeEnvironment.availableIndexFolders(NodeEnvironment.java:820) ~[elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.gateway.MetaStateService.loadIndicesStates(MetaStateService.java:89) ~[elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.gateway.DanglingIndicesState.findNewDanglingIndices(DanglingIndicesState.java:131) ~[elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.gateway.DanglingIndicesState.findNewAndAddDanglingIndices(DanglingIndicesState.java:116) ~[elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.gateway.DanglingIndicesState.processDanglingIndices(DanglingIndicesState.java:81) ~[elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.gateway.DanglingIndicesState.clusterChanged(DanglingIndicesState.java:185) ~[elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.cluster.service.ClusterService.lambda$publishAndApplyChanges$7(ClusterService.java:777) ~[elasticsearch-5.6.2.jar:5.6.2]
at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) [?:1.8.0_131]
at java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742) [?:1.8.0_131]
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580) [?:1.8.0_131]
at org.elasticsearch.cluster.service.ClusterService.publishAndApplyChanges(ClusterService.java:774) [elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.cluster.service.ClusterService.runTasks(ClusterService.java:587) [elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.run(ClusterService.java:263) [elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:247) [elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:210) [elasticsearch-5.6.2.jar:5.6.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_131]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
```
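When this happens I can see shards sitting in UNASSIGNED across the cluster. For context, this is how I check; a minimal sketch, assuming curl against a node on localhost:9200 with no auth in front of it:

```
# List shards that are not STARTED, with the reason they are unassigned
curl -s 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep -v STARTED

# Ask the cluster to explain the first unassigned shard it finds
curl -s 'localhost:9200/_cluster/allocation/explain?pretty'
```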
When this happens, I stop the Elasticsearch service on the affected node, delete the node.lock file, and restart the service (the exact commands are in the sketch below). That gets shards allocating again for 12-24 hours, and then it happens again on another node.
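For reference, the workaround as a sketch; it assumes the stock systemd unit name (elasticsearch) and uses the data path from the log message above, so adjust both to your layout:

```
# Stop Elasticsearch on the affected node
sudo systemctl stop elasticsearch

# Remove the stale node.lock (path taken from the log message above)
sudo rm /opt/elasticsearch_indexes/i/nodes/0/node.lock

# Start Elasticsearch again; shards begin allocating shortly afterwards
sudo systemctl start elasticsearch
```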
Please help.