Hi
Background:
I have set up a 3-node Elasticsearch 5.4 cluster on AWS EC2 instances. The 3 instances are part of an auto-scaling group (min and max both set to 3) behind an ELB. Each node is allocated a persistent EBS volume that stores the Elasticsearch data and logs. When a node in the ASG terminates, a new one spins up and the same EBS volume is re-attached on startup.
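For context, the re-attach happens in the instance's user data, roughly like this (the volume ID, device name, and mount point below are placeholders rather than my real values):

#!/bin/bash
# Re-attach the persistent data volume on boot (IDs and paths are illustrative)
VOLUME_ID="vol-0123456789abcdef0"
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

aws ec2 attach-volume --volume-id "$VOLUME_ID" --instance-id "$INSTANCE_ID" --device /dev/xvdf

# Wait for the block device to appear, then mount it at the ES data path
while [ ! -e /dev/xvdf ]; do sleep 2; done
mount /dev/xvdf /data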
The problem I'm finding is that when the new instance starts and the volume is attached, the ES cluster remains in a yellow or red state due to unassigned shards. In the logs I see the following error related to the node.lock file:
[2017-06-02T09:14:39,593][WARN ][o.e.c.a.s.ShardStateAction] [ip-10-8-24-59] [sssit-filebeat-2017.06.02][0] received shard failed for shard id [[sssit-filebeat-2017.06.02][0]], allocation id [HIPr-i3mSZ2GbZq9PZLQTQ], primary term [0], message [failed to create shard], failure [AlreadyClosedException[Underlying file changed by an external force at 2017-06-02T09:05:53.072442Z, (lock=NativeFSLock(path=/data/elasticsearch/nodes/0/node.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid],creationTime=2017-06-02T09:13:37.268114Z))]]
org.apache.lucene.store.AlreadyClosedException: Underlying file changed by an external force at 2017-06-02T09:05:53.072442Z, (lock=NativeFSLock(path=/data/elasticsearch/nodes/0/node.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid],creationTime=2017-06-02T09:13:37.268114Z))
at org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.ensureValid(NativeFSLockFactory.java:179) ~[lucene-core-6.5.0.jar:6.5.0 4b16c9a10c3c00cafaf1fc92ec3276a7bc7b8c95 - jimczi - 2017-03-21 20:40:22]

Has anyone ever come across this? Would it be safe to delete the node.lock file in the EC2 user data startup script before starting the Elasticsearch service?
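Concretely, I was thinking of something like this in the user data (the lock file path is taken from the log above; the start command would be whatever the init system uses):

#!/bin/bash
# Remove a stale node.lock left behind by the terminated instance,
# then start Elasticsearch (hypothetical sketch, not tested)
LOCK_FILE=/data/elasticsearch/nodes/0/node.lock
if [ -f "$LOCK_FILE" ]; then
  rm -f "$LOCK_FILE"
fi
service elasticsearch start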