Underlying file changed by an external force


(xiaomage) #1

Using elasticsearch cluster( 3 servers) with kubernetes, but elasticsearch cluster can not wrok fine and log show as below:
[2018-06-27T05:48:31,798][WARN ][o.e.c.r.a.AllocationService] [sBJEWnS] failing shard [failed shard, shard [logstash-2018.06.27][4], node[sBJEWnS1R1qL7blJL9Wl0Q], [P], recovery_source[existing recovery], s[INITIALIZING], a[id=CSql7_M8QnmfnylNrN-LFQ], unassigned_info[[reason=ALLOCATION_FAILED], at[2018-06-27T05:48:30.059Z], failed_attempts[1], delayed=false, details[failed shard on node [sBJEWnS1R1qL7blJL9Wl0Q]: shard failure, reason [already closed by tragic event on the index writer], failure AlreadyClosedException[Underlying file changed by an external force at 2018-06-27T05:47:51.755Z,(lock=NativeFSLock(path=/data/nodes/0/indices/H9SdQlUeQluw6slOsBG7AQ/4/index/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid],creationTime=2018-06-27T05:47:23.576Z))]], allocation_status[no_valid_shard_copy]], message [shard failure, reason [lucene commit failed]], failure [AlreadyClosedException[Underlying file changed by an external force at 2018-06-27T05:47:51.755Z, (lock=NativeFSLock(path=/data/nodes/0/indices/H9SdQlUeQluw6slOsBG7AQ/4/index/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid],creationTime=2018-06-27T05:47:23.576Z))]], markAsStale [true]]
org.apache.lucene.store.AlreadyClosedException: Underlying file changed by an external force at 2018-06-27T05:47:51.755Z, (lock=NativeFSLock(path=/data/nodes/0/indices/H9SdQlUeQluw6slOsBG7AQ/4/index/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid],creationTime=2018-06-27T05:47:23.576Z))
at org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.ensureValid(NativeFSLockFactory.java:191) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
at org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:43) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
at org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
at org.apache.lucene.codecs.lucene70.Lucene70SegmentInfoFormat.write(Lucene70SegmentInfoFormat.java:276) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
at org.apache.lucene.index.DocumentsWriterPerThread.sealFlushedSegment(DocumentsWriterPerThread.java:547) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:491) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:557) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:673) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3246) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3458) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3423) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
at org.elasticsearch.index.engine.InternalEngine.commitIndexWriter(InternalEngine.java:2160) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslogInternal(InternalEngine.java:415) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:377) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:98) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1297) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:420) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:95) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:301) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:93) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1567) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$5(IndexShard.java:2020) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:625) ~[elasticsearch-6.3.0.jar:6.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:844) [?:?]

Could anybody know root cause and thanks advance.


(David Turner) #2

On startup, the file in question had a last-modification time of 2018-06-27T05:47:23.576Z, but when later checked it had unexpectedly again been modified at 2018-06-27T05:47:51.755Z. This means that something outside of this node appears to be modifying the files to which it is supposed to have exclusive access.


(xiaomage) #3

@DavidTurner Thanks a lot for your answer, i used elasticsearch cluster with kubernetes, I checked on the server, Timezone on my host server is CDT, but timezone on container in pods is UTC. glusterfs is used to a filesystem to store the elasticsearch data, according to your answer, issue caused by different timezone, am i right?


(David Turner) #4

The first two Google hits for elasticsearch glusterfs lead to two serious-looking open GlusterFS issues that Elasticsearch triggers:

https://bugzilla.redhat.com/show_bug.cgi?id=1390050
https://bugzilla.redhat.com/show_bug.cgi?id=1379568

This may be one of them, or may be something else, but it seems worth rethinking your setup.

I don't think so. Both of the times I quoted are in UTC (note the trailing Z) and differ by 28 seconds, so it seems unlikely to be a timezone issue.


Elasticsearch shard unassigned and changed to red
Unassigned shards on ES 2.4.4
(xiaomage) #5

Got it, i will have a try. Thanks a lot for your great answer.


(system) #6

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.