Elasticsearch cluster unable to recover (unavailable_shards_exception)


(Vincenzo Buonagura) #1

Hi all,
we are using the ELK stack to have realtime log in our complex platform.

During the night we had an issue with Logstash that was not able to write in Elasticsearch because one of the shards was unable to recover.

We are using the complete stack with version 5.6.

These are the logs we have in Elasticsearch:

[2018-04-05T23:48:17,185][WARN ][o.e.c.a.s.ShardStateAction] [870632-SPWLOG01] [iislog-2018.04.05][3] received shard failed for shard id [[iislog-2018.04.05][3]], allocation id [4Q-GWqd-RRKvht8nML4FrA], primary term [0], message [shard failure, reason [lucene commit failed]], failure [FileSystemException[D:\Elasticsearch\data\nodes\0\indices\CKrJStXdRqeMjlmeTmrouA\3\index_5qc_Lucene50_0.doc: The process cannot access the file because it is being used by another process.
]]
java.nio.file.FileSystemException: D:\Elasticsearch\data\nodes\0\indices\CKrJStXdRqeMjlmeTmrouA\3\index_5qc_Lucene50_0.doc: The process cannot access the file because it is being used by another process.

at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86) ~[?:?]
at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) ~[?:?]
at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102) ~[?:?]
at sun.nio.fs.WindowsFileSystemProvider.newFileChannel(WindowsFileSystemProvider.java:115) ~[?:?]
at java.nio.channels.FileChannel.open(FileChannel.java:287) ~[?:1.8.0_144]
at java.nio.channels.FileChannel.open(FileChannel.java:335) ~[?:1.8.0_144]
at org.apache.lucene.util.IOUtils.fsync(IOUtils.java:471) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.store.FSDirectory.fsync(FSDirectory.java:327) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.store.FSDirectory.sync(FSDirectory.java:285) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.store.FilterDirectory.sync(FilterDirectory.java:83) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.store.FilterDirectory.sync(FilterDirectory.java:83) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.store.LockValidatingDirectoryWrapper.sync(LockValidatingDirectoryWrapper.java:68) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4724) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3085) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3244) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3207) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.elasticsearch.index.engine.InternalEngine.commitIndexWriter(InternalEngine.java:1576) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.engine.InternalEngine.flush(InternalEngine.java:1062) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:253) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:221) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:92) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.shard.IndexShard.internalPerformTranslogRecovery(IndexShard.java:1033) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.shard.IndexShard.performTranslogRecovery(IndexShard.java:987) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:360) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:90) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:257) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:88) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1236) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$1(IndexShard.java:1484) ~[elasticsearch-5.6.0.jar:5.6.0]
at 

[2018-04-05T23:48:31,177][WARN ][o.e.i.e.Engine ] [870632-SPWLOG01] [iislog-2018.04.05][3] tried to fail engine but engine is already failed. ignoring. [failed to recover from translog]
org.elasticsearch.index.engine.FlushFailedEngineException: Flush failed
at org.elasticsearch.index.engine.InternalEngine.flush(InternalEngine.java:1069) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:253) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:221) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:92) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.shard.IndexShard.internalPerformTranslogRecovery(IndexShard.java:1033) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.shard.IndexShard.performTranslogRecovery(IndexShard.java:987) ~[elasticsearch-5.6.0.jar:5.6.0]

Caused by: java.nio.file.FileSystemException: D:\Elasticsearch\data\nodes\0\indices\CKrJStXdRqeMjlmeTmrouA\3\index_8tv_Lucene50_0.pos: The process cannot access the file because it is being used by another process.

And these are the logs in Logstash

[2018-04-06T00:00:48,722][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 503 ({"type"=>"unavailable_shards_exception", "reason"=>"[iislog-2018.04.05][3] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[iislog-2018.04.05][3]] containing [18] requests]"})
[2018-04-06T00:00:48,722][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 503 ({"type"=>"unavailable_shards_exception", "reason"=>"[iislog-2018.04.05][3] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[iislog-2018.04.05][3]] containing [18] requests]"})

Did someone had this issue already?


(Erich Meier) #2

Did you get any input on this? We are seeing the same issue in our application with Elasticsearch 5.1.1 on Windows Server


(vamsi) #3

We are having the same issue on elastic 5.5.2 where elastic status goes to red with the same error message .

elasticsearch-5.5.2\data\nodes\0\indices\LkQe3bguSAeW2AImBxlTzQ\3\index_9fxw_Lucene50_0.doc: The process cannot access the file because it is being used by another process.

Did any of you guys figured out what was causing that ?


(Erich Meier) #4

Not yet. We eliminated a security scanner, so the problems come less frequently, but it still fails about once per day.

Which version of elasticsearch are you using? I assume, your server runs on Windows, correct?


(system) #5

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.