Hi all,
we are using the ELK stack for real-time logging on our platform.
During the night we had an issue where Logstash was unable to write to Elasticsearch because one of the shards could not recover.
We are running the complete stack at version 5.6.
These are the logs we have in Elasticsearch:
[2018-04-05T23:48:17,185][WARN ][o.e.c.a.s.ShardStateAction] [870632-SPWLOG01] [iislog-2018.04.05][3] received shard failed for shard id [[iislog-2018.04.05][3]], allocation id [4Q-GWqd-RRKvht8nML4FrA], primary term [0], message [shard failure, reason [lucene commit failed]], failure [FileSystemException[D:\Elasticsearch\data\nodes\0\indices\CKrJStXdRqeMjlmeTmrouA\3\index_5qc_Lucene50_0.doc: The process cannot access the file because it is being used by another process.
]]
java.nio.file.FileSystemException: D:\Elasticsearch\data\nodes\0\indices\CKrJStXdRqeMjlmeTmrouA\3\index_5qc_Lucene50_0.doc: The process cannot access the file because it is being used by another process.
at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86) ~[?:?]
at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) ~[?:?]
at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102) ~[?:?]
at sun.nio.fs.WindowsFileSystemProvider.newFileChannel(WindowsFileSystemProvider.java:115) ~[?:?]
at java.nio.channels.FileChannel.open(FileChannel.java:287) ~[?:1.8.0_144]
at java.nio.channels.FileChannel.open(FileChannel.java:335) ~[?:1.8.0_144]
at org.apache.lucene.util.IOUtils.fsync(IOUtils.java:471) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.store.FSDirectory.fsync(FSDirectory.java:327) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.store.FSDirectory.sync(FSDirectory.java:285) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.store.FilterDirectory.sync(FilterDirectory.java:83) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.store.FilterDirectory.sync(FilterDirectory.java:83) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.store.LockValidatingDirectoryWrapper.sync(LockValidatingDirectoryWrapper.java:68) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4724) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3085) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3244) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3207) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.elasticsearch.index.engine.InternalEngine.commitIndexWriter(InternalEngine.java:1576) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.engine.InternalEngine.flush(InternalEngine.java:1062) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:253) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:221) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:92) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.shard.IndexShard.internalPerformTranslogRecovery(IndexShard.java:1033) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.shard.IndexShard.performTranslogRecovery(IndexShard.java:987) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:360) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:90) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:257) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:88) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1236) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$1(IndexShard.java:1484) ~[elasticsearch-5.6.0.jar:5.6.0]
at
[2018-04-05T23:48:31,177][WARN ][o.e.i.e.Engine ] [870632-SPWLOG01] [iislog-2018.04.05][3] tried to fail engine but engine is already failed. ignoring. [failed to recover from translog]
org.elasticsearch.index.engine.FlushFailedEngineException: Flush failed
at org.elasticsearch.index.engine.InternalEngine.flush(InternalEngine.java:1069) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:253) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:221) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:92) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.shard.IndexShard.internalPerformTranslogRecovery(IndexShard.java:1033) ~[elasticsearch-5.6.0.jar:5.6.0]
at org.elasticsearch.index.shard.IndexShard.performTranslogRecovery(IndexShard.java:987) ~[elasticsearch-5.6.0.jar:5.6.0]
Caused by: java.nio.file.FileSystemException: D:\Elasticsearch\data\nodes\0\indices\CKrJStXdRqeMjlmeTmrouA\3\index_8tv_Lucene50_0.pos: The process cannot access the file because it is being used by another process.
And these are the logs from Logstash:
[2018-04-06T00:00:48,722][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 503 ({"type"=>"unavailable_shards_exception", "reason"=>"[iislog-2018.04.05][3] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[iislog-2018.04.05][3]] containing [18] requests]"})
[2018-04-06T00:00:48,722][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 503 ({"type"=>"unavailable_shards_exception", "reason"=>"[iislog-2018.04.05][3] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[iislog-2018.04.05][3]] containing [18] requests]"})
Has anyone run into this issue before?
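For reference, this is a minimal sketch (Python 3 with the requests library, assuming Elasticsearch is reachable on localhost:9200) of what we plan to run to see why the primary is unassigned and to retry the allocation once the file lock is released; the index name and shard number are taken from the logs above.

import requests

ES = "http://localhost:9200"  # assumption: adjust to your node's address

# Ask the cluster why the primary of this shard is unassigned
# (index name and shard number come from the error messages above).
body = {"index": "iislog-2018.04.05", "shard": 3, "primary": True}
resp = requests.get(ES + "/_cluster/allocation/explain", json=body)
print(resp.json())

# Once whatever process was holding the Lucene file has released it,
# ask the cluster to retry allocations that exhausted their retry limit.
resp = requests.post(ES + "/_cluster/reroute?retry_failed=true")
print(resp.json())

Both endpoints exist in 5.6 (_cluster/allocation/explain and _cluster/reroute with retry_failed), so the same calls can also be made with curl if that is easier.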