Elasticsearch Fills Logs with Error Messages When Shard Fails to Recover


#1

Elasticsearch version: 2.3.1

I am seeing an issue where Elasticsearch is filling up disk space (over 40 GB) with error logging. It's the same exception over and over again:

[2016-08-16 15:59:57,827][WARN ][cluster.action.shard     ] [node] [geocortex.core.roles.elasticsearch.watcher][0] received shard failed for target shard [[geocortex.core.roles.elasticsearch.watcher][0], node[btC68MOzSHKQPa-ANpR], [P], v[35], s[INITIALIZING], a[id=8qt4DOJkQVahfclZFGjqmg], unassigned_info[[reason=CLUSTER_RECOVERED], at[2016-08-16T15:59:50.811Z]]], indexUUID [K5GAD2zsSbaQ9GhWnImH2Q], message [failed recovery], failure [IndexShardRecoveryException[failed to recovery from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: EOFException; ]
[geocortex.core.roles.elasticsearch.watcher][[geocortex.core.roles.elasticsearch.watcher][0]] IndexShardRecoveryException[failed to recovery from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: EOFException;
    at org.elasticsearch.index.shard.StoreRecoveryService.recoverFromStore(StoreRecoveryService.java:250)
    at org.elasticsearch.index.shard.StoreRecoveryService.access$100(StoreRecoveryService.java:56)
    at org.elasticsearch.index.shard.StoreRecoveryService$1.run(StoreRecoveryService.java:129)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: [geocortex.core.roles.elasticsearch.watcher][[geocortex.core.roles.elasticsearch.watcher][0]] EngineCreationFailureException[failed to create engine]; nested: EOFException;
    at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:155)
    at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:25)
    at org.elasticsearch.index.shard.IndexShard.newEngine(IndexShard.java:1515)
    at org.elasticsearch.index.shard.IndexShard.createNewEngine(IndexShard.java:1499)
    at org.elasticsearch.index.shard.IndexShard.internalPerformTranslogRecovery(IndexShard.java:972)
    at org.elasticsearch.index.shard.IndexShard.performTranslogRecovery(IndexShard.java:944)
    at org.elasticsearch.index.shard.StoreRecoveryService.recoverFromStore(StoreRecoveryService.java:241)
    ... 5 more
Caused by: java.io.EOFException
    at org.apache.lucene.store.InputStreamDataInput.readByte(InputStreamDataInput.java:37)
    at org.apache.lucene.store.DataInput.readInt(DataInput.java:101)
    at org.apache.lucene.store.DataInput.readLong(DataInput.java:157)
    at org.elasticsearch.index.translog.Checkpoint.<init>(Checkpoint.java:54)
    at org.elasticsearch.index.translog.Checkpoint.read(Checkpoint.java:83)
    at org.elasticsearch.index.translog.Translog.recoverFromFiles(Translog.java:337)
    at org.elasticsearch.index.translog.Translog.<init>(Translog.java:179)
    at org.elasticsearch.index.engine.InternalEngine.openTranslog(InternalEngine.java:208)
    at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:151)
    ... 11 more

The error happens as soon as Elasticsearch starts and tries to recover the indices. It looks to me like an error while reading the translog file. Unfortunately, I no longer have the translog file. Does anyone have any idea why this is happening?
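For context, the bottom of the stack trace points at `Checkpoint.read`, which reads a small fixed-size record from the translog checkpoint (`.ckp`) file. The sketch below mimics that read to show how a truncated checkpoint produces an end-of-file error on every recovery attempt. The 20-byte big-endian layout (offset as int64, numOps as int32, generation as int64) is an assumption inferred from the 2.x stack frames, not taken from official documentation:

```python
import struct

def read_checkpoint(path):
    """Parse a translog .ckp the way ES 2.x appears to: offset (int64),
    numOps (int32), generation (int64) -- 20 bytes, big-endian.
    (Layout assumed from the stack trace; illustrative only.)"""
    with open(path, "rb") as f:
        data = f.read(20)
    if len(data) < 20:
        # Lucene's DataInput raises EOFException here; Python's analogue:
        raise EOFError("checkpoint truncated: %d of 20 bytes" % len(data))
    return struct.unpack(">qiq", data)  # (offset, num_ops, generation)

# An intact checkpoint parses fine:
with open("/tmp/translog.ckp", "wb") as f:
    f.write(struct.pack(">qiq", 1024, 35, 7))
print(read_checkpoint("/tmp/translog.ckp"))   # (1024, 35, 7)

# Truncating it raises EOF on every read, i.e. on every recovery attempt:
with open("/tmp/translog.ckp", "r+b") as f:
    f.truncate(10)
try:
    read_checkpoint("/tmp/translog.ckp")
except EOFError as e:
    print("recovery would fail:", e)
```

Since the node retries recovery and hits the same truncated file each time, the same exception is logged on every attempt, which matches the log growth described above.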


(Adrien Grand) #2

It looks like your translog file has been truncated somehow, which prevents Elasticsearch from performing recovery. If you don't mind losing some data, you could remove the translog files from disk and restart Elasticsearch.
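Concretely, in 2.x the per-shard translog sits under `<path.data>/<cluster>/nodes/<n>/indices/<index>/<shard>/translog`. The sketch below demonstrates the cleanup step against a stand-in directory tree built in a temp dir, so it is safe to run as-is; the cluster and path names are examples only. On a real node you would stop the node first and accept that any operations not yet flushed to the Lucene index are lost:

```python
import pathlib, shutil, tempfile

# Build a stand-in for the ES 2.x on-disk layout (illustrative paths):
data = pathlib.Path(tempfile.mkdtemp())  # stand-in for path.data
shard = (data / "mycluster" / "nodes" / "0" / "indices"
              / "geocortex.core.roles.elasticsearch.watcher" / "0")
(shard / "translog").mkdir(parents=True)
(shard / "index").mkdir()
(shard / "translog" / "translog-35.tlog").touch()
(shard / "translog" / "translog.ckp").touch()

# With the node stopped, removing the translog directory discards any
# unflushed operations but lets the shard recover from the Lucene index:
shutil.rmtree(shard / "translog")
print(sorted(p.name for p in shard.iterdir()))  # ['index']
```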


#3

I can reproduce the issue by manually truncating one of the translog .ckp files. Is there a way to prevent Elasticsearch from filling the log files with gigabytes of data? Possibly by stopping recovery attempts after the shard fails x number of times?

