Issue after disk problem: java.lang.IllegalArgumentException: requested snapshot generation [10] is not available. Min referenced generation is [11]

Hi. We had a disk problem last night that caused a whole slew of issues in our Elasticsearch cluster. We've resolved all of them but one: a single shard (which has no replica) won't recover. We haven't seen this error before, and we haven't found any reference to it or any way to fix it. The root error at the bottom of the stack trace seems to be the one in the title:

    Caused by: java.lang.IllegalArgumentException: requested snapshot generation [10] is not available. Min referenced generation is [11]

Here's the full error log:

    [2021-02-15T12:42:57,479][WARN ][o.e.i.c.IndicesClusterStateService] [Elastic1] [[index_1084][2]] marking and sending shard failed due to [failed recovery]
    org.elasticsearch.indices.recovery.RecoveryFailedException: [index_1084][2]: Recovery failed on {Elastic1}{DYEyUuc6RKKbmLgErwcOfg}{ZN5-3mCgQR-AXt8SKJ3vRQ}{10.0.2.52}{10.0.2.52:9300}{ml.machine_memory=37965348864, xpack.installed=true, box_type=warm, ml.max_open_jobs=20, ml.enabled=true}
            at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$9(IndexShard.java:2396) ~[elasticsearch-6.8.13.jar:6.8.13]
            at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:708) [elasticsearch-6.8.13.jar:6.8.13]
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
            at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
    Caused by: org.elasticsearch.index.shard.IndexShardRecoveryException: failed to recover from gateway
            at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:445) ~[elasticsearch-6.8.13.jar:6.8.13]
            at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:95) ~[elasticsearch-6.8.13.jar:6.8.13]
            at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:310) ~[elasticsearch-6.8.13.jar:6.8.13]
            at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:93) ~[elasticsearch-6.8.13.jar:6.8.13]
            at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1739) ~[elasticsearch-6.8.13.jar:6.8.13]
            at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$9(IndexShard.java:2392) ~[elasticsearch-6.8.13.jar:6.8.13]
            ... 4 more
    Caused by: org.elasticsearch.index.engine.EngineException: failed to recover from translog
            at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslogInternal(InternalEngine.java:455) ~[elasticsearch-6.8.13.jar:6.8.13]
            at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:425) ~[elasticsearch-6.8.13.jar:6.8.13]
            at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:111) ~[elasticsearch-6.8.13.jar:6.8.13]
            at org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1449) ~[elasticsearch-6.8.13.jar:6.8.13]
            at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:440) ~[elasticsearch-6.8.13.jar:6.8.13]
            at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:95) ~[elasticsearch-6.8.13.jar:6.8.13]
            at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:310) ~[elasticsearch-6.8.13.jar:6.8.13]
            at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:93) ~[elasticsearch-6.8.13.jar:6.8.13]
            at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1739) ~[elasticsearch-6.8.13.jar:6.8.13]
            at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$9(IndexShard.java:2392) ~[elasticsearch-6.8.13.jar:6.8.13]
            ... 4 more
    Caused by: java.lang.IllegalArgumentException: requested snapshot generation [10] is not available. Min referenced generation is [11]
            at org.elasticsearch.index.translog.Translog.newSnapshotFromGen(Translog.java:614) ~[elasticsearch-6.8.13.jar:6.8.13]
            at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslogInternal(InternalEngine.java:451) ~[elasticsearch-6.8.13.jar:6.8.13]
            at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:425) ~[elasticsearch-6.8.13.jar:6.8.13]
            at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:111) ~[elasticsearch-6.8.13.jar:6.8.13]
            at org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1449) ~[elasticsearch-6.8.13.jar:6.8.13]
            at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:440) ~[elasticsearch-6.8.13.jar:6.8.13]
            at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:95) ~[elasticsearch-6.8.13.jar:6.8.13]
            at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:310) ~[elasticsearch-6.8.13.jar:6.8.13]
            at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:93) ~[elasticsearch-6.8.13.jar:6.8.13]
            at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1739) ~[elasticsearch-6.8.13.jar:6.8.13]
            at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$9(IndexShard.java:2392) ~[elasticsearch-6.8.13.jar:6.8.13]
            ... 4 more

We've tried the elasticsearch-translog tool, but to no avail:

    >> Translog is clean at /var/lib/elasticsearch/cluster/nodes/0/indices/DyIuag3dQk-scfRrAPNBfw/2/translog

    Exception in thread "main" ElasticsearchException[Shard does not seem to be corrupted at /var/lib/elasticsearch/cluster/nodes/0/indices/DyIuag3dQk-scfRrAPNBfw/2]

Has anyone seen this before, and does anyone have an idea how to fix it?

I think this can only happen if your storage system reorders writes in a way it isn't allowed to (i.e. across fsync() calls). Perhaps this is a consequence of the "disk problem" you mentioned. It's pretty bad, though: who knows what else it might have done out of order that isn't so easily detected.
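
A toy Python model may help show why reordered writes are the likely culprit. It assumes, in simplified form, that Elasticsearch first makes durable a Lucene commit that points at a newer translog generation (here 11) and only afterwards deletes the older generation (10). The names `DiskState`, `persist`, `lucene_commit` and `trim_translog` are hypothetical; this is not Elasticsearch's actual code. Every prefix of the persisted writes is a state a crash could leave behind: with the intended order every such state recovers, but if the storage lets the trim reach disk before the commit, one crash state reproduces the error from the title.

```python
from dataclasses import dataclass, replace


# Toy model only: these names are hypothetical, not Elasticsearch internals.
@dataclass(frozen=True)
class DiskState:
    commit_gen: int        # translog generation the last durable Lucene commit points at
    min_translog_gen: int  # oldest translog generation still present on disk


def persist(state: DiskState, step: str) -> DiskState:
    """Apply one write that has reached stable storage."""
    if step == "lucene_commit":   # the new commit now references generation 11
        return replace(state, commit_gen=11)
    if step == "trim_translog":   # generation 10 is deleted from disk
        return replace(state, min_translog_gen=11)
    raise ValueError(f"unknown step {step!r}")


def recover(state: DiskState) -> None:
    """Translog replay must start at the generation the commit references."""
    if state.commit_gen < state.min_translog_gen:
        raise ValueError(
            f"requested snapshot generation [{state.commit_gen}] is not available. "
            f"Min referenced generation is [{state.min_translog_gen}]"
        )


def crash_states(order):
    """Yield every state a crash could leave: each prefix of the persisted writes."""
    state = DiskState(commit_gen=10, min_translog_gen=10)
    yield state
    for step in order:
        state = persist(state, step)
        yield state


def always_recoverable(order) -> bool:
    """True if recovery succeeds no matter where the crash happens."""
    try:
        for state in crash_states(order):
            recover(state)
    except ValueError:
        return False
    return True


print(always_recoverable(["lucene_commit", "trim_translog"]))  # True
print(always_recoverable(["trim_translog", "lucene_commit"]))  # False
```

The model deliberately ignores translog retention settings and the on-disk checkpoint file; it only captures the ordering argument.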

I would recommend restoring this index from a recent snapshot.

If only 😅 Yeah, the "disk problem" was essentially a hypervisor suddenly losing access to its storage, so weird stuff could have happened.

So you're essentially saying this shard is lost, since we have neither replicas nor snapshots? There's no way to get the data back? And if that's the case, is there a way to keep the data from the other shards of the index?

Hmm, that in itself should not have resulted in out-of-order operations; it would have behaved more like a power outage. I suspect fsync() isn't working properly on your system.

I can't think of one, sorry, at least not in 6.8. In 7.x there's a --truncate-clean-translog option to bypass the "translog is clean" check.

You could reindex the contents of the other shards into a new index, although I'm not sure how useful that will be, since you'll be missing a random 1/Nth of your documents (where N is the number of primary shards).

Thanks a lot for your answers! We'll live without this data then.

I have one more question, about your remark that we'd be missing "some random 1/Nth of your documents":

Are documents "cut" and potentially split across shards? I always thought each document was stored entirely in one shard (and, more than that, in one segment).

That's correct, you won't find any documents to be partially missing.
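
Why whole documents and "a random 1/Nth": Elasticsearch routes each document to exactly one primary shard using the documented formula `shard_num = hash(_routing) % num_primary_shards`, where `_routing` defaults to the document's `_id`. The sketch below simplifies this: it uses MD5 as a stand-in for the Murmur3 hash Elasticsearch actually uses, and it assumes 5 primary shards (the 6.x default) and synthetic document IDs purely for illustration. `shard_for` and `surviving_fraction` are hypothetical helpers.

```python
import hashlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Simplified routing: one document -> exactly one shard.

    MD5 is only a stdlib stand-in; Elasticsearch uses Murmur3 on _routing.
    """
    h = int.from_bytes(hashlib.md5(routing.encode("utf-8")).digest()[:4], "big")
    return h % num_primary_shards


def surviving_fraction(doc_ids, num_primary_shards: int, lost_shard: int) -> float:
    """Fraction of documents left after one primary shard is lost entirely."""
    kept = [d for d in doc_ids if shard_for(d, num_primary_shards) != lost_shard]
    return len(kept) / len(doc_ids)


ids = [f"doc-{i}" for i in range(100_000)]  # synthetic IDs, for illustration only
frac = surviving_fraction(ids, num_primary_shards=5, lost_shard=2)
print(round(frac, 2))  # roughly 0.8, i.e. about 1/N of the documents are gone
```

Because placement depends only on the routing value, a lost shard takes whole documents with it, never fragments of them, and which documents are affected looks random from the outside.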

Oh yeah, that makes sense then. Again, thanks for your answers, and have a nice day!
