Issue after disk problem: java.lang.IllegalArgumentException: requested snapshot generation [10] is not available. Min referenced generation is [11]

nico127 · February 15, 2021, 12:02pm

Hi. We had a disk problem last night, which caused a whole slew of issues in our Elasticsearch cluster. We've been able to resolve all of them but one: a single shard with no replicas is still failing to recover. We haven't seen this issue before, and we haven't found any reference to it or any way of fixing it. The innermost error seems to be the one in the title, i.e.

    Caused by: java.lang.IllegalArgumentException: requested snapshot generation [10] is not available. Min referenced generation is [11]

Here's the full error log:

   [2021-02-15T12:42:57,479][WARN ][o.e.i.c.IndicesClusterStateService] [Elastic1] [[index_1084][2]] marking and sending shard failed due to [failed recovery]
   org.elasticsearch.indices.recovery.RecoveryFailedException: [index_1084][2]: Recovery failed on {Elastic1}{DYEyUuc6RKKbmLgErwcOfg}{ZN5-3mCgQR-AXt8SKJ3vRQ}{10.0.2.52}{10.0.2.52:9300}{ml.machine_memory=37965348864, xpack.installed=true, box_type=warm, ml.max_open_jobs=20, ml.enabled=true}
           at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$9(IndexShard.java:2396) ~[elasticsearch-6.8.13.jar:6.8.13]
           at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:708) [elasticsearch-6.8.13.jar:6.8.13]
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
           at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
   Caused by: org.elasticsearch.index.shard.IndexShardRecoveryException: failed to recover from gateway
           at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:445) ~[elasticsearch-6.8.13.jar:6.8.13]
           at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:95) ~[elasticsearch-6.8.13.jar:6.8.13]
           at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:310) ~[elasticsearch-6.8.13.jar:6.8.13]
           at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:93) ~[elasticsearch-6.8.13.jar:6.8.13]
           at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1739) ~[elasticsearch-6.8.13.jar:6.8.13]
           at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$9(IndexShard.java:2392) ~[elasticsearch-6.8.13.jar:6.8.13]
           ... 4 more
   Caused by: org.elasticsearch.index.engine.EngineException: failed to recover from translog
           at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslogInternal(InternalEngine.java:455) ~[elasticsearch-6.8.13.jar:6.8.13]
           at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:425) ~[elasticsearch-6.8.13.jar:6.8.13]
           at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:111) ~[elasticsearch-6.8.13.jar:6.8.13]
           at org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1449) ~[elasticsearch-6.8.13.jar:6.8.13]
           at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:440) ~[elasticsearch-6.8.13.jar:6.8.13]
           at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:95) ~[elasticsearch-6.8.13.jar:6.8.13]
           at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:310) ~[elasticsearch-6.8.13.jar:6.8.13]
           at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:93) ~[elasticsearch-6.8.13.jar:6.8.13]
           at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1739) ~[elasticsearch-6.8.13.jar:6.8.13]
           at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$9(IndexShard.java:2392) ~[elasticsearch-6.8.13.jar:6.8.13]
           ... 4 more
   Caused by: java.lang.IllegalArgumentException: requested snapshot generation [10] is not available. Min referenced generation is [11]
           at org.elasticsearch.index.translog.Translog.newSnapshotFromGen(Translog.java:614) ~[elasticsearch-6.8.13.jar:6.8.13]
           at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslogInternal(InternalEngine.java:451) ~[elasticsearch-6.8.13.jar:6.8.13]
           at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:425) ~[elasticsearch-6.8.13.jar:6.8.13]
           at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:111) ~[elasticsearch-6.8.13.jar:6.8.13]
           at org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1449) ~[elasticsearch-6.8.13.jar:6.8.13]
           at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:440) ~[elasticsearch-6.8.13.jar:6.8.13]
           at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:95) ~[elasticsearch-6.8.13.jar:6.8.13]
           at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:310) ~[elasticsearch-6.8.13.jar:6.8.13]
           at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:93) ~[elasticsearch-6.8.13.jar:6.8.13]
           at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1739) ~[elasticsearch-6.8.13.jar:6.8.13]
           at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$9(IndexShard.java:2392) ~[elasticsearch-6.8.13.jar:6.8.13]
           ... 4 more

We've tried the elasticsearch-translog tool, but to no avail:

    >> Translog is clean at /var/lib/elasticsearch/cluster/nodes/0/indices/DyIuag3dQk-scfRrAPNBfw/2/translog

    Exception in thread "main" ElasticsearchException[Shard does not seem to be corrupted at /var/lib/elasticsearch/cluster/nodes/0/indices/DyIuag3dQk-scfRrAPNBfw/2]

Has anyone seen this before, and does anyone have an idea how to fix it?

DavidTurner · February 15, 2021, 12:41pm

I think this can only happen if your storage system re-orders operations in a way that it's not allowed to (i.e. across fsync() calls). Perhaps this is a consequence of the "disk problem" you mentioned. That's pretty bad, though: who knows what else it might have done out of order that isn't so easily detected.

I would recommend restoring this index from a recent snapshot.
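
To illustrate the ordering guarantee mentioned above, here is a minimal Python sketch of the general "write data, fsync, then write the pointer to it" pattern. It is not Elasticsearch's code: the file names `log-<gen>.dat` and `checkpoint` and both function names are hypothetical, chosen only for the example. The point is that once the second fsync() has returned, storage that honors fsync() can never show the checkpoint without the file it refers to.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data to path and fsync it before returning."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to push it to stable storage


def roll_generation(directory, new_gen, payload):
    """Write a new generation file first, then the checkpoint that refers to it."""
    gen_path = os.path.join(directory, f"log-{new_gen}.dat")  # hypothetical name
    ckp_path = os.path.join(directory, "checkpoint")          # hypothetical name
    durable_write(gen_path, payload)                # step 1: the data is durable
    durable_write(ckp_path, str(new_gen).encode())  # step 2: the pointer is durable
    return gen_path, ckp_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        gen_path, ckp_path = roll_generation(d, 11, b"operations")
        with open(ckp_path, "rb") as f:
            print("checkpoint points at generation", f.read().decode())
```

If the storage layer applies step 2 before step 1, or drops one of them after acknowledging it, a reader can end up with a pointer to a generation that does not match what is on disk, which is the kind of inconsistency the error in the title reports.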

nico127 · February 15, 2021, 1:25pm

> I would recommend restoring this index from a recent snapshot.

If only! Yeah, the "disk problem" was essentially a hypervisor suddenly losing access to its storage, so weird stuff could have happened.

So you're essentially saying this shard is lost, considering we have neither replicas nor snapshots? There's no way of getting the data back? And if that's the case, is there a way to not lose the data from the other shards of the index?

DavidTurner · February 15, 2021, 3:46pm

Hmm, a hypervisor losing access to its storage should not in itself have resulted in out-of-order operations; it would have acted more like a power outage. I think fsync() might not be working properly on your system.

I can't think of a way to get this shard's data back, sorry, at least not in 6.8. In 7.x there's a --truncate-clean-translog option to bypass the "translog is clean" check.

As for the other shards, you could reindex their contents into a new index, although I'm not sure how useful this will be since you will be missing some random 1/Nth of your documents.
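
As a rough sketch of the reindex suggestion, the Python snippet below only builds the JSON body that would be sent in a POST to the `_reindex` endpoint; it does not contact a cluster. The destination name `index_1084_recovered` and the helper name are assumptions made for the example. Whether the reindex proceeds cleanly while one source shard is unassigned is not confirmed in this thread, so it is worth trying on a test cluster first.

```python
import json


def build_reindex_body(source_index, dest_index):
    """Return the request body for a POST to the _reindex endpoint."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }


if __name__ == "__main__":
    # "index_1084_recovered" is a hypothetical destination name.
    body = build_reindex_body("index_1084", "index_1084_recovered")
    print(json.dumps(body, indent=2))
```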

nico127 · February 16, 2021, 1:18pm

Thanks a lot for your answers! We'll live without this data then.

I now have an additional question for you, about your remark that we would be "missing some random 1/Nth" of our documents:

Are documents "cut", and potentially split between shards? I always thought that one document was fully stored in one shard (and even more, in one segment).

DavidTurner · February 16, 2021, 1:37pm

That's correct: each document is stored whole in a single shard, so you won't find any documents partially missing.
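
To make the "1/Nth" point concrete, here is a small Python sketch of hash-based routing, following the simplified formula shard = hash(routing) % number_of_primary_shards. It is an illustration under assumptions: CRC32 stands in for Elasticsearch's own hash function, the shard count of 5 is made up (the thread does not say how many shards index_1084 has), and the document IDs are synthetic.

```python
import zlib
from collections import Counter


def shard_for(routing, num_primary_shards):
    """Map a routing value (by default the document ID) to exactly one shard."""
    # CRC32 is a stand-in hash; Elasticsearch uses its own hash function.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def surviving_ids(doc_ids, num_primary_shards, lost_shard):
    """Return the IDs of documents that did not live on the lost shard."""
    return [d for d in doc_ids if shard_for(d, num_primary_shards) != lost_shard]


if __name__ == "__main__":
    ids = [f"doc-{i}" for i in range(10000)]  # synthetic document IDs
    n = 5                                     # assumed number of primary shards
    per_shard = Counter(shard_for(d, n) for d in ids)
    kept = surviving_ids(ids, n, lost_shard=2)
    print(sorted(per_shard.items()))
    print(len(ids) - len(kept), "of", len(ids), "documents were on the lost shard")
```

Because every document maps to exactly one shard, losing a shard removes whole documents (roughly one in N of them) and never part of a document.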

nico127 · February 17, 2021, 3:01pm

Oh yeah that makes sense then. Again, thanks for your answers, and have a nice day!

