Hi,
Our application uses Elasticsearch v7.9.2. Each instance of the application creates three separate indexes for three different types of data.
One customer instance was working fine until the weekend, when a scheduled server update/reboot took place. After this, one of the three indexes stopped working. Looking at the logs, I can see many entries like the following:
marking and sending shard failed due to [failed recovery]
<snip>
failing shard [failed shard, shard [10503823_<index>][0], node[Av0B-jO6SDiKmHnkwx7qig], [P], recovery_source[existing store recovery; bootstrap_history_uuid=false], s[INITIALIZING], a[id=_XWXbYaTQL-wEdhmSGNvVQ], unassigned_info[[reason=CLUSTER_RECOVERED], at[2022-11-08T23:28:12.995Z], delayed=false, allocation_status[fetching_shard_data]], message [failed recovery], failure [RecoveryFailedException[[10503823_resolve_adbnsw_prd_resolve_case][0]: Recovery failed on {RESAPPP12}{Av0B-jO6SDiKmHnkwx7qig}{VjBNPmxoREOTuaSf1MTCVQ}{127.0.0.1}{127.0.0.1:9300}{dilmrt}{ml.machine_memory=17179332608, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}]; nested: IndexShardRecoveryException[failed to recover from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: IndexFormatTooOldException[Format version is not supported (resource BufferedChecksumIndexInput(NIOFSIndexInput(path="C:\elasticsearch-7.9.2\data\nodes\0\indices\KwoPnMtRSw-n-jaYqKxLEA\0\index\segments_4yy"))): 540031034 (needs to be between 1071082519 and 1071082519). This version of Lucene only supports indexes created with release 6.0 and later.]; ], markAsStale [true]]
After some more digging, I found the restart in the logs, and the following appears immediately after the startup logs:
[o.e.i.c.IndicesClusterStateService] [RESAPPP12] [10503823_<index>][0] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [10503823_<index>][0]: Recovery failed on {<server>}{Av0B-jO6SDiKmHnkwx7qig}{oFE6oHqLR-yiIQAJe99b3w}{127.0.0.1}{127.0.0.1:9300}{dilmrt}{ml.machine_memory=17179332608, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}
<snip>
at org.elasticsearch.index.shard.IndexShard.lambda$executeRecovery$21(IndexShard.java:2665) [elasticsearch-7.9.2.jar:7.9.2]
at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:71) [elasticsearch-7.9.2.jar:7.9.2]
at org.elasticsearch.index.shard.StoreRecovery.lambda$recoveryListener$6(StoreRecovery.java:355) [elasticsearch-7.9.2.jar:7.9.2]
at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:71) [elasticsearch-7.9.2.jar:7.9.2]
at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:328) [elasticsearch-7.9.2.jar:7.9.2]
at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:96) [elasticsearch-7.9.2.jar:7.9.2]
at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1883) [elasticsearch-7.9.2.jar:7.9.2]
at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73) [elasticsearch-7.9.2.jar:7.9.2]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:737) [elasticsearch-7.9.2.jar:7.9.2]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.9.2.jar:7.9.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: org.elasticsearch.index.shard.IndexShardRecoveryException: failed to recover from gateway
at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:441) ~[elasticsearch-7.9.2.jar:7.9.2]
at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:98) ~[elasticsearch-7.9.2.jar:7.9.2]
at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:325) ~[elasticsearch-7.9.2.jar:7.9.2]
... 8 more
Caused by: org.elasticsearch.index.engine.EngineCreationFailureException: failed to create engine
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:243) ~[elasticsearch-7.9.2.jar:7.9.2]
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:193) ~[elasticsearch-7.9.2.jar:7.9.2]
at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:25) ~[elasticsearch-7.9.2.jar:7.9.2]
at org.elasticsearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:1643) ~[elasticsearch-7.9.2.jar:7.9.2]
at org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1609) ~[elasticsearch-7.9.2.jar:7.9.2]
at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:436) ~[elasticsearch-7.9.2.jar:7.9.2]
at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:98) ~[elasticsearch-7.9.2.jar:7.9.2]
at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:325) ~[elasticsearch-7.9.2.jar:7.9.2]
... 8 more
Caused by: org.apache.lucene.index.IndexFormatTooOldException: Format version is not supported (resource BufferedChecksumIndexInput(NIOFSIndexInput(path="C:\elasticsearch-7.9.2\data\nodes\0\indices\KwoPnMtRSw-n-jaYqKxLEA\0\index\segments_4yy"))): 540031034 (needs to be between 1071082519 and 1071082519). This version of Lucene only supports indexes created with release 6.0 and later.
at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:307) ~[lucene-core-8.6.2.jar:8.6.2 016993b65e393b58246d54e8ddda9f56a453eb0e - ivera - 2020-08-26 10:53:36]
at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:291) ~[lucene-core-8.6.2.jar:8.6.2 016993b65e393b58246d54e8ddda9f56a453eb0e - ivera - 2020-08-26 10:53:36]
at org.apache.lucene.index.DirectoryReader.listCommits(DirectoryReader.java:277) ~[lucene-core-8.6.2.jar:8.6.2 016993b65e393b58246d54e8ddda9f56a453eb0e - ivera - 2020-08-26 10:53:36]
at org.elasticsearch.index.store.Store.trimUnsafeCommits(Store.java:1513) ~[elasticsearch-7.9.2.jar:7.9.2]
at org.elasticsearch.index.engine.InternalEngine.trimUnsafeCommits(InternalEngine.java:2817) ~[elasticsearch-7.9.2.jar:7.9.2]
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:220) ~[elasticsearch-7.9.2.jar:7.9.2]
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:193) ~[elasticsearch-7.9.2.jar:7.9.2]
at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:25) ~[elasticsearch-7.9.2.jar:7.9.2]
at org.elasticsearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:1643) ~[elasticsearch-7.9.2.jar:7.9.2]
at org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1609) ~[elasticsearch-7.9.2.jar:7.9.2]
at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:436) ~[elasticsearch-7.9.2.jar:7.9.2]
at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:98) ~[elasticsearch-7.9.2.jar:7.9.2]
at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:325) ~[elasticsearch-7.9.2.jar:7.9.2]
... 8 more
I'm curious about the statement "Format version is not supported ... (needs to be between 1071082519 and 1071082519). This version of Lucene only supports indexes created with release 6.0 and later."
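To make sense of the two numbers in that message, here is a minimal sketch that decodes them as 4-byte big-endian integers. It assumes that the value Lucene reports (540031034) is simply the first four bytes of segments_4yy read as an int, and that the expected value is Lucene's codec header magic (CodecUtil.CODEC_MAGIC). The helper name describe_header_int is made up for illustration.

```python
import struct

# Lucene's codec header magic (CodecUtil.CODEC_MAGIC). The exception quotes
# this same number as both the lower and the upper bound.
CODEC_MAGIC = 0x3FD76C17


def describe_header_int(value: int) -> str:
    """Show a header int as hex and as the four big-endian bytes behind it.

    Hypothetical helper, for illustration only.
    """
    raw = struct.pack(">i", value)
    return f"{value} = 0x{value:08X} = bytes {raw!r}"


expected = 1071082519  # bound quoted in the exception message
found = 540031034      # value reported for segments_4yy

print(describe_header_int(expected))
print(describe_header_int(found))
print("expected value is the codec magic:", expected == CODEC_MAGIC)
```

If I have decoded this correctly, the value that was found corresponds to the ASCII characters " 08:" (space, 0, 8, colon), which looks more like a fragment of text than the start of a Lucene commit file. That is only an observation, not a diagnosis, but it may suggest the file contents were overwritten rather than written by an older Lucene version.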
My questions:
- Would a purge and re-index of this index resolve the problem?
- What could have caused this issue? As far as I'm aware, this index was created with Elasticsearch v7.9.2, which is the only version that has ever been installed on this server. Additionally, why would this affect only one of the three indexes that the system uses (the others are still being updated and can still be read)?
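To compare the broken index with the two healthy ones, something like the following read-only check could be run against the data directory. It is a sketch under assumptions: it relies on the directory layout shown in the stack trace (indices\<uuid>\<shard>\index\segments_N) and on a valid segments_N file starting with the codec magic 1071082519. check_segments_files is a hypothetical helper, not part of Elasticsearch or Lucene.

```python
import struct
from pathlib import Path

CODEC_MAGIC = 0x3FD76C17  # Lucene codec header magic (1071082519)


def check_segments_files(data_dir):
    """Return (path, first_int, ok) for every segments_N file under data_dir.

    first_int is the first four bytes read as a big-endian int, or None if
    the file is shorter than four bytes. Files are only opened for reading.
    """
    results = []
    for path in sorted(Path(data_dir).rglob("segments_*")):
        if not path.is_file():
            continue
        with path.open("rb") as fh:
            head = fh.read(4)
        first_int = struct.unpack(">i", head)[0] if len(head) == 4 else None
        results.append((path, first_int, first_int == CODEC_MAGIC))
    return results


# Example usage (path taken from the log output above):
# for path, value, ok in check_segments_files(
#         r"C:\elasticsearch-7.9.2\data\nodes\0\indices"):
#     print("OK " if ok else "BAD", value, path)
```

If only the one segments_4yy file is reported as bad, that would at least confirm the damage is limited to a single commit file rather than something affecting the whole data directory.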
Many thanks for any help that anyone can offer.