We have an Elasticsearch 7.9.1 cluster whose snapshots suddenly started failing with a strange exception. Here are the details for one of the machines in the cluster:
$ uname -a
Linux [REDACTED] 4.4.0-1095-aws #106-Ubuntu SMP Wed Sep 18 13:33:48 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
$ curl localhost:9200
{
"name" : "[REDACTED]",
"cluster_name" : "[REDACTED]",
"cluster_uuid" : "kHUFcGplQD-WbnpKmaO_9g",
"version" : {
"number" : "7.9.1",
"build_flavor" : "default",
"build_type" : "deb",
"build_hash" : "083627f112ba94dffc1232e8b42b73492789ef91",
"build_date" : "2020-09-01T21:22:21.964974Z",
"build_snapshot" : false,
"lucene_version" : "8.6.2",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
Here are the exception details:
java.lang.UnsupportedOperationException: Old formats can't be used for writing
at org.apache.lucene.codecs.lucene70.Lucene70SegmentInfoFormat.write(Lucene70SegmentInfoFormat.java:273) ~[lucene-backward-codecs-8.6.2.jar:8.6.2 016993b65e393b58246d54e8ddda9f56a453eb0e - ivera - 2020-08-26 10:54:32]
at org.elasticsearch.snapshots.SourceOnlySnapshot.syncSegment(SourceOnlySnapshot.java:264) ~[?:?]
at org.elasticsearch.snapshots.SourceOnlySnapshot.syncSnapshot(SourceOnlySnapshot.java:108) ~[?:?]
at org.elasticsearch.snapshots.SourceOnlySnapshotRepository.snapshotShard(SourceOnlySnapshotRepository.java:170) ~[?:?]
at org.elasticsearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:340) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.snapshots.SnapshotShardsService.lambda$startNewShards$1(SnapshotShardsService.java:256) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:651) [elasticsearch-7.9.1.jar:7.9.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
at java.lang.Thread.run(Thread.java:832) [?:?]
This exception is thrown directly by Lucene, in Lucene70SegmentInfoFormat.write (Lucene70SegmentInfoFormat.java:273 in the trace above).
This seems to have started happening when we updated to version 7.9.0. I've tried deleting the most recent snapshot, starting over with a fresh repository, and updating to 7.9.1, but none of this changed the error.
The index in question has segments with the following Lucene versions:
8.4.0
8.5.1
8.6.0
8.6.2 (this makes up the vast majority of segments)
All the segments in versions < 8.6.2 are committed.
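For anyone who wants to reproduce a tally like the one above, here is a minimal sketch of how it can be gathered from the index segments API (GET /<index>/_segments). It assumes the response nests segments under indices -> shards -> shard copies -> segments, each with "version" and "committed" fields. The function names and the base URL are placeholders for illustration, not anything specific to our setup.

```python
import json
from collections import Counter
from urllib.request import urlopen


def tally_segment_versions(segments_response):
    """Count segments per (Lucene version, committed flag).

    Expects the parsed JSON body of GET /<index>/_segments.
    """
    counts = Counter()
    for index in segments_response.get("indices", {}).values():
        for shard_copies in index.get("shards", {}).values():
            # Each shard id maps to a list with one entry per shard copy.
            for shard_copy in shard_copies:
                for segment in shard_copy.get("segments", {}).values():
                    key = (segment.get("version"), segment.get("committed"))
                    counts[key] += 1
    return counts


def fetch_segments(index_name, base_url="http://localhost:9200"):
    """Fetch the segments listing for one index (base_url is an assumption)."""
    with urlopen(f"{base_url}/{index_name}/_segments") as response:
        return json.load(response)


# Example usage against a live node (not executed here):
#   counts = tally_segment_versions(fetch_segments("my-index"))
#   for (version, committed), n in sorted(counts.items()):
#       print(version, "committed" if committed else "uncommitted", n)
```

Keeping the counting separate from the HTTP call makes it easy to run the tally on a saved response as well.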
The oldest snapshot in our repository was created in ES version 7.4.0, but given that using a completely blank repository doesn't work, I'm inclined to believe this is a different issue.
Trying to snapshot again results in the following exception:
java.nio.file.FileAlreadyExistsException: /mnt/istore/elasticsearch/data/nodes/0/indices/vDAnJnfiTkWOZ7nOv4GMDw/126/_snapshot/_phg7.fnm
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:94) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116) ~[?:?]
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:219) ~[?:?]
at java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:478) ~[?:?]
at java.nio.file.Files.newOutputStream(Files.java:224) ~[?:?]
at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:410) ~[lucene-core-8.6.2.jar:8.6.2 016993b65e393b58246d54e8ddda9f56a453eb0e - ivera - 2020-08-26 10:53:36]
at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:406) ~[lucene-core-8.6.2.jar:8.6.2 016993b65e393b58246d54e8ddda9f56a453eb0e - ivera - 2020-08-26 10:53:36]
at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:254) ~[lucene-core-8.6.2.jar:8.6.2 016993b65e393b58246d54e8ddda9f56a453eb0e - ivera - 2020-08-26 10:53:36]
at org.elasticsearch.snapshots.SourceOnlySnapshot$LinkedFilesDirectory.createOutput(SourceOnlySnapshot.java:352) ~[?:?]
at org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43) ~[lucene-core-8.6.2.jar:8.6.2 016993b65e393b58246d54e8ddda9f56a453eb0e - ivera - 2020-08-26 10:53:36]
at org.apache.lucene.codecs.lucene60.Lucene60FieldInfosFormat.write(Lucene60FieldInfosFormat.java:272) ~[lucene-core-8.6.2.jar:8.6.2 016993b65e393b58246d54e8ddda9f56a453eb0e - ivera - 2020-08-26 10:53:36]
at org.elasticsearch.snapshots.SourceOnlySnapshot.syncSegment(SourceOnlySnapshot.java:229) ~[x-pack-core-7.9.1.jar:7.9.1]
at org.elasticsearch.snapshots.SourceOnlySnapshot.syncSnapshot(SourceOnlySnapshot.java:108) ~[x-pack-core-7.9.1.jar:7.9.1]
at org.elasticsearch.snapshots.SourceOnlySnapshotRepository.snapshotShard(SourceOnlySnapshotRepository.java:170) [x-pack-core-7.9.1.jar:7.9.1]
at org.elasticsearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:340) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.snapshots.SnapshotShardsService.lambda$startNewShards$1(SnapshotShardsService.java:256) [elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:651) [elasticsearch-7.9.1.jar:7.9.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
at java.lang.Thread.run(Thread.java:832) [?:?]
This leads me to believe that this exception isn't handled properly, so it results in some bad leftover state.
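To see how widespread the leftover state is, something like the following read-only sketch can list files sitting in per-shard _snapshot directories. The directory layout (nodes/<n>/indices/<index-uuid>/<shard>/_snapshot) is inferred only from the path in the exception above, and the function name is made up for illustration. It deliberately deletes nothing, since I don't know whether removing these files by hand is safe.

```python
from pathlib import Path


def find_snapshot_leftovers(data_path):
    """Map each non-empty shard-level _snapshot directory to its file names.

    Read-only: this only lists files and never modifies the data directory.
    """
    leftovers = {}
    # Layout assumed from the FileAlreadyExistsException path:
    #   <data_path>/nodes/<n>/indices/<index-uuid>/<shard>/_snapshot/<file>
    for snapshot_dir in Path(data_path).glob("nodes/*/indices/*/*/_snapshot"):
        if not snapshot_dir.is_dir():
            continue
        names = sorted(p.name for p in snapshot_dir.iterdir() if p.is_file())
        if names:
            leftovers[str(snapshot_dir)] = names
    return leftovers


# Example usage on a data node (not executed here):
#   for directory, names in find_snapshot_leftovers(
#           "/mnt/istore/elasticsearch/data").items():
#       print(directory, len(names), "files")
```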
At this point, it seems like my only hope is to rebuild the index, but it's quite large and that's something I'd like to avoid. Is it possible this is a bug introduced in a recent version?
Thanks,
Olivia
Academia.edu