Geoip_databases index allocation failed

I'm learning and playing around with Elasticsearch. I just set up a new local test cluster with 3 nodes on Kubernetes (Elasticsearch 8.1.0) and also set up Kibana 8.1.0. In Kubernetes it's a StatefulSet with storage on NFS. The cluster health is "red" because there seems to be an unassigned shard:

.geoip_databases   p      UNASSIGNED  ALLOCATION_FAILED
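(For context, the line above comes from the cat shards API; a request along these lines lists the shards together with their state and unassigned reason — the column selection here is just one way to do it:)

```
GET _cat/shards?v&h=index,shard,prirep,state,unassigned.reason&s=state
```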

Taking a closer look with `GET _cluster/allocation/explain?pretty`:

{
    "note": "No shard was specified in the explain API request, so this response explains a randomly chosen unassigned shard. There may be other unassigned shards in this cluster which cannot be assigned for different reasons. It may not be possible to assign this shard until one of the other shards is assigned correctly. To explain the allocation of other shards (whether assigned or unassigned) you must specify the target shard in the request to this API.",
    "index": ".geoip_databases",
    "shard": 0,
    "primary": true,
    "current_state": "unassigned",
    "unassigned_info": {
        "reason": "ALLOCATION_FAILED",
        "at": "2022-03-27T00:17:31.459Z",
        "failed_allocation_attempts": 1,
        "details": "failed shard on node [EUnT_WGpR_iARG1zED5l8w]: shard failure, reason [corrupt file (source: [index id[GeoLite2-City.mmdb_39_1648340246151] origin[PRIMARY] seq#[58]])], failure org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed\n\tat org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:907)\n\tat org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:920)\n\tat org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1527)\n\tat org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1815)\n\tat org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1468)\n\tat org.elasticsearch.index.engine.InternalEngine.addDocs(InternalEngine.java:1256)\n\tat org.elasticsearch.index.engine.InternalEngine.indexIntoLucene(InternalEngine.java:1192)\n\tat org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:995)\n\tat org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:1040)\n\tat org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:973)\n\tat org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:891)\n\tat org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:320)\n\tat org.elasticsearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:185)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)\n\tat org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:250)\n\tat org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:131)\n\tat org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:70)\n\tat org.elasticsearch.action.support.replication.TransportWriteAction$1.doRun(TransportWriteAction.java:210)\n\tat 
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:776)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)\n\tat java.lang.Thread.run(Thread.java:833)\nCaused by: java.io.IOException: read past EOF: NIOFSIndexInput(path=\"/usr/share/elasticsearch/data/indices/VHZCPdAnT7Wc-giJ54Hmow/0/index/_d.fdt\") buffer: java.nio.HeapByteBuffer[pos=0 lim=16 cap=1024] chunkLen: 16 end: 24854279: NIOFSIndexInput(path=\"/usr/share/elasticsearch/data/indices/VHZCPdAnT7Wc-giJ54Hmow/0/index/_d.fdt\")\n\tat org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:200)\n\tat org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:291)\n\tat org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:55)\n\tat org.apache.lucene.codecs.CodecUtil.readBEInt(CodecUtil.java:667)\n\tat org.apache.lucene.codecs.CodecUtil.validateFooter(CodecUtil.java:582)\n\tat org.apache.lucene.codecs.CodecUtil.retrieveChecksum(CodecUtil.java:534)\n\tat org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader.<init>(Lucene90CompressingStoredFieldsReader.java:159)\n\tat org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsFormat.fieldsReader(Lucene90CompressingStoredFieldsFormat.java:133)\n\tat org.apache.lucene.codecs.lucene90.Lucene90StoredFieldsFormat.fieldsReader(Lucene90StoredFieldsFormat.java:136)\n\tat org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:134)\n\tat org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:89)\n\tat org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:179)\n\tat 
org.apache.lucene.index.BufferedUpdatesStream$SegmentState.<init>(BufferedUpdatesStream.java:295)\n\tat org.apache.lucene.index.IndexWriter.openSegmentStates(IndexWriter.java:6151)\n\tat org.apache.lucene.index.IndexWriter.forceApply(IndexWriter.java:5925)\n\tat org.apache.lucene.index.IndexWriter.tryApply(IndexWriter.java:5859)\n\tat org.apache.lucene.index.IndexWriter.lambda$publishFrozenUpdates$10(IndexWriter.java:2762)\n\tat org.apache.lucene.index.IndexWriter$EventQueue.processEventsInternal(IndexWriter.java:323)\n\tat org.apache.lucene.index.IndexWriter$EventQueue.processEvents(IndexWriter.java:312)\n\tat org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:5702)\n\tat org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:575)\n\tat org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:380)\n\tat org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:354)\n\tat org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:344)\n\tat org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:112)\n\tat org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:170)\n\tat org.elasticsearch.index.engine.ElasticsearchReaderManager.refreshIfNeeded(ElasticsearchReaderManager.java:48)\n\tat org.elasticsearch.index.engine.ElasticsearchReaderManager.refreshIfNeeded(ElasticsearchReaderManager.java:27)\n\tat org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:167)\n\tat org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:240)\n\tat org.elasticsearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:354)\n\tat org.elasticsearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:334)\n\tat 
org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:167)\n\tat org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:213)\n\tat org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1817)\n\tat org.elasticsearch.index.engine.InternalEngine.maybeRefresh(InternalEngine.java:1796)\n\tat org.elasticsearch.index.shard.IndexShard.scheduledRefresh(IndexShard.java:3777)\n\tat org.elasticsearch.index.IndexService.maybeRefreshEngine(IndexService.java:911)\n\tat org.elasticsearch.index.IndexService$AsyncRefreshTask.runInternal(IndexService.java:1037)\n\tat org.elasticsearch.common.util.concurrent.AbstractAsyncTask.run(AbstractAsyncTask.java:133)\n\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:717)\n\t... 3 more\n\tSuppressed: org.apache.lucene.index.CorruptIndexException: checksum passed (c0b70d67). possibly transient resource issue, or a Lucene or JVM bug (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path=\"/usr/share/elasticsearch/data/indices/VHZCPdAnT7Wc-giJ54Hmow/0/index/_d.fdm\")))\n\t\tat org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:500)\n\t\tat org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader.<init>(Lucene90CompressingStoredFieldsReader.java:208)\n\t\t... 37 more\nCaused by: java.io.EOFException: read past EOF: NIOFSIndexInput(path=\"/usr/share/elasticsearch/data/indices/VHZCPdAnT7Wc-giJ54Hmow/0/index/_d.fdt\") buffer: java.nio.HeapByteBuffer[pos=0 lim=16 cap=1024] chunkLen: 16 end: 24854279\n\tat org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:182)\n\t... 43 more\n",
        "last_allocation_status": "no_valid_shard_copy"
    },
    "can_allocate": "no_valid_shard_copy",
    "allocate_explanation": "cannot allocate because all found copies of the shard are either stale or corrupt",
    "node_allocation_decisions": [
        {
            "node_id": "EUnT_WGpR_iARG1zED5l8w",
            "node_name": "es-cluster-1",
            "transport_address": "10.32.0.4:9300",
            "node_attributes": {
                "ml.machine_memory": "8282968064",
                "xpack.installed": "true",
                "ml.max_jvm_size": "536870912"
            },
            "node_decision": "no",
            "store": {
                "in_sync": true,
                "allocation_id": "rKTu-HiAQHGbJETtDOQHkw",
                "store_exception": {
                    "type": "corrupt_index_exception",
                    "reason": "failed engine (reason: [corrupt file (source: [index id[GeoLite2-City.mmdb_39_1648340246151] origin[PRIMARY] seq#[58]])]) (resource=preexisting_corruption)",
                    "caused_by": {
                        "type": "i_o_exception",
                        "reason": "failed engine (reason: [corrupt file (source: [index id[GeoLite2-City.mmdb_39_1648340246151] origin[PRIMARY] seq#[58]])])",
                        "caused_by": {
                            "type": "corrupt_index_exception",
                            "reason": "checksum passed (c0b70d67). possibly transient resource issue, or a Lucene or JVM bug (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path=\"/usr/share/elasticsearch/data/indices/VHZCPdAnT7Wc-giJ54Hmow/0/index/_d.fdm\")))"
                        }
                    }
                }
            }
        },
        {
            "node_id": "oIZgwJG9QnyEbmdEaMQoug",
            "node_name": "es-cluster-0",
            "transport_address": "10.44.0.5:9300",
            "node_attributes": {
                "ml.machine_memory": "8282959872",
                "ml.max_jvm_size": "536870912",
                "xpack.installed": "true"
            },
            "node_decision": "no",
            "store": {
                "in_sync": false,
                "allocation_id": "0HmHgxBJT7-3yitI7mOS7g"
            }
        },
        {
            "node_id": "tyWOETZHTrOXKM9tveNQVQ",
            "node_name": "es-cluster-2",
            "transport_address": "10.34.0.18:9300",
            "node_attributes": {
                "ml.machine_memory": "8282968064",
                "ml.max_jvm_size": "536870912",
                "xpack.installed": "true"
            },
            "node_decision": "no",
            "store": {
                "found": false
            }
        }
    ]
}
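Following the note at the top of that response, the explain API can also be pointed at this shard explicitly (a sketch; the index/shard/primary values are taken from the output above):

```
GET _cluster/allocation/explain?pretty
{
  "index": ".geoip_databases",
  "shard": 0,
  "primary": true
}
```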

Right now I have no idea what the error means, but as I'm learning, maybe someone can point me in the right direction.

In the meantime, another index has developed the same problem, which is kind of strange. I had the exact same configuration running before, without any problems, but with Elasticsearch version 7. I might switch this test cluster back to the previous version to verify.
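(To find the other affected indices, a request like this should list everything that is currently red — the `health` filter is supported by the cat indices API:)

```
GET _cat/indices?v&health=red
```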
