I'm learning and playing around with Elasticsearch. I just set up a new local test cluster with 3 nodes on Kubernetes (Elasticsearch 8.1.0), along with Kibana 8.1.0. In Kubernetes it's a StatefulSet with storage on NFS. The cluster health is "red" because there is an unassigned shard:
.geoip_databases p UNASSIGNED ALLOCATION_FAILED
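(For context, the row above looks like the output of the _cat/shards API; assuming the cluster is reachable on localhost:9200, e.g. via kubectl port-forward, unassigned shards can be listed like this:)

```shell
# List shards, showing only the unassigned ones and why they are unassigned.
curl -s 'http://localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason' \
  | grep UNASSIGNED
```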
Taking a closer look with _cluster/allocation/explain?pretty gives:
{
"note": "No shard was specified in the explain API request, so this response explains a randomly chosen unassigned shard. There may be other unassigned shards in this cluster which cannot be assigned for different reasons. It may not be possible to assign this shard until one of the other shards is assigned correctly. To explain the allocation of other shards (whether assigned or unassigned) you must specify the target shard in the request to this API.",
"index": ".geoip_databases",
"shard": 0,
"primary": true,
"current_state": "unassigned",
"unassigned_info": {
"reason": "ALLOCATION_FAILED",
"at": "2022-03-27T00:17:31.459Z",
"failed_allocation_attempts": 1,
"details": "failed shard on node [EUnT_WGpR_iARG1zED5l8w]: shard failure, reason [corrupt file (source: [index id[GeoLite2-City.mmdb_39_1648340246151] origin[PRIMARY] seq#[58]])], failure org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed\n\tat org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:907)\n\tat org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:920)\n\tat org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1527)\n\tat org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1815)\n\tat org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1468)\n\tat org.elasticsearch.index.engine.InternalEngine.addDocs(InternalEngine.java:1256)\n\tat org.elasticsearch.index.engine.InternalEngine.indexIntoLucene(InternalEngine.java:1192)\n\tat org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:995)\n\tat org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:1040)\n\tat org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:973)\n\tat org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:891)\n\tat org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:320)\n\tat org.elasticsearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:185)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)\n\tat org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:250)\n\tat org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:131)\n\tat org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:70)\n\tat org.elasticsearch.action.support.replication.TransportWriteAction$1.doRun(TransportWriteAction.java:210)\n\tat 
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:776)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)\n\tat java.lang.Thread.run(Thread.java:833)\nCaused by: java.io.IOException: read past EOF: NIOFSIndexInput(path=\"/usr/share/elasticsearch/data/indices/VHZCPdAnT7Wc-giJ54Hmow/0/index/_d.fdt\") buffer: java.nio.HeapByteBuffer[pos=0 lim=16 cap=1024] chunkLen: 16 end: 24854279: NIOFSIndexInput(path=\"/usr/share/elasticsearch/data/indices/VHZCPdAnT7Wc-giJ54Hmow/0/index/_d.fdt\")\n\tat org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:200)\n\tat org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:291)\n\tat org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:55)\n\tat org.apache.lucene.codecs.CodecUtil.readBEInt(CodecUtil.java:667)\n\tat org.apache.lucene.codecs.CodecUtil.validateFooter(CodecUtil.java:582)\n\tat org.apache.lucene.codecs.CodecUtil.retrieveChecksum(CodecUtil.java:534)\n\tat org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader.<init>(Lucene90CompressingStoredFieldsReader.java:159)\n\tat org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsFormat.fieldsReader(Lucene90CompressingStoredFieldsFormat.java:133)\n\tat org.apache.lucene.codecs.lucene90.Lucene90StoredFieldsFormat.fieldsReader(Lucene90StoredFieldsFormat.java:136)\n\tat org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:134)\n\tat org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:89)\n\tat org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:179)\n\tat 
org.apache.lucene.index.BufferedUpdatesStream$SegmentState.<init>(BufferedUpdatesStream.java:295)\n\tat org.apache.lucene.index.IndexWriter.openSegmentStates(IndexWriter.java:6151)\n\tat org.apache.lucene.index.IndexWriter.forceApply(IndexWriter.java:5925)\n\tat org.apache.lucene.index.IndexWriter.tryApply(IndexWriter.java:5859)\n\tat org.apache.lucene.index.IndexWriter.lambda$publishFrozenUpdates$10(IndexWriter.java:2762)\n\tat org.apache.lucene.index.IndexWriter$EventQueue.processEventsInternal(IndexWriter.java:323)\n\tat org.apache.lucene.index.IndexWriter$EventQueue.processEvents(IndexWriter.java:312)\n\tat org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:5702)\n\tat org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:575)\n\tat org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:380)\n\tat org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:354)\n\tat org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:344)\n\tat org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:112)\n\tat org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:170)\n\tat org.elasticsearch.index.engine.ElasticsearchReaderManager.refreshIfNeeded(ElasticsearchReaderManager.java:48)\n\tat org.elasticsearch.index.engine.ElasticsearchReaderManager.refreshIfNeeded(ElasticsearchReaderManager.java:27)\n\tat org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:167)\n\tat org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:240)\n\tat org.elasticsearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:354)\n\tat org.elasticsearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:334)\n\tat 
org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:167)\n\tat org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:213)\n\tat org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1817)\n\tat org.elasticsearch.index.engine.InternalEngine.maybeRefresh(InternalEngine.java:1796)\n\tat org.elasticsearch.index.shard.IndexShard.scheduledRefresh(IndexShard.java:3777)\n\tat org.elasticsearch.index.IndexService.maybeRefreshEngine(IndexService.java:911)\n\tat org.elasticsearch.index.IndexService$AsyncRefreshTask.runInternal(IndexService.java:1037)\n\tat org.elasticsearch.common.util.concurrent.AbstractAsyncTask.run(AbstractAsyncTask.java:133)\n\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:717)\n\t... 3 more\n\tSuppressed: org.apache.lucene.index.CorruptIndexException: checksum passed (c0b70d67). possibly transient resource issue, or a Lucene or JVM bug (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path=\"/usr/share/elasticsearch/data/indices/VHZCPdAnT7Wc-giJ54Hmow/0/index/_d.fdm\")))\n\t\tat org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:500)\n\t\tat org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader.<init>(Lucene90CompressingStoredFieldsReader.java:208)\n\t\t... 37 more\nCaused by: java.io.EOFException: read past EOF: NIOFSIndexInput(path=\"/usr/share/elasticsearch/data/indices/VHZCPdAnT7Wc-giJ54Hmow/0/index/_d.fdt\") buffer: java.nio.HeapByteBuffer[pos=0 lim=16 cap=1024] chunkLen: 16 end: 24854279\n\tat org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:182)\n\t... 43 more\n",
"last_allocation_status": "no_valid_shard_copy"
},
"can_allocate": "no_valid_shard_copy",
"allocate_explanation": "cannot allocate because all found copies of the shard are either stale or corrupt",
"node_allocation_decisions": [
{
"node_id": "EUnT_WGpR_iARG1zED5l8w",
"node_name": "es-cluster-1",
"transport_address": "10.32.0.4:9300",
"node_attributes": {
"ml.machine_memory": "8282968064",
"xpack.installed": "true",
"ml.max_jvm_size": "536870912"
},
"node_decision": "no",
"store": {
"in_sync": true,
"allocation_id": "rKTu-HiAQHGbJETtDOQHkw",
"store_exception": {
"type": "corrupt_index_exception",
"reason": "failed engine (reason: [corrupt file (source: [index id[GeoLite2-City.mmdb_39_1648340246151] origin[PRIMARY] seq#[58]])]) (resource=preexisting_corruption)",
"caused_by": {
"type": "i_o_exception",
"reason": "failed engine (reason: [corrupt file (source: [index id[GeoLite2-City.mmdb_39_1648340246151] origin[PRIMARY] seq#[58]])])",
"caused_by": {
"type": "corrupt_index_exception",
"reason": "checksum passed (c0b70d67). possibly transient resource issue, or a Lucene or JVM bug (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path=\"/usr/share/elasticsearch/data/indices/VHZCPdAnT7Wc-giJ54Hmow/0/index/_d.fdm\")))"
}
}
}
}
},
{
"node_id": "oIZgwJG9QnyEbmdEaMQoug",
"node_name": "es-cluster-0",
"transport_address": "10.44.0.5:9300",
"node_attributes": {
"ml.machine_memory": "8282959872",
"ml.max_jvm_size": "536870912",
"xpack.installed": "true"
},
"node_decision": "no",
"store": {
"in_sync": false,
"allocation_id": "0HmHgxBJT7-3yitI7mOS7g"
}
},
{
"node_id": "tyWOETZHTrOXKM9tveNQVQ",
"node_name": "es-cluster-2",
"transport_address": "10.34.0.18:9300",
"node_attributes": {
"ml.machine_memory": "8282968064",
"ml.max_jvm_size": "536870912",
"xpack.installed": "true"
},
"node_decision": "no",
"store": {
"found": false
}
}
]
}
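(As the "note" field above says, the explain API picked a random unassigned shard. To explain this specific shard, the request can be targeted with a body; this is a sketch assuming the cluster is reachable on localhost:9200:)

```shell
# Ask the allocation explain API about the specific unassigned primary,
# instead of letting it choose a random unassigned shard.
curl -s -H 'Content-Type: application/json' \
  'http://localhost:9200/_cluster/allocation/explain?pretty' \
  -d '{
    "index": ".geoip_databases",
    "shard": 0,
    "primary": true
  }'
```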
Right now I have no idea what the error means, but since I'm learning, maybe someone can point me in the right direction.