I'm setting up an Elasticsearch cluster on Azure using ECK. The specifications are as follows:
ECK version: 2.2, 2.3 (tried both)
Elastic version: 8.0, 8.1.0, 8.2.0, 8.3.1 (tried all versions)
K8s version: 1.23.5
Node size: Standard_D2ds_v5
Storage type: azurefile-csi-premium
Here is the YAML I use to deploy the cluster:
# Sample cluster with 2 data-master nodes, 1 voting-only node, and 1 Kibana
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: logging
  namespace: logging
spec:
  version: 8.3.1
  http:
    service:
      metadata:
        annotations:
          service.beta.kubernetes.io/azure-load-balancer-internal: "true"
      spec:
        type: LoadBalancer
  volumeClaimDeletePolicy: DeleteOnScaledownOnly
  secureSettings:
    - secretName: azure-blob
  nodeSets:
    - name: data-master
      count: 2
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 100Gi
            storageClassName: azurefile-csi-premium
      podTemplate:
        spec:
          initContainers:
            - name: sysctl
              securityContext:
                privileged: true
                runAsUser: 0
              command: ["sh", "-c", "sysctl -w vm.max_map_count=262144"]
          containers:
            - name: elasticsearch
              resources:
                requests:
                  memory: 5Gi
                limits:
                  memory: 5Gi
      config:
        node.roles:
          [
            "data_content",
            "data_hot",
            "ingest",
            "master",
            "remote_cluster_client",
            "transform"
          ]
    - name: voting-only
      count: 1
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 1Gi
            storageClassName: azurefile-csi
      podTemplate:
        spec:
          initContainers:
            - name: sysctl
              securityContext:
                privileged: true
                runAsUser: 0
              command: ["sh", "-c", "sysctl -w vm.max_map_count=262144"]
          containers:
            - name: elasticsearch
              resources:
                requests:
                  memory: 1Gi
                limits:
                  memory: 1Gi
      config:
        node.roles: ["master", "voting_only"]
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: logging
  namespace: logging
spec:
  version: 8.3.1
  count: 1
  elasticsearchRef:
    name: logging
  podTemplate:
    spec:
      initContainers:
        - name: sysctl
          securityContext:
            privileged: true
            runAsUser: 0
          command: ["sh", "-c", "sysctl -w vm.max_map_count=262144"]
      containers:
        - name: kibana
          env:
            - name: NODE_OPTIONS
              value: "--max-old-space-size=2048"
          resources:
            requests:
              memory: 2Gi
            limits:
              memory: 3Gi
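For what it's worth, once the pods come up the node roles look as expected; I check them with:

GET _cat/nodes?v&h=name,node.role,master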
I create the cluster with either 2 master-eligible data nodes plus 1 voting-only node, or with 3 master nodes, and set up a single user whose permissions are limited to one index. Everything looks good at first: the cluster is green and the logs are clean. It runs fine for four days, and on the fourth day the index named ".kibana_task_manager_8.3.1_001" suddenly goes yellow and, shortly after, red. It is always this same index (its name matching whichever Elastic version is deployed).
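All of this is observable with a couple of quick checks (a minimal sketch of what I run):

GET _cluster/health?filter_path=status,unassigned_shards

GET _cat/indices/.kibana_task_manager*?v&h=index,health,status,pri,rep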
At this point Kibana itself still works, but if I restart it, it no longer opens in the browser and only shows "Kibana server is not ready yet."
As it is a system index, I can't (and don't want to) delete it. Below are the results of the diagnostic commands I ran and the relevant logs.
GET _cluster/allocation/explain
{
"note" : "No shard was specified in the explain API request, so this response explains a randomly chosen unassigned shard. There may be other unassigned shards in this cluster which cannot be assigned for different reasons. It may not be possible to assign this shard until one of the other shards is assigned correctly. To explain the allocation of other shards (whether assigned or unassigned) you must specify the target shard in the request to this API.",
"index" : ".kibana_task_manager_8.2.0_001",
"shard" : 0,
"primary" : true,
"current_state" : "unassigned",
"unassigned_info" : {
"reason" : "ALLOCATION_FAILED",
"at" : "2022-06-22T06:49:08.978Z",
"failed_allocation_attempts" : 1,
"details" : """failed shard on node [ywhs1A1sTF-SfxYizd29YA]: shard failure, reason [merge failed], failure org.elasticsearch.common.io.stream.NotSerializableExceptionWrapper: merge_exception: org.apache.lucene.index.CorruptIndexException: codec footer mismatch (file truncated?): actual footer=0 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/usr/share/elasticsearch/data/indices/HAtFfZ74TxW0PJm6KqWsiw/0/index/_ep7.nvm")))
at org.elasticsearch.index.engine.InternalEngine$EngineMergeScheduler$2.doRun(InternalEngine.java:2573)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:773)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.lang.Thread.run(Thread.java:833)
Caused by: org.apache.lucene.index.CorruptIndexException: codec footer mismatch (file truncated?): actual footer=0 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/usr/share/elasticsearch/data/indices/HAtFfZ74TxW0PJm6KqWsiw/0/index/_ep7.nvm")))
at org.apache.lucene.codecs.CodecUtil.validateFooter(CodecUtil.java:584)
at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:432)
at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:492)
at org.apache.lucene.codecs.lucene90.Lucene90NormsProducer.<init>(Lucene90NormsProducer.java:78)
at org.apache.lucene.codecs.lucene90.Lucene90NormsFormat.normsProducer(Lucene90NormsFormat.java:96)
at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:199)
at org.apache.lucene.index.SegmentMerger.mergeWithLogging(SegmentMerger.java:293)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:136)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4964)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4500)
at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:6252)
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:638)
at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:118)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:699)
Suppressed: java.io.IOException: read past EOF: NIOFSIndexInput(path="/usr/share/elasticsearch/data/indices/HAtFfZ74TxW0PJm6KqWsiw/0/index/_ep7.nvm") buffer: java.nio.HeapByteBuffer[pos=0 lim=211 cap=1024] chunkLen: 211 end: 211: NIOFSIndexInput(path="/usr/share/elasticsearch/data/indices/HAtFfZ74TxW0PJm6KqWsiw/0/index/_ep7.nvm")
at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:200)
at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:291)
at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:55)
at org.apache.lucene.store.BufferedChecksumIndexInput.readByte(BufferedChecksumIndexInput.java:39)
at org.apache.lucene.codecs.CodecUtil.readBEInt(CodecUtil.java:667)
at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:184)
at org.apache.lucene.codecs.CodecUtil.checkIndexHeader(CodecUtil.java:253)
at org.apache.lucene.codecs.lucene90.Lucene90NormsProducer.<init>(Lucene90NormsProducer.java:67)
... 10 more
Caused by: java.io.EOFException: read past EOF: NIOFSIndexInput(path="/usr/share/elasticsearch/data/indices/HAtFfZ74TxW0PJm6KqWsiw/0/index/_ep7.nvm") buffer: java.nio.HeapByteBuffer[pos=0 lim=211 cap=1024] chunkLen: 211 end: 211
at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:182)
... 17 more
""",
"last_allocation_status" : "no_valid_shard_copy"
},
"can_allocate" : "no_valid_shard_copy",
"allocate_explanation" : "Elasticsearch can't allocate this shard because all the copies of its data in the cluster are stale or corrupt. Elasticsearch will allocate this shard when a node containing a good copy of its data joins the cluster. If no such node is available, restore this index from a recent snapshot.",
"node_allocation_decisions" : [
{
"node_id" : "P9IGXuKeRVeu2SvkrITcOQ",
"node_name" : "logging-es-data-master-1",
"transport_address" : "172.23.1.53:9300",
"node_attributes" : {
"k8s_node_name" : "aks-data-40086978-vmss000000",
"xpack.installed" : "true"
},
"node_decision" : "no",
"store" : {
"in_sync" : false,
"allocation_id" : "fF8agBT7RPyXaQKEL3LSjA"
}
},
{
"node_id" : "ywhs1A1sTF-SfxYizd29YA",
"node_name" : "logging-es-data-master-0",
"transport_address" : "172.23.1.216:9300",
"node_attributes" : {
"k8s_node_name" : "aks-data-40086978-vmss000002",
"xpack.installed" : "true"
},
"node_decision" : "no",
"store" : {
"in_sync" : true,
"allocation_id" : "h8qDXSrtSGGwF5AFdVmqcg",
"store_exception" : {
"type" : "corrupt_index_exception",
"reason" : "failed engine (reason: [merge failed]) (resource=preexisting_corruption)",
"caused_by" : {
"type" : "i_o_exception",
"reason" : "failed engine (reason: [merge failed])",
"caused_by" : {
"type" : "corrupt_index_exception",
"reason" : """codec footer mismatch (file truncated?): actual footer=0 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/usr/share/elasticsearch/data/indices/HAtFfZ74TxW0PJm6KqWsiw/0/index/_ep7.nvm")))""",
"suppressed" : [
{
"type" : "i_o_exception",
"reason" : """read past EOF: NIOFSIndexInput(path="/usr/share/elasticsearch/data/indices/HAtFfZ74TxW0PJm6KqWsiw/0/index/_ep7.nvm") buffer: java.nio.HeapByteBuffer[pos=0 lim=211 cap=1024] chunkLen: 211 end: 211: NIOFSIndexInput(path="/usr/share/elasticsearch/data/indices/HAtFfZ74TxW0PJm6KqWsiw/0/index/_ep7.nvm")""",
"caused_by" : {
"type" : "e_o_f_exception",
"reason" : """read past EOF: NIOFSIndexInput(path="/usr/share/elasticsearch/data/indices/HAtFfZ74TxW0PJm6KqWsiw/0/index/_ep7.nvm") buffer: java.nio.HeapByteBuffer[pos=0 lim=211 cap=1024] chunkLen: 211 end: 211"""
}
}
]
}
}
}
}
}
]
}
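The note at the top of that response says the explained shard is chosen at random; the same API takes a body to target the broken shard explicitly (using the index name from the output above):

GET _cluster/allocation/explain
{
  "index": ".kibana_task_manager_8.2.0_001",
  "shard": 0,
  "primary": true
}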
Error from the logs (captured at a later date, on a cluster running a different version, hence the 8.2.0 index name):
{"@timestamp":"2022-07-01T16:02:46.221Z", "log.level": "WARN", "message":"path: /.kibana_task_manager/_update_by_query, params: {ignore_unavailable=true, refresh=true, index=.kibana_task_manager}", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[logging-es-data-master-0][transport_worker][T#1]","log.logger":"rest.suppressed","trace.id":"b26af086e95da1e42b8240f8a9b29f22","elasticsearch.cluster.uuid":"1Q3hry5HRNC97Bv511PMKg","elasticsearch.node.id":"ywhs1A1sTF-SfxYizd29YA","elasticsearch.node.name":"logging-es-data-master-0","elasticsearch.cluster.name":"logging","error.type":"org.elasticsearch.action.search.SearchPhaseExecutionException","error.message":"","error.stack_trace":"Failed to execute phase [query],
at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:730)
at org.elasticsearch.action.search.AbstractSearchAsyncAction.executePhase(AbstractSearchAsyncAction.java:476)
at org.elasticsearch.action.search.AbstractSearchAsyncAction.start(AbstractSearchAsyncAction.java:216)
at org.elasticsearch.action.search.TransportSearchAction.executeSearch(TransportSearchAction.java:1031)
at org.elasticsearch.action.search.TransportSearchAction.executeLocalSearch(TransportSearchAction.java:747)
at org.elasticsearch.action.search.TransportSearchAction.lambda$executeRequest$6(TransportSearchAction.java:390)
at org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:162)
at org.elasticsearch.index.query.Rewriteable.rewriteAndFetch(Rewriteable.java:112)
at org.elasticsearch.index.query.Rewriteable.rewriteAndFetch(Rewriteable.java:77)
at org.elasticsearch.action.search.TransportSearchAction.executeRequest(TransportSearchAction.java:478)
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:277)
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:103)
at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:79)
at org.elasticsearch.action.support.ActionFilter$Simple.apply(ActionFilter.java:53)
at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:77)
at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.lambda$applyInternal$3(SecurityActionFilter.java:161)
at org.elasticsearch.action.ActionListener$DelegatingFailureActionListener.onResponse(ActionListener.java:245)
at org.elasticsearch.xpack.security.authz.AuthorizationService$1.onResponse(AuthorizationService.java:573)
at org.elasticsearch.xpack.security.authz.AuthorizationService$1.onResponse(AuthorizationService.java:567)
at org.elasticsearch.xpack.security.authz.interceptor.BulkShardRequestInterceptor.intercept(BulkShardRequestInterceptor.java:86)
at org.elasticsearch.xpack.security.authz.AuthorizationService$1.onResponse(AuthorizationService.java:571)
at org.elasticsearch.xpack.security.authz.AuthorizationService$1.onResponse(AuthorizationService.java:567)
...
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:623)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:586)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: Failed to execute phase [query], Search rejected due to missing shards [[.kibana_task_manager_8.2.0_001][0]]. Consider using `allow_partial_search_results` setting to bypass this error.
at org.elasticsearch.action.search.AbstractSearchAsyncAction.run(AbstractSearchAsyncAction.java:244)
at org.elasticsearch.action.search.AbstractSearchAsyncAction.executePhase(AbstractSearchAsyncAction.java:471)
... 278 more
"}
Another thing I could capture was the log output from when the cluster went from green to yellow and then to red (the "readiness probe failed" entries with curl_rc 7 just mean the probe could not connect, i.e. the node was not accepting HTTP connections yet):
{
"timestamp": "2022-07-03T12:36:02+00:00",
"message": "readiness probe failed",
"curl_rc": "7"
}
{
"@timestamp": "2022-07-03T12:36:03.108Z",
"log.level": "INFO",
"message": "Security is enabled",
"ecs.version": "1.2.0",
"service.name": "ES_ECS",
"event.dataset": "elasticsearch.server",
"process.thread.name": "main",
"log.logger": "org.elasticsearch.xpack.security.Security",
"elasticsearch.node.name": "logging-es-data-master-0",
"elasticsearch.cluster.name": "logging"
}
{
"@timestamp": "2022-07-03T12:36:03.600Z",
"log.level": "INFO",
"message": "parsed [49] roles from file [/usr/share/elasticsearch/config/roles.yml]",
"ecs.version": "1.2.0",
"service.name": "ES_ECS",
"event.dataset": "elasticsearch.server",
"process.thread.name": "main",
"log.logger": "org.elasticsearch.xpack.security.authz.store.FileRolesStore",
"elasticsearch.node.name": "logging-es-data-master-0",
"elasticsearch.cluster.name": "logging"
}
{
"@timestamp": "2022-07-03T12:36:04.399Z",
"log.level": "INFO",
"message": "[controller/89] [Main.cc@123] controller (64 bit): Version 8.3.1 (Build 249951386bdb3a) Copyright (c) 2022 Elasticsearch BV",
"ecs.version": "1.2.0",
"service.name": "ES_ECS",
"event.dataset": "elasticsearch.server",
"process.thread.name": "ml-cpp-log-tail-thread",
"log.logger": "org.elasticsearch.xpack.ml.process.logging.CppLogMessageHandler",
"elasticsearch.node.name": "logging-es-data-master-0",
"elasticsearch.cluster.name": "logging"
}
{
"@timestamp": "2022-07-03T12:36:05.169Z",
"log.level": "INFO",
"message": "creating NettyAllocator with the following configs: [name=elasticsearch_configured, chunk_size=1mb, suggested_max_allocation_size=1mb, factors={es.unsafe.use_netty_default_chunk_and_page_size=false, g1gc_enabled=true, g1gc_region_size=4mb}]",
"ecs.version": "1.2.0",
"service.name": "ES_ECS",
"event.dataset": "elasticsearch.server",
"process.thread.name": "main",
"log.logger": "org.elasticsearch.transport.netty4.NettyAllocator",
"elasticsearch.node.name": "logging-es-data-master-0",
"elasticsearch.cluster.name": "logging"
}
{
"@timestamp": "2022-07-03T12:36:05.200Z",
"log.level": "INFO",
"message": "using rate limit [40mb] with [default=40mb, read=0b, write=0b, max=0b]",
"ecs.version": "1.2.0",
"service.name": "ES_ECS",
"event.dataset": "elasticsearch.server",
"process.thread.name": "main",
"log.logger": "org.elasticsearch.indices.recovery.RecoverySettings",
"elasticsearch.node.name": "logging-es-data-master-0",
"elasticsearch.cluster.name": "logging"
}
{
"@timestamp": "2022-07-03T12:36:05.232Z",
"log.level": "INFO",
"message": "using discovery type [multi-node] and seed hosts providers [settings, file]",
"ecs.version": "1.2.0",
"service.name": "ES_ECS",
"event.dataset": "elasticsearch.server",
"process.thread.name": "main",
"log.logger": "org.elasticsearch.discovery.DiscoveryModule",
"elasticsearch.node.name": "logging-es-data-master-0",
"elasticsearch.cluster.name": "logging"
}
{
"@timestamp": "2022-07-03T12:36:07.334Z",
"log.level": "INFO",
"message": "initialized",
"ecs.version": "1.2.0",
"service.name": "ES_ECS",
"event.dataset": "elasticsearch.server",
"process.thread.name": "main",
"log.logger": "org.elasticsearch.node.Node",
"elasticsearch.node.name": "logging-es-data-master-0",
"elasticsearch.cluster.name": "logging"
}
{
"@timestamp": "2022-07-03T12:36:07.334Z",
"log.level": "INFO",
"message": "starting ...",
"ecs.version": "1.2.0",
"service.name": "ES_ECS",
"event.dataset": "elasticsearch.server",
"process.thread.name": "main",
"log.logger": "org.elasticsearch.node.Node",
"elasticsearch.node.name": "logging-es-data-master-0",
"elasticsearch.cluster.name": "logging"
}
{
"timestamp": "2022-07-03T12:36:07+00:00",
"message": "readiness probe failed",
"curl_rc": "7"
}
{
"@timestamp": "2022-07-03T12:36:07.578Z",
"log.level": "INFO",
"message": "persistent cache index loaded",
"ecs.version": "1.2.0",
"service.name": "ES_ECS",
"event.dataset": "elasticsearch.server",
"process.thread.name": "main",
"log.logger": "org.elasticsearch.xpack.searchablesnapshots.cache.full.PersistentCache",
"elasticsearch.node.name": "logging-es-data-master-0",
"elasticsearch.cluster.name": "logging"
}
{
"@timestamp": "2022-07-03T12:36:07.578Z",
"log.level": "INFO",
"message": "deprecation component started",
"ecs.version": "1.2.0",
"service.name": "ES_ECS",
"event.dataset": "elasticsearch.server",
"process.thread.name": "main",
"log.logger": "org.elasticsearch.xpack.deprecation.logging.DeprecationIndexingComponent",
"elasticsearch.node.name": "logging-es-data-master-0",
"elasticsearch.cluster.name": "logging"
}
{
"@timestamp": "2022-07-03T12:36:07.672Z",
"log.level": "INFO",
"message": "publish_address {172.23.1.77:9300}, bound_addresses {[::]:9300}",
"ecs.version": "1.2.0",
"service.name": "ES_ECS",
"event.dataset": "elasticsearch.server",
"process.thread.name": "main",
"log.logger": "org.elasticsearch.transport.TransportService",
"elasticsearch.node.name": "logging-es-data-master-0",
"elasticsearch.cluster.name": "logging"
}
{
"@timestamp": "2022-07-03T12:36:11.166Z",
"log.level": "INFO",
"message": "bound or publishing to a non-loopback address, enforcing bootstrap checks",
"ecs.version": "1.2.0",
"service.name": "ES_ECS",
"event.dataset": "elasticsearch.server",
"process.thread.name": "main",
"log.logger": "org.elasticsearch.bootstrap.BootstrapChecks",
"elasticsearch.node.name": "logging-es-data-master-0",
"elasticsearch.cluster.name": "logging"
}
{
"@timestamp": "2022-07-03T12:36:11.194Z",
"log.level": "INFO",
"message": "this node is locked into cluster UUID [8jatKO_KTcaxxrBJgdQSBQ] and will not attempt further cluster bootstrapping",
"ecs.version": "1.2.0",
"service.name": "ES_ECS",
"event.dataset": "elasticsearch.server",
"process.thread.name": "main",
"log.logger": "org.elasticsearch.cluster.coordination.ClusterBootstrapService",
"elasticsearch.node.name": "logging-es-data-master-0",
"elasticsearch.cluster.name": "logging"
}
{
"timestamp": "2022-07-03T12:36:12+00:00",
"message": "readiness probe failed",
"curl_rc": "7"
}
{
"@timestamp": "2022-07-03T12:36:12.988Z",
"log.level": "INFO",
"message": "master node changed {previous [], current [{logging-es-data-master-1}{IaRruitfTU6tHqFl35MwOA}{sT7dsmV0S5GncW2vPqL2qA}{logging-es-data-master-1}{172.23.1.176}{172.23.1.176:9300}{dhimst}]}, added {{logging-es-data-master-1}{IaRruitfTU6tHqFl35MwOA}{sT7dsmV0S5GncW2vPqL2qA}{logging-es-data-master-1}{172.23.1.176}{172.23.1.176:9300}{dhimst}, {logging-es-voting-only-0}{Qg3flffkSVeK7zETPHHn5w}{NZD1mufxR_S9o8ZNtnVdtA}{logging-es-voting-only-0}{172.23.10.19}{172.23.10.19:9300}{mv}}, term: 30, version: 668, reason: ApplyCommitRequest{term=30, version=668, sourceNode={logging-es-data-master-1}{IaRruitfTU6tHqFl35MwOA}{sT7dsmV0S5GncW2vPqL2qA}{logging-es-data-master-1}{172.23.1.176}{172.23.1.176:9300}{dhimst}{k8s_node_name=aks-data-40086978-vmss000002, xpack.installed=true}}",
"ecs.version": "1.2.0",
"service.name": "ES_ECS",
"event.dataset": "elasticsearch.server",
"process.thread.name": "elasticsearch[logging-es-data-master-0][clusterApplierService#updateTask][T#1]",
"log.logger": "org.elasticsearch.cluster.service.ClusterApplierService",
"elasticsearch.node.name": "logging-es-data-master-0",
"elasticsearch.cluster.name": "logging"
}
{
"@timestamp": "2022-07-03T12:36:13.150Z",
"log.level": "ERROR",
"message": "failed to retrieve database [GeoLite2-ASN.mmdb]",
"ecs.version": "1.2.0",
"service.name": "ES_ECS",
"event.dataset": "elasticsearch.server",
"process.thread.name": "elasticsearch[logging-es-data-master-0][generic][T#4]",
"log.logger": "org.elasticsearch.ingest.geoip.DatabaseNodeService",
"elasticsearch.node.name": "logging-es-data-master-0",
"elasticsearch.cluster.name": "logging",
"error.type": "org.elasticsearch.cluster.block.ClusterBlockException",
"error.message": "blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];",
"error.stack_trace": "org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];\n\tat org.elasticsearch.server@8.3.1/org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:177)\n\tat org.elasticsearch.server@8.3.1/org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:163)\n\tat org.elasticsearch.server@8.3.1/org.elasticsearch.action.search.TransportSearchAction.executeSearch(TransportSearchAction.java:913)\n\tat org.elasticsearch.server@8.3.1/org.elasticsearch.action.search.TransportSearchAction.executeLocalSearch(TransportSearchAction.java:747)\n\tat org.elasticsearch.server@8.3.1/org.elasticsearch.action.search.TransportSearchAction.lambda$executeRequest$6(TransportSearchAction.java:390)
...
\n\tat org.elasticsearch.server@8.3.1/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:710)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)\n\tat java.base/java.lang.Thread.run(Thread.java:833)\n"
}
I don't understand the issue and couldn't find a workaround yet. Upgrading, or setting up a completely new cluster, always leads to the same failure after a few days.
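Since the allocation explanation says there is no valid copy of the shard left and recommends restoring from a recent snapshot, the only recovery path I can see looks roughly like this (repository and snapshot names are placeholders; in 8.x the Kibana system indices are restored through their feature state rather than by name):

POST _snapshot/azure-blob-repo/my-snapshot/_restore
{
  "indices": "-*",
  "feature_states": ["kibana"],
  "include_global_state": false
}

But that would be a workaround at best, not an explanation of why the index keeps getting corrupted.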