Elasticsearch keeps restarting in Kubernetes

Elasticsearch keeps restarting, despite being in a ready state. I've installed it using the official Elasticsearch chart with default settings. I raised this question on the chart's GitHub project and was redirected here.

Chart version:

$ helm search repo elasticsearch

elastic/elasticsearch           7.17.3

Kubernetes version:

$ k version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4", GitCommit:"b695d79d4f967c403a96986f1750a35eb75e75f1", GitTreeState:"clean", BuildDate:"2021-11-17T15:48:33Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4", GitCommit:"ba58f86b00f6b0f0b7694a75464aa7806f8bf6fc", GitTreeState:"clean", BuildDate:"2022-03-30T23:40:46Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"linux/amd64"}

Kubernetes provider:
AKS

Helm Version:

$ helm version
version.BuildInfo{Version:"v3.7.1", GitCommit:"1d11fcb5d3f3bf00dbe6fe31b8412839a96b3dc4", GitTreeState:"clean", GoVersion:"go1.16.9"}

Describe the bug:
If I install using the details provided above (without any values file), the Elasticsearch pods start terminating and being recreated shortly after reaching a ready state.

Steps to reproduce:

$ helm install elasticsearch elastic/elasticsearch

Provide logs and/or server output (if relevant):

$ k get pods -l app=elasticsearch-master -w
NAME                     READY   STATUS     RESTARTS   AGE
elasticsearch-master-0   0/1     Init:0/1   0          27s
elasticsearch-master-1   0/1     Running    0          27s
elasticsearch-master-2   0/1     Init:0/1   0          27s
elasticsearch-master-2   0/1     Init:0/1   0          48s
elasticsearch-master-2   0/1     PodInitializing   0          51s
elasticsearch-master-2   0/1     Running           0          52s
elasticsearch-master-0   0/1     PodInitializing   0          66s
elasticsearch-master-0   0/1     Running           0          67s
elasticsearch-master-0   1/1     Running           0          2m
elasticsearch-master-1   1/1     Running           0          2m
elasticsearch-master-2   1/1     Running           0          2m3s
elasticsearch-master-1   1/1     Terminating       0          2m24s
elasticsearch-master-1   0/1     Terminating       0          2m25s
elasticsearch-master-1   0/1     Terminating       0          2m25s
elasticsearch-master-1   0/1     Terminating       0          2m25s
elasticsearch-master-1   0/1     Pending           0          0s
elasticsearch-master-1   0/1     Pending           0          0s
elasticsearch-master-1   0/1     Init:0/1          0          0s
elasticsearch-master-1   0/1     PodInitializing   0          5s
elasticsearch-master-1   0/1     Running           0          6s
elasticsearch-master-1   1/1     Running           0          60s
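
From the watch output the RESTARTS column stays at 0, so the pods appear to be deleted and recreated rather than the containers crashing in place. In case it helps narrow things down, the pod's events should say why it was killed; something along these lines (the pod name is just an example, the release was installed with its default name):

$ kubectl describe pod elasticsearch-master-1
$ kubectl get events --sort-by=.lastTimestamp | grep elasticsearch-master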

The logs are rather long, so here's just the ending bit:

{"type": "server", "timestamp": "2022-05-18T08:34:01,977Z", "level": "INFO", "component": "o.e.c.r.DelayedAllocationService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-2", "message": "scheduling reroute for delayed shards in [59.9s] (1 delayed shards)", "cluster.uuid": "5rm76qDXRfmfn4ZzfOuHfQ", "node.id": "jJY1CYS7ReiQdY0fSgRUiQ"  }
{"type": "server", "timestamp": "2022-05-18T08:34:02,154Z", "level": "INFO", "component": "o.e.i.g.GeoIpDownloader", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-2", "message": "updating geoip databases", "cluster.uuid": "5rm76qDXRfmfn4ZzfOuHfQ", "node.id": "jJY1CYS7ReiQdY0fSgRUiQ"  }
{"type": "server", "timestamp": "2022-05-18T08:34:02,155Z", "level": "INFO", "component": "o.e.i.g.GeoIpDownloader", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-2", "message": "fetching geoip databases overview from [https://geoip.elastic.co/v1/database?elastic_geoip_service_tos=agree]", "cluster.uuid": "5rm76qDXRfmfn4ZzfOuHfQ", "node.id": "jJY1CYS7ReiQdY0fSgRUiQ"  }
{"type": "server", "timestamp": "2022-05-18T08:34:02,812Z", "level": "INFO", "component": "o.e.i.g.GeoIpDownloader", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-2", "message": "geoip database [GeoLite2-ASN.mmdb] is up to date, updated timestamp", "cluster.uuid": "5rm76qDXRfmfn4ZzfOuHfQ", "node.id": "jJY1CYS7ReiQdY0fSgRUiQ"  }
{"type": "server", "timestamp": "2022-05-18T08:34:02,882Z", "level": "INFO", "component": "o.e.i.g.GeoIpDownloader", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-2", "message": "geoip database [GeoLite2-City.mmdb] is up to date, updated timestamp", "cluster.uuid": "5rm76qDXRfmfn4ZzfOuHfQ", "node.id": "jJY1CYS7ReiQdY0fSgRUiQ"  }
{"type": "server", "timestamp": "2022-05-18T08:34:02,954Z", "level": "INFO", "component": "o.e.i.g.GeoIpDownloader", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-2", "message": "geoip database [GeoLite2-Country.mmdb] is up to date, updated timestamp", "cluster.uuid": "5rm76qDXRfmfn4ZzfOuHfQ", "node.id": "jJY1CYS7ReiQdY0fSgRUiQ"  }
{"type": "server", "timestamp": "2022-05-18T08:34:41,440Z", "level": "INFO", "component": "o.e.c.s.MasterService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-2", "message": "node-join[{elasticsearch-master-0}{6yeN9JukTkKW2q5-NoBAYQ}{GDkulEhxQCW9Ek_qgGPijQ}{10.244.13.205}{10.244.13.205:9300}{cdfhilmrstw} join existing leader], term: 5, version: 80, delta: added {{elasticsearch-master-0}{6yeN9JukTkKW2q5-NoBAYQ}{GDkulEhxQCW9Ek_qgGPijQ}{10.244.13.205}{10.244.13.205:9300}{cdfhilmrstw}}", "cluster.uuid": "5rm76qDXRfmfn4ZzfOuHfQ", "node.id": "jJY1CYS7ReiQdY0fSgRUiQ"  }
{"type": "server", "timestamp": "2022-05-18T08:34:42,462Z", "level": "INFO", "component": "o.e.c.s.ClusterApplierService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-2", "message": "added {{elasticsearch-master-0}{6yeN9JukTkKW2q5-NoBAYQ}{GDkulEhxQCW9Ek_qgGPijQ}{10.244.13.205}{10.244.13.205:9300}{cdfhilmrstw}}, term: 5, version: 80, reason: Publication{term=5, version=80}", "cluster.uuid": "5rm76qDXRfmfn4ZzfOuHfQ", "node.id": "jJY1CYS7ReiQdY0fSgRUiQ"  }
{"type": "server", "timestamp": "2022-05-18T08:34:43,552Z", "level": "INFO", "component": "o.e.c.r.a.AllocationService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-2", "message": "Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.geoip_databases][0]]]).", "cluster.uuid": "5rm76qDXRfmfn4ZzfOuHfQ", "node.id": "jJY1CYS7ReiQdY0fSgRUiQ"  }
{"type": "server", "timestamp": "2022-05-18T08:36:01,684Z", "level": "INFO", "component": "o.e.t.ClusterConnectionManager", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-2", "message": "transport connection to [{elasticsearch-master-0}{6yeN9JukTkKW2q5-NoBAYQ}{GDkulEhxQCW9Ek_qgGPijQ}{10.244.13.205}{10.244.13.205:9300}{cdfhilmrstw}] closed by remote", "cluster.uuid": "5rm76qDXRfmfn4ZzfOuHfQ", "node.id": "jJY1CYS7ReiQdY0fSgRUiQ"  }
{"type": "server", "timestamp": "2022-05-18T08:36:01,686Z", "level": "INFO", "component": "o.e.c.r.a.AllocationService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-2", "message": "Cluster health status changed from [GREEN] to [YELLOW] (reason: [{elasticsearch-master-0}{6yeN9JukTkKW2q5-NoBAYQ}{GDkulEhxQCW9Ek_qgGPijQ}{10.244.13.205}{10.244.13.205:9300}{cdfhilmrstw} reason: disconnected]).", "cluster.uuid": "5rm76qDXRfmfn4ZzfOuHfQ", "node.id": "jJY1CYS7ReiQdY0fSgRUiQ"  }
{"type": "server", "timestamp": "2022-05-18T08:36:01,687Z", "level": "INFO", "component": "o.e.c.s.MasterService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-2", "message": "node-left[{elasticsearch-master-0}{6yeN9JukTkKW2q5-NoBAYQ}{GDkulEhxQCW9Ek_qgGPijQ}{10.244.13.205}{10.244.13.205:9300}{cdfhilmrstw} reason: disconnected], term: 5, version: 83, delta: removed {{elasticsearch-master-0}{6yeN9JukTkKW2q5-NoBAYQ}{GDkulEhxQCW9Ek_qgGPijQ}{10.244.13.205}{10.244.13.205:9300}{cdfhilmrstw}}", "cluster.uuid": "5rm76qDXRfmfn4ZzfOuHfQ", "node.id": "jJY1CYS7ReiQdY0fSgRUiQ"  }
{"type": "server", "timestamp": "2022-05-18T08:36:01,703Z", "level": "INFO", "component": "o.e.c.s.ClusterApplierService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-2", "message": "removed {{elasticsearch-master-0}{6yeN9JukTkKW2q5-NoBAYQ}{GDkulEhxQCW9Ek_qgGPijQ}{10.244.13.205}{10.244.13.205:9300}{cdfhilmrstw}}, term: 5, version: 83, reason: Publication{term=5, version=83}", "cluster.uuid": "5rm76qDXRfmfn4ZzfOuHfQ", "node.id": "jJY1CYS7ReiQdY0fSgRUiQ"  }
{"type": "server", "timestamp": "2022-05-18T08:36:01,713Z", "level": "INFO", "component": "o.e.c.r.DelayedAllocationService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-2", "message": "scheduling reroute for delayed shards in [59.9s] (1 delayed shards)", "cluster.uuid": "5rm76qDXRfmfn4ZzfOuHfQ", "node.id": "jJY1CYS7ReiQdY0fSgRUiQ"  }

Any additional context:
There are a couple of stack traces in the logs. I think they're harmless, but here are the first few lines of one:

{"type": "server", "timestamp": "2022-05-18T08:36:41,818Z", "level": "INFO", "component": "o.e.i.g.DatabaseNodeService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-0", "message": "retrieve geoip database [GeoLite2-ASN.mmdb] from [.geoip_databases] to [/tmp/elasticsearch-10444144470664545833/geoip-databases/6yeN9JukTkKW2q5-NoBAYQ/GeoLite2-ASN.mmdb.tmp.gz]" }
{"type": "server", "timestamp": "2022-05-18T08:36:41,815Z", "level": "ERROR", "component": "o.e.i.g.DatabaseNodeService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-0", "message": "failed to retrieve database [GeoLite2-City.mmdb]", 
"stacktrace": ["org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];",
"at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:179) ~[elasticsearch-7.17.3.jar:7.17.3]",
"at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:165) ~[elasticsearch-7.17.3.jar:7.17.3]",
"at org.elasticsearch.action.search.TransportSearchAction.executeSearch(TransportSearchAction.java:929) ~[elasticsearch-7.17.3.jar:7.17.3]",
"at org.elasticsearch.action.search.TransportSearchAction.executeLocalSearch(TransportSearchAction.java:763) ~[elasticsearch-7.17.3.jar:7.17.3]",
"at org.elasticsearch.action.search.TransportSearchAction.lambda$executeRequest$6(TransportSearchAction.java:399) ~[elasticsearch-7.17.3.jar:7.17.3]",
"at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:136) ~[elasticsearch-7.17.3.jar:7.17.3]",

The others complain about different GeoLite2 databases.
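
On the off chance the geoip downloader is contributing to the problem rather than just reacting to it, I believe it can be switched off through the chart's esConfig value, roughly like this in a values file (ingest.geoip.downloader.enabled is a standard 7.x setting; the layout assumes the official chart's esConfig key):

esConfig:
  elasticsearch.yml: |
    ingest.geoip.downloader.enabled: false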

I think we'd need to see more of your logs, as that last section contains an ERROR that is not harmless.
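
If you can, grab the complete logs from all three pods plus the cluster events, since the interesting part is usually right around the moment a pod goes away; something like this should do (adjust the names to your release):

$ for i in 0 1 2; do kubectl logs elasticsearch-master-$i > elasticsearch-master-$i.log; done
$ kubectl get events --sort-by=.lastTimestamp > events.txt

If a container is restarting in place rather than the whole pod being recreated, kubectl logs --previous on that pod would be worth including as well.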
