Elasticsearch keeps restarting, despite the pods reaching a ready state. I installed it using the official Elasticsearch chart with default settings. I raised this question on the chart's GitHub project and was redirected here.
Chart version:
$ helm search repo elasticsearch
elastic/elasticsearch 7.17.3
Kubernetes version:
$ k version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4", GitCommit:"b695d79d4f967c403a96986f1750a35eb75e75f1", GitTreeState:"clean", BuildDate:"2021-11-17T15:48:33Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4", GitCommit:"ba58f86b00f6b0f0b7694a75464aa7806f8bf6fc", GitTreeState:"clean", BuildDate:"2022-03-30T23:40:46Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"linux/amd64"}
Kubernetes provider:
AKS
Helm Version:
$ helm version
version.BuildInfo{Version:"v3.7.1", GitCommit:"1d11fcb5d3f3bf00dbe6fe31b8412839a96b3dc4", GitTreeState:"clean", GoVersion:"go1.16.9"}
Describe the bug:
If I install using the details provided above (without any values file), Elasticsearch fails after a short period of being ready.
Steps to reproduce:
$ helm install elasticsearch elastic/elasticsearch
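For completeness, the elastic Helm repo was added beforehand with the standard commands from the chart's README (assuming the default repo URL):
$ helm repo add elastic https://helm.elastic.co
$ helm repo update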
Provide logs and/or server output (if relevant):
$ k get pods -l app=elasticsearch-master -w
NAME READY STATUS RESTARTS AGE
elasticsearch-master-0 0/1 Init:0/1 0 27s
elasticsearch-master-1 0/1 Running 0 27s
elasticsearch-master-2 0/1 Init:0/1 0 27s
elasticsearch-master-2 0/1 Init:0/1 0 48s
elasticsearch-master-2 0/1 PodInitializing 0 51s
elasticsearch-master-2 0/1 Running 0 52s
elasticsearch-master-0 0/1 PodInitializing 0 66s
elasticsearch-master-0 0/1 Running 0 67s
elasticsearch-master-0 1/1 Running 0 2m
elasticsearch-master-1 1/1 Running 0 2m
elasticsearch-master-2 1/1 Running 0 2m3s
elasticsearch-master-1 1/1 Terminating 0 2m24s
elasticsearch-master-1 0/1 Terminating 0 2m25s
elasticsearch-master-1 0/1 Terminating 0 2m25s
elasticsearch-master-1 0/1 Terminating 0 2m25s
elasticsearch-master-1 0/1 Pending 0 0s
elasticsearch-master-1 0/1 Pending 0 0s
elasticsearch-master-1 0/1 Init:0/1 0 0s
elasticsearch-master-1 0/1 PodInitializing 0 5s
elasticsearch-master-1 0/1 Running 0 6s
elasticsearch-master-1 1/1 Running 0 60s
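If more detail is needed, I can also capture the pod description and cluster events around the time of the restart; roughly along these lines (just a sketch, pod name taken from the watch above):
$ kubectl describe pod elasticsearch-master-1
$ kubectl get events --sort-by=.lastTimestamp | grep elasticsearch-master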
The logs are rather long, so here's just the ending bit:
{"type": "server", "timestamp": "2022-05-18T08:34:01,977Z", "level": "INFO", "component": "o.e.c.r.DelayedAllocationService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-2", "message": "scheduling reroute for delayed shards in [59.9s] (1 delayed shards)", "cluster.uuid": "5rm76qDXRfmfn4ZzfOuHfQ", "node.id": "jJY1CYS7ReiQdY0fSgRUiQ" }
{"type": "server", "timestamp": "2022-05-18T08:34:02,154Z", "level": "INFO", "component": "o.e.i.g.GeoIpDownloader", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-2", "message": "updating geoip databases", "cluster.uuid": "5rm76qDXRfmfn4ZzfOuHfQ", "node.id": "jJY1CYS7ReiQdY0fSgRUiQ" }
{"type": "server", "timestamp": "2022-05-18T08:34:02,155Z", "level": "INFO", "component": "o.e.i.g.GeoIpDownloader", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-2", "message": "fetching geoip databases overview from [https://geoip.elastic.co/v1/database?elastic_geoip_service_tos=agree]", "cluster.uuid": "5rm76qDXRfmfn4ZzfOuHfQ", "node.id": "jJY1CYS7ReiQdY0fSgRUiQ" }
{"type": "server", "timestamp": "2022-05-18T08:34:02,812Z", "level": "INFO", "component": "o.e.i.g.GeoIpDownloader", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-2", "message": "geoip database [GeoLite2-ASN.mmdb] is up to date, updated timestamp", "cluster.uuid": "5rm76qDXRfmfn4ZzfOuHfQ", "node.id": "jJY1CYS7ReiQdY0fSgRUiQ" }
{"type": "server", "timestamp": "2022-05-18T08:34:02,882Z", "level": "INFO", "component": "o.e.i.g.GeoIpDownloader", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-2", "message": "geoip database [GeoLite2-City.mmdb] is up to date, updated timestamp", "cluster.uuid": "5rm76qDXRfmfn4ZzfOuHfQ", "node.id": "jJY1CYS7ReiQdY0fSgRUiQ" }
{"type": "server", "timestamp": "2022-05-18T08:34:02,954Z", "level": "INFO", "component": "o.e.i.g.GeoIpDownloader", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-2", "message": "geoip database [GeoLite2-Country.mmdb] is up to date, updated timestamp", "cluster.uuid": "5rm76qDXRfmfn4ZzfOuHfQ", "node.id": "jJY1CYS7ReiQdY0fSgRUiQ" }
{"type": "server", "timestamp": "2022-05-18T08:34:41,440Z", "level": "INFO", "component": "o.e.c.s.MasterService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-2", "message": "node-join[{elasticsearch-master-0}{6yeN9JukTkKW2q5-NoBAYQ}{GDkulEhxQCW9Ek_qgGPijQ}{10.244.13.205}{10.244.13.205:9300}{cdfhilmrstw} join existing leader], term: 5, version: 80, delta: added {{elasticsearch-master-0}{6yeN9JukTkKW2q5-NoBAYQ}{GDkulEhxQCW9Ek_qgGPijQ}{10.244.13.205}{10.244.13.205:9300}{cdfhilmrstw}}", "cluster.uuid": "5rm76qDXRfmfn4ZzfOuHfQ", "node.id": "jJY1CYS7ReiQdY0fSgRUiQ" }
{"type": "server", "timestamp": "2022-05-18T08:34:42,462Z", "level": "INFO", "component": "o.e.c.s.ClusterApplierService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-2", "message": "added {{elasticsearch-master-0}{6yeN9JukTkKW2q5-NoBAYQ}{GDkulEhxQCW9Ek_qgGPijQ}{10.244.13.205}{10.244.13.205:9300}{cdfhilmrstw}}, term: 5, version: 80, reason: Publication{term=5, version=80}", "cluster.uuid": "5rm76qDXRfmfn4ZzfOuHfQ", "node.id": "jJY1CYS7ReiQdY0fSgRUiQ" }
{"type": "server", "timestamp": "2022-05-18T08:34:43,552Z", "level": "INFO", "component": "o.e.c.r.a.AllocationService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-2", "message": "Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.geoip_databases][0]]]).", "cluster.uuid": "5rm76qDXRfmfn4ZzfOuHfQ", "node.id": "jJY1CYS7ReiQdY0fSgRUiQ" }
{"type": "server", "timestamp": "2022-05-18T08:36:01,684Z", "level": "INFO", "component": "o.e.t.ClusterConnectionManager", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-2", "message": "transport connection to [{elasticsearch-master-0}{6yeN9JukTkKW2q5-NoBAYQ}{GDkulEhxQCW9Ek_qgGPijQ}{10.244.13.205}{10.244.13.205:9300}{cdfhilmrstw}] closed by remote", "cluster.uuid": "5rm76qDXRfmfn4ZzfOuHfQ", "node.id": "jJY1CYS7ReiQdY0fSgRUiQ" }
{"type": "server", "timestamp": "2022-05-18T08:36:01,686Z", "level": "INFO", "component": "o.e.c.r.a.AllocationService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-2", "message": "Cluster health status changed from [GREEN] to [YELLOW] (reason: [{elasticsearch-master-0}{6yeN9JukTkKW2q5-NoBAYQ}{GDkulEhxQCW9Ek_qgGPijQ}{10.244.13.205}{10.244.13.205:9300}{cdfhilmrstw} reason: disconnected]).", "cluster.uuid": "5rm76qDXRfmfn4ZzfOuHfQ", "node.id": "jJY1CYS7ReiQdY0fSgRUiQ" }
{"type": "server", "timestamp": "2022-05-18T08:36:01,687Z", "level": "INFO", "component": "o.e.c.s.MasterService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-2", "message": "node-left[{elasticsearch-master-0}{6yeN9JukTkKW2q5-NoBAYQ}{GDkulEhxQCW9Ek_qgGPijQ}{10.244.13.205}{10.244.13.205:9300}{cdfhilmrstw} reason: disconnected], term: 5, version: 83, delta: removed {{elasticsearch-master-0}{6yeN9JukTkKW2q5-NoBAYQ}{GDkulEhxQCW9Ek_qgGPijQ}{10.244.13.205}{10.244.13.205:9300}{cdfhilmrstw}}", "cluster.uuid": "5rm76qDXRfmfn4ZzfOuHfQ", "node.id": "jJY1CYS7ReiQdY0fSgRUiQ" }
{"type": "server", "timestamp": "2022-05-18T08:36:01,703Z", "level": "INFO", "component": "o.e.c.s.ClusterApplierService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-2", "message": "removed {{elasticsearch-master-0}{6yeN9JukTkKW2q5-NoBAYQ}{GDkulEhxQCW9Ek_qgGPijQ}{10.244.13.205}{10.244.13.205:9300}{cdfhilmrstw}}, term: 5, version: 83, reason: Publication{term=5, version=83}", "cluster.uuid": "5rm76qDXRfmfn4ZzfOuHfQ", "node.id": "jJY1CYS7ReiQdY0fSgRUiQ" }
{"type": "server", "timestamp": "2022-05-18T08:36:01,713Z", "level": "INFO", "component": "o.e.c.r.DelayedAllocationService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-2", "message": "scheduling reroute for delayed shards in [59.9s] (1 delayed shards)", "cluster.uuid": "5rm76qDXRfmfn4ZzfOuHfQ", "node.id": "jJY1CYS7ReiQdY0fSgRUiQ" }
Any additional context:
There is a stacktrace or two in the logs. I think they're harmless, but here are the first few lines of one:
{"type": "server", "timestamp": "2022-05-18T08:36:41,818Z", "level": "INFO", "component": "o.e.i.g.DatabaseNodeService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-0", "message": "retrieve geoip database [GeoLite2-ASN.mmdb] from [.geoip_databases] to [/tmp/elasticsearch-10444144470664545833/geoip-databases/6yeN9JukTkKW2q5-NoBAYQ/GeoLite2-ASN.mmdb.tmp.gz]" }
{"type": "server", "timestamp": "2022-05-18T08:36:41,815Z", "level": "ERROR", "component": "o.e.i.g.DatabaseNodeService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-0", "message": "failed to retrieve database [GeoLite2-City.mmdb]",
"stacktrace": ["org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];",
"at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:179) ~[elasticsearch-7.17.3.jar:7.17.3]",
"at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:165) ~[elasticsearch-7.17.3.jar:7.17.3]",
"at org.elasticsearch.action.search.TransportSearchAction.executeSearch(TransportSearchAction.java:929) ~[elasticsearch-7.17.3.jar:7.17.3]",
"at org.elasticsearch.action.search.TransportSearchAction.executeLocalSearch(TransportSearchAction.java:763) ~[elasticsearch-7.17.3.jar:7.17.3]",
"at org.elasticsearch.action.search.TransportSearchAction.lambda$executeRequest$6(TransportSearchAction.java:399) ~[elasticsearch-7.17.3.jar:7.17.3]",
"at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:136) ~[elasticsearch-7.17.3.jar:7.17.3]",
The others complain about different GeoLite2 databases.
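If it helps, I can also report cluster health from inside one of the pods while they are up, roughly like this (a sketch, assuming the chart's default of plain HTTP on port 9200 and that curl is available in the image):
$ kubectl exec elasticsearch-master-0 -- curl -s http://localhost:9200/_cluster/health?pretty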