Hello,
We have a cluster with
- 1 master (1 core, 4 GB memory)
- 2 worker nodes (2 cores, 8 GB memory)
When starting the cluster, my data & master nodes end up in CrashLoopBackOff.
I've been trying to debug the issue, but got stuck.
If someone could point me in the right direction, I'd be much obliged.
The master & data node YAML:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elastic-cluster
spec:
  version: 7.16.2
  nodeSets:
  - name: master-nodes
    count: 3
    config:
      node.roles: ["master"]
      xpack.ml.enabled: true
    podTemplate:
      spec:
        securityContext:
          runAsUser: 1099
          fsGroup: 1099
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        containers:
        - name: elasticsearch
          env:
          - name: PRE_STOP_ADDITIONAL_WAIT_SECONDS
            value: "5"
          - name: ES_JAVA_OPTS
            value: "-Xms3g -Xmx3g"
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
        storageClassName: local-hostpath
  - name: data-nodes
    count: 2
    config:
      node.roles: ["data"]
    podTemplate:
      spec:
        securityContext:
          runAsUser: 1099
          fsGroup: 1099
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 25Gi
        storageClassName: local-hostpath
The storage class:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-hostpath
  annotations:
    openebs.io/cas-type: local
    cas.openebs.io/config: |
      - name: StorageType
        value: hostpath
      - name: BasePath
        value: /mnt/data
provisioner: openebs.io/local
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
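For reference, I apply both manifests with kubectl apply (the file names below are just what I called them locally):
kubectl apply -f local-hostpath-storageclass.yaml
kubectl apply -f elastic-cluster.yaml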
After applying the storage class and the Elasticsearch manifest, the appropriate volumes, PVs, PVCs and pods are created.
Some logs:
kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/pvc-0f2552a9-d434-4bfb-a7f1-b3bf734a1e35 1Gi RWO Delete Bound default/elasticsearch-data-elastic-cluster-es-master-nodes-2 local-hostpath 34m
persistentvolume/pvc-42c12e65-33f5-4168-96eb-77d53644eff2 1Gi RWO Delete Bound default/elasticsearch-data-elastic-cluster-es-master-nodes-1 local-hostpath 34m
persistentvolume/pvc-bf38cd11-e8c5-4c91-bf4c-2e688f1521d0 25Gi RWO Delete Bound default/elasticsearch-data-elastic-cluster-es-data-nodes-1 local-hostpath 34m
persistentvolume/pvc-cc6d1b0c-d707-4a50-80cc-131e7ea29cd5 25Gi RWO Delete Bound default/elasticsearch-data-elastic-cluster-es-data-nodes-0 local-hostpath 34m
persistentvolume/pvc-f886ea88-8d73-499b-8ae9-673eb3154b08 1Gi RWO Delete Bound default/elasticsearch-data-elastic-cluster-es-master-nodes-0 local-hostpath 34m
On node xxx: ls -la /mnt/data
drwxrwsrwx 2 root zzz 4096 Jan 13 13:15 pvc-0f2552a9-d434-4bfb-a7f1-b3bf734a1e35
drwxrwsrwx 2 root zzz 4096 Jan 13 13:15 pvc-42c12e65-33f5-4168-96eb-77d53644eff2
On node yyy: ls -la /mnt/data
drwxrwsrwx 3 root zzz 4096 Jan 13 13:16 pvc-bf38cd11-e8c5-4c91-bf4c-2e688f1521d0
drwxrwsrwx 3 root zzz 4096 Jan 13 13:15 pvc-cc6d1b0c-d707-4a50-80cc-131e7ea29cd5
drwxrwsrwx 2 root zzz 4096 Jan 13 13:15 pvc-f886ea88-8d73-499b-8ae9-673eb3154b08
kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/elasticsearch-data-elastic-cluster-es-data-nodes-0 Bound pvc-cc6d1b0c-d707-4a50-80cc-131e7ea29cd5 25Gi RWO local-hostpath 34m
persistentvolumeclaim/elasticsearch-data-elastic-cluster-es-data-nodes-1 Bound pvc-bf38cd11-e8c5-4c91-bf4c-2e688f1521d0 25Gi RWO local-hostpath 34m
persistentvolumeclaim/elasticsearch-data-elastic-cluster-es-master-nodes-0 Bound pvc-f886ea88-8d73-499b-8ae9-673eb3154b08 1Gi RWO local-hostpath 34m
persistentvolumeclaim/elasticsearch-data-elastic-cluster-es-master-nodes-1 Bound pvc-42c12e65-33f5-4168-96eb-77d53644eff2 1Gi RWO local-hostpath 34m
persistentvolumeclaim/elasticsearch-data-elastic-cluster-es-master-nodes-2 Bound pvc-0f2552a9-d434-4bfb-a7f1-b3bf734a1e35 1Gi RWO local-hostpath 34m
kubectl describe persistentvolumeclaim/elasticsearch-data-elastic-cluster-es-data-nodes-0
Name: elasticsearch-data-elastic-cluster-es-data-nodes-0
Namespace: default
StorageClass: local-hostpath
Status: Bound
Volume: pvc-cc6d1b0c-d707-4a50-80cc-131e7ea29cd5
Labels: common.k8s.elastic.co/type=elasticsearch
elasticsearch.k8s.elastic.co/cluster-name=elastic-cluster
elasticsearch.k8s.elastic.co/statefulset-name=elastic-cluster-es-data-nodes
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: openebs.io/local
volume.kubernetes.io/selected-node: yyy
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 25Gi
Access Modes: RWO
VolumeMode: Filesystem
Used By: elastic-cluster-es-data-nodes-0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Provisioning 43m openebs.io/local_openebs-localpv-provisioner-6756f57d65-fbvbf_002b1b0d-c0a5-4b93-9e48-4c8bf3884ab3 External provisioner is provisioning volume for claim "default/elasticsearch-data-elastic-cluster-es-data-nodes-0"
Normal ProvisioningSucceeded 43m openebs.io/local_openebs-localpv-provisioner-6756f57d65-fbvbf_002b1b0d-c0a5-4b93-9e48-4c8bf3884ab3 Successfully provisioned volume pvc-cc6d1b0c-d707-4a50-80cc-131e7ea29cd5
Normal WaitForFirstConsumer 43m persistentvolume-controller waiting for first consumer to be created before binding
Normal ExternalProvisioning 43m (x4 over 43m) persistentvolume-controller waiting for a volume to be created, either by external provisioner "openebs.io/local" or manually created by system administrator
I do find this last line suspicious.
kubectl describe persistentvolumeclaim/elasticsearch-data-elastic-cluster-es-master-nodes-0
Name: elasticsearch-data-elastic-cluster-es-master-nodes-0
Namespace: default
StorageClass: local-hostpath
Status: Bound
Volume: pvc-f886ea88-8d73-499b-8ae9-673eb3154b08
Labels: common.k8s.elastic.co/type=elasticsearch
elasticsearch.k8s.elastic.co/cluster-name=elastic-cluster
elasticsearch.k8s.elastic.co/statefulset-name=elastic-cluster-es-master-nodes
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: openebs.io/local
volume.kubernetes.io/selected-node: yyy
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 1Gi
Access Modes: RWO
VolumeMode: Filesystem
Used By: elastic-cluster-es-master-nodes-0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Provisioning 45m openebs.io/local_openebs-localpv-provisioner-6756f57d65-fbvbf_002b1b0d-c0a5-4b93-9e48-4c8bf3884ab3 External provisioner is provisioning volume for claim "default/elasticsearch-data-elastic-cluster-es-master-nodes-0"
Normal ProvisioningSucceeded 45m openebs.io/local_openebs-localpv-provisioner-6756f57d65-fbvbf_002b1b0d-c0a5-4b93-9e48-4c8bf3884ab3 Successfully provisioned volume pvc-f886ea88-8d73-499b-8ae9-673eb3154b08
Normal WaitForFirstConsumer 44m persistentvolume-controller waiting for first consumer to be created before binding
Normal ExternalProvisioning 44m (x3 over 44m) persistentvolume-controller waiting for a volume to be created, either by external provisioner "openebs.io/local" or manually created by system administrator
I do find this last line suspicious.
kubectl get pod
NAME READY STATUS RESTARTS AGE
pod/elastic-cluster-es-data-nodes-0 0/1 CrashLoopBackOff 9 (2m42s ago) 34m
pod/elastic-cluster-es-data-nodes-1 0/1 CrashLoopBackOff 9 (2m14s ago) 34m
pod/elastic-cluster-es-master-nodes-0 0/1 CrashLoopBackOff 11 (3m1s ago) 34m
pod/elastic-cluster-es-master-nodes-1 0/1 CrashLoopBackOff 11 (3m33s ago) 34m
pod/elastic-cluster-es-master-nodes-2 0/1 CrashLoopBackOff 11 (3m11s ago) 34m
kubectl describe pod/elastic-cluster-es-data-nodes-0
Name: elastic-cluster-es-data-nodes-0
Namespace: default
Priority: 0
Node: yyy/172...
Start Time: Thu, 13 Jan 2022 13:15:35 +0100
Labels: common.k8s.elastic.co/type=elasticsearch
[ … ]
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Created 47m kubelet Created container elastic-internal-init-filesystem
Normal Started 47m kubelet Started container elastic-internal-init-filesystem
Normal Pulled 47m kubelet Container image "docker.elastic.co/elasticsearch/elasticsearch:7.16.2" already present on machine
Normal Pulled 47m kubelet Container image "docker.elastic.co/elasticsearch/elasticsearch:7.16.2" already present on machine
Normal Pulled 47m kubelet Container image "docker.elastic.co/elasticsearch/elasticsearch:7.16.2" already present on machine
Normal Created 47m kubelet Created container elastic-internal-suspend
Normal Started 47m kubelet Started container elastic-internal-suspend
Normal Created 47m kubelet Created container sysctl
Normal Started 47m kubelet Started container sysctl
Normal Started 47m kubelet Started container elasticsearch
Normal Scheduled 47m default-scheduler Successfully assigned default/elastic-cluster-es-data-nodes-0 to yyy
Warning Unhealthy 46m kubelet Readiness probe failed: {"timestamp": "2022-01-13T12:15:56+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 46m kubelet Readiness probe failed: {"timestamp": "2022-01-13T12:16:00+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 46m kubelet Readiness probe failed: {"timestamp": "2022-01-13T12:16:05+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 46m kubelet Readiness probe failed: {"timestamp": "2022-01-13T12:16:10+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 46m kubelet Readiness probe failed: {"timestamp": "2022-01-13T12:16:15+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 46m kubelet Readiness probe failed: {"timestamp": "2022-01-13T12:16:20+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 46m kubelet Readiness probe failed: {"timestamp": "2022-01-13T12:16:25+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 46m kubelet Readiness probe failed: {"timestamp": "2022-01-13T12:16:30+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 46m kubelet Readiness probe failed: {"timestamp": "2022-01-13T12:16:35+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Normal Created 45m (x2 over 47m) kubelet Created container elasticsearch
Normal Pulled 45m (x2 over 47m) kubelet Container image "docker.elastic.co/elasticsearch/elasticsearch:7.16.2" already present on machine
Warning Unhealthy 22m (x52 over 46m) kubelet (combined from similar events): Readiness probe failed: {"timestamp": "2022-01-13T12:40:40+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning BackOff 2m5s (x159 over 44m) kubelet Back-off restarting failed container
kubectl logs pod/elastic-cluster-es-data-nodes-0
{"type": "server", "timestamp": "2022-01-13T13:11:20,703Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "version[7.16.2], pid[8], build[default/docker/2b937c44140b6559905130a8650c64dbd0879cfb/2021-12-18T19:42:46.604893745Z], OS[Linux/5.10.0-9-amd64/amd64], JVM[Eclipse Adoptium/OpenJDK 64-Bit Server VM/17.0.1/17.0.1+12]" }
{"type": "server", "timestamp": "2022-01-13T13:11:20,710Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "JVM home [/usr/share/elasticsearch/jdk], using bundled JDK [true]" }
{"type": "server", "timestamp": "2022-01-13T13:11:20,710Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "JVM arguments [-Xshare:auto, -Des.networkaddress.cache.ttl=60, -Des.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -XX:+ShowCodeDetailsInExceptionMessages, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dio.netty.allocator.numDirectArenas=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j2.formatMsgNoLookups=true, -Djava.locale.providers=SPI,COMPAT, --add-opens=java.base/java.io=ALL-UNNAMED, -XX:+UseG1GC, -Djava.io.tmpdir=/tmp/elasticsearch-15233083869961611136, -XX:+HeapDumpOnOutOfMemoryError, -XX:+ExitOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:ErrorFile=logs/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -Des.cgroups.hierarchy.override=/, -Xms1024m, -Xmx1024m, -XX:MaxDirectMemorySize=536870912, -XX:G1HeapRegionSize=4m, -XX:InitiatingHeapOccupancyPercent=30, -XX:G1ReservePercent=15, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/usr/share/elasticsearch/config, -Des.distribution.flavor=default, -Des.distribution.type=docker, -Des.bundled_jdk=true]" }
{"type": "server", "timestamp": "2022-01-13T13:11:22,598Z", "level": "INFO", "component": "o.e.p.PluginsService", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "loaded module [aggs-matrix-stats]" }
[ … ]
{"type": "server", "timestamp": "2022-01-13T13:11:29,400Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "starting ..." }
{"type": "server", "timestamp": "2022-01-13T13:11:29,415Z", "level": "INFO", "component": "o.e.x.s.c.f.PersistentCache", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "persistent cache index loaded" }
{"type": "server", "timestamp": "2022-01-13T13:11:29,416Z", "level": "INFO", "component": "o.e.x.d.l.DeprecationIndexingComponent", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "deprecation component started" }
{"type": "server", "timestamp": "2022-01-13T13:11:29,502Z", "level": "INFO", "component": "o.e.t.TransportService", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "publish_address {10.244.2.76:9300}, bound_addresses {0.0.0.0:9300}" }
{"type": "server", "timestamp": "2022-01-13T13:11:29,633Z", "level": "INFO", "component": "o.e.b.BootstrapChecks", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "bound or publishing to a non-loopback address, enforcing bootstrap checks" }
{"timestamp": "2022-01-13T13:11:30+00:00", "message": "readiness probe failed", "curl_rc": "7"}
{"timestamp": "2022-01-13T13:11:35+00:00", "message": "readiness probe failed", "curl_rc": "7"}
{"type": "server", "timestamp": "2022-01-13T13:11:39,647Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "master not discovered yet: have discovered [{elastic-cluster-es-data-nodes-0}{SGQL2ijrTJKD-ilvZyOPGA}{GHzvFzJNQk-7ZEjcSOCVxA}{10.244.2.76}{10.244.2.76:9300}{d}]; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, 10.244.1.92:9300, 10.244.1.93:9300, 10.244.2.74:9300] from hosts providers and [] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }
[…]
{"type": "server", "timestamp": "2022-01-13T13:12:19,692Z", "level": "ERROR", "component": "o.e.b.ElasticsearchUncaughtExceptionHandler", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "uncaught exception in thread [main]",
"stacktrace": ["org.elasticsearch.bootstrap.StartupException: BindTransportException[Failed to resolve publish address]; nested: UnknownHostException[elastic-cluster-es-data-nodes-0.elastic-cluster-es-data-nodes.default.svc: Temporary failure in name resolution];",
"at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:170) ~[elasticsearch-7.16.2.jar:7.16.2]",
"at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:157) ~[elasticsearch-7.16.2.jar:7.16.2]",
"at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:77) ~[elasticsearch-7.16.2.jar:7.16.2]",
"at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:112) ~[elasticsearch-cli-7.16.2.jar:7.16.2]",
"at org.elasticsearch.cli.Command.main(Command.java:77) ~[elasticsearch-cli-7.16.2.jar:7.16.2]",
"at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:122) ~[elasticsearch-7.16.2.jar:7.16.2]",
"at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:80) ~[elasticsearch-7.16.2.jar:7.16.2]",
"Caused by: org.elasticsearch.transport.BindTransportException: Failed to resolve publish address",
"at org.elasticsearch.http.AbstractHttpServerTransport.bindServer(AbstractHttpServerTransport.java:170) ~[elasticsearch-7.16.2.jar:7.16.2]",
"at org.elasticsearch.http.netty4.Netty4HttpServerTransport.doStart(Netty4HttpServerTransport.java:255) ~[?:?]",
"at org.elasticsearch.xpack.security.transport.netty4.SecurityNetty4HttpServerTransport.doStart(SecurityNetty4HttpServerTransport.java:78) ~[?:?]",
"at org.elasticsearch.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:48) ~[elasticsearch-7.16.2.jar:7.16.2]",
"at org.elasticsearch.node.Node.start(Node.java:1267) ~[elasticsearch-7.16.2.jar:7.16.2]",
"at org.elasticsearch.bootstrap.Bootstrap.start(Bootstrap.java:335) ~[elasticsearch-7.16.2.jar:7.16.2]",
"at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:443) ~[elasticsearch-7.16.2.jar:7.16.2]",
"at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:166) ~[elasticsearch-7.16.2.jar:7.16.2]",
"... 6 more",
"Caused by: java.net.UnknownHostException: elastic-cluster-es-data-nodes-0.elastic-cluster-es-data-nodes.default.svc: Temporary failure in name resolution",
[…]
For complete error details, refer to the log at /usr/share/elasticsearch/logs/elastic-cluster.log
{"type": "server", "timestamp": "2022-01-13T13:12:19,784Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "stopping ..." }
{"type": "server", "timestamp": "2022-01-13T13:12:19,787Z", "level": "INFO", "component": "o.e.x.w.WatcherService", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "stopping watch service, reason [shutdown initiated]" }
{"type": "server", "timestamp": "2022-01-13T13:12:19,788Z", "level": "INFO", "component": "o.e.x.m.p.l.CppLogMessageHandler", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "[controller/181] [Main.cc@174] ML controller exiting" }
{"type": "server", "timestamp": "2022-01-13T13:12:19,789Z", "level": "INFO", "component": "o.e.x.m.p.NativeController", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "Native controller process has stopped - no new native processes can be started" }
{"type": "server", "timestamp": "2022-01-13T13:12:19,791Z", "level": "INFO", "component": "o.e.x.w.WatcherLifeCycleService", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "watcher has stopped and shutdown" }
{"type": "server", "timestamp": "2022-01-13T13:12:20,159Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "stopped" }
{"type": "server", "timestamp": "2022-01-13T13:12:20,159Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "closing ..." }
{"type": "server", "timestamp": "2022-01-13T13:12:20,184Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "closed" }
kubectl describe pod/elastic-cluster-es-master-nodes-0
Name: elastic-cluster-es-master-nodes-0
Namespace: default
Priority: 0
Node: yyy/172...
Start Time: Thu, 13 Jan 2022 13:15:33 +0100
Labels: common.k8s.elastic.co/type=elasticsearch
[…]
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 49m kubelet MountVolume.SetUp failed for volume "elastic-internal-http-certificates" : failed to sync secret cache: timed out waiting for the condition
Normal Pulled 49m kubelet Container image "docker.elastic.co/elasticsearch/elasticsearch:7.16.2" already present on machine
Normal Created 49m kubelet Created container elastic-internal-init-filesystem
Normal Started 49m kubelet Started container elastic-internal-init-filesystem
Normal Pulled 49m kubelet Container image "docker.elastic.co/elasticsearch/elasticsearch:7.16.2" already present on machine
Normal Created 48m kubelet Created container elastic-internal-suspend
Normal Started 48m kubelet Started container elastic-internal-suspend
Normal Pulled 48m kubelet Container image "docker.elastic.co/elasticsearch/elasticsearch:7.16.2" already present on machine
Normal Created 48m kubelet Created container sysctl
Normal Started 48m kubelet Started container sysctl
Normal Scheduled 48m default-scheduler Successfully assigned default/elastic-cluster-es-master-nodes-0 to yyy
Normal Pulled 47m (x4 over 48m) kubelet Container image "docker.elastic.co/elasticsearch/elasticsearch:7.16.2" already present on machine
Normal Created 47m (x4 over 48m) kubelet Created container elasticsearch
Normal Started 47m (x4 over 48m) kubelet Started container elasticsearch
Warning BackOff 3m56s (x209 over 48m) kubelet Back-off restarting failed container
kubectl logs pod/elastic-cluster-es-master-nodes-0
[no output]
kubectl get elastic
NAME HEALTH NODES VERSION PHASE AGE
elasticsearch.elasticsearch.k8s.elastic.co/elastic-cluster unknown 7.16.2 ApplyingChanges 35m
kubectl describe elastic
[ …]
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unexpected 48m elasticsearch-controller Could not verify license, re-queuing: elasticsearch client failed for https://elastic-cluster-es-http.default.svc:9200/_license: Get "https://elastic-cluster-es-http.default.svc:9200/_license": dial tcp: lookup elastic-cluster-es-http.default.svc on 10.96.0.10:53: read udp 10.244.2.2:57524->10.96.0.10:53: i/o timeout
Warning Unexpected 47m elasticsearch-controller Could not verify license, re-queuing: elasticsearch client failed for https://elastic-cluster-es-http.default.svc:9200/_license: Get "https://elastic-cluster-es-http.default.svc:9200/_license": dial tcp: lookup elastic-cluster-es-http.default.svc on 10.96.0.10:53: read udp 10.244.2.2:60690->10.96.0.10:53: i/o timeout
Warning Unexpected 46m elasticsearch-controller Could not verify license, re-queuing: elasticsearch client failed for https://elastic-cluster-es-http.default.svc:9200/_license: Get "https://elastic-cluster-es-http.default.svc:9200/_license": dial tcp: lookup elastic-cluster-es-http.default.svc on 10.96.0.10:53: read udp 10.244.2.2:41061->10.96.0.10:53: i/o timeout
Warning Unexpected 44m elasticsearch-controller Could not verify license, re-queuing: elasticsearch client failed for https://elastic-cluster-es-http.default.svc:9200/_license: Get "https://elastic-cluster-es-http.default.svc:9200/_license": dial tcp: lookup elastic-cluster-es-http.default.svc on 10.96.0.10:53: read udp 10.244.2.2:37433->10.96.0.10:53: i/o timeout
Warning Unexpected 42m elasticsearch-controller Could not verify license, re-queuing: elasticsearch client failed for https://elastic-cluster-es-http.default.svc:9200/_license: Get "https://elastic-cluster-es-http.default.svc:9200/_license": dial tcp: lookup elastic-cluster-es-http.default.svc on 10.96.0.10:53: read udp 10.244.2.2:37974->10.96.0.10:53: i/o timeout
Warning Unexpected 40m elasticsearch-controller Could not verify license, re-queuing: elasticsearch client failed for https://elastic-cluster-es-http.default.svc:9200/_license: Get "https://elastic-cluster-es-http.default.svc:9200/_license": dial tcp: lookup elastic-cluster-es-http.default.svc on 10.96.0.10:53: read udp 10.244.2.2:49575->10.96.0.10:53: i/o timeout
Warning Unexpected 36m elasticsearch-controller Could not verify license, re-queuing: elasticsearch client failed for https://elastic-cluster-es-http.default.svc:9200/_license: Get "https://elastic-cluster-es-http.default.svc:9200/_license": dial tcp: lookup elastic-cluster-es-http.default.svc on 10.96.0.10:53: read udp 10.244.2.2:52487->10.96.0.10:53: i/o timeout
Warning Unexpected 30m elasticsearch-controller Could not verify license, re-queuing: elasticsearch client failed for https://elastic-cluster-es-http.default.svc:9200/_license: Get "https://elastic-cluster-es-http.default.svc:9200/_license": dial tcp: lookup elastic-cluster-es-http.default.svc on 10.96.0.10:53: read udp 10.244.2.2:48853->10.96.0.10:53: i/o timeout
Warning Unexpected 24m elasticsearch-controller Could not verify license, re-queuing: elasticsearch client failed for https://elastic-cluster-es-http.default.svc:9200/_license: Get "https://elastic-cluster-es-http.default.svc:9200/_license": dial tcp: lookup elastic-cluster-es-http.default.svc on 10.96.0.10:53: read udp 10.244.2.2:39244->10.96.0.10:53: i/o timeout
Warning Unexpected 5m52s (x3 over 18m) elasticsearch-controller (combined from similar events): Could not verify license, re-queuing: elasticsearch client failed for https://elastic-cluster-es-http.default.svc:9200/_license: Get "https://elastic-cluster-es-http.default.svc:9200/_license": dial tcp: lookup elastic-cluster-es-http.default.svc on 10.96.0.10:53: read udp 10.244.2.2:42730->10.96.0.10:53: i/o timeout
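Those repeated timeouts against 10.96.0.10:53 make me suspect cluster DNS itself, so I also plan to look at the CoreDNS pods and service (assuming the usual k8s-app=kube-dns labels in kube-system; that's a guess about how this cluster is set up):
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide
kubectl -n kube-system logs -l k8s-app=kube-dns --tail=50
kubectl -n kube-system get svc kube-dns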
This might be noteworthy:
kubectl -n elastic-system logs -f statefulset.apps/elastic-operator
{"type": "server", "timestamp": "2022-01-13T13:11:59,648Z", "level": "WARN", "component": "o.e.n.Node", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "timed out while waiting for initial discovery state - timeout: 30s" }
{"type": "server", "timestamp": "2022-01-13T13:11:59,652Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "master not discovered yet: have discovered [{elastic-cluster-es-data-nodes-0}{SGQL2ijrTJKD-ilvZyOPGA}{GHzvFzJNQk-7ZEjcSOCVxA}{10.244.2.76}{10.244.2.76:9300}{d}]; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, 10.244.1.92:9300, 10.244.1.93:9300, 10.244.2.74:9300] from hosts providers and [] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }
{"type": "server", "timestamp": "2022-01-13T13:12:19,692Z", "level": "ERROR", "component": "o.e.b.ElasticsearchUncaughtExceptionHandler", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "uncaught exception in thread [main]",
"stacktrace": ["org.elasticsearch.bootstrap.StartupException: BindTransportException[Failed to resolve publish address]; nested: UnknownHostException[elastic-cluster-es-data-nodes-0.elastic-cluster-es-data-nodes.default.svc: Temporary failure in name resolution];",
"at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:170) ~[elasticsearch-7.16.2.jar:7.16.2]",
"at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:157) ~[elasticsearch-7.16.2.jar:7.16.2]",
"at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:77) ~[elasticsearch-7.16.2.jar:7.16.2]",
"at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:112) ~[elasticsearch-cli-7.16.2.jar:7.16.2]",
"at org.elasticsearch.cli.Command.main(Command.java:77) ~[elasticsearch-cli-7.16.2.jar:7.16.2]",
"at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:122) ~[elasticsearch-7.16.2.jar:7.16.2]",
"at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:80) ~[elasticsearch-7.16.2.jar:7.16.2]",
"Caused by: org.elasticsearch.transport.BindTransportException: Failed to resolve publish address",
…