CrashLoopBackOff on ECK startup with Dynamic Local Persistent Volume via OpenEBS

Hello,

We have a Kubernetes cluster with

  • 1 master node (1 core, 4 GB memory)
  • 2 worker nodes (2 cores, 8 GB memory)

On startup, my data & master node pods end up in CrashLoopBackOff.

I've been trying to debug the issue, but got stuck.
If someone could point me in the right direction I'd be much obliged :pray:

The master & data node YAML:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elastic-cluster
spec:
  version: 7.16.2
  nodeSets:
  - name: master-nodes
    count: 3
    config:
      node.roles: ["master"]
      xpack.ml.enabled: true
    podTemplate:
      spec:
        securityContext:
          runAsUser: 1099
          fsGroup: 1099
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        containers:
        - name: elasticsearch
          env:
          - name: PRE_STOP_ADDITIONAL_WAIT_SECONDS
            value: "5"
          - name: ES_JAVA_OPTS
            value: "-Xms3g -Xmx3g"
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
        storageClassName: local-hostpath
  - name: data-nodes
    count: 2
    config:
      node.roles: ["data"]
    podTemplate:
      spec:
        securityContext:
          runAsUser: 1099
          fsGroup: 1099
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 25Gi
        storageClassName: local-hostpath

The storage class:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-hostpath
  annotations:
    openebs.io/cas-type: local
    cas.openebs.io/config: |
      - name: StorageType
        value: hostpath
      - name: BasePath
        value: /mnt/data
provisioner: openebs.io/local
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

After applying the storage class and the Elasticsearch manifest, the expected volumes, PVs, PVCs and pods are created.
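For reference, both manifests were applied with kubectl; the file names below are just placeholders for my local files:

kubectl apply -f local-hostpath-storageclass.yaml
kubectl apply -f elastic-cluster.yaml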

Some logs:

kubectl get pv

NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                                          STORAGECLASS     REASON   AGE
persistentvolume/pvc-0f2552a9-d434-4bfb-a7f1-b3bf734a1e35   1Gi        RWO            Delete           Bound    default/elasticsearch-data-elastic-cluster-es-master-nodes-2   local-hostpath            34m
persistentvolume/pvc-42c12e65-33f5-4168-96eb-77d53644eff2   1Gi        RWO            Delete           Bound    default/elasticsearch-data-elastic-cluster-es-master-nodes-1   local-hostpath            34m
persistentvolume/pvc-bf38cd11-e8c5-4c91-bf4c-2e688f1521d0   25Gi       RWO            Delete           Bound    default/elasticsearch-data-elastic-cluster-es-data-nodes-1     local-hostpath            34m
persistentvolume/pvc-cc6d1b0c-d707-4a50-80cc-131e7ea29cd5   25Gi       RWO            Delete           Bound    default/elasticsearch-data-elastic-cluster-es-data-nodes-0     local-hostpath            34m
persistentvolume/pvc-f886ea88-8d73-499b-8ae9-673eb3154b08   1Gi        RWO            Delete           Bound    default/elasticsearch-data-elastic-cluster-es-master-nodes-0   local-hostpath            34m

On node xxx: ls -la /mnt/data

drwxrwsrwx 2 root zzz  4096 Jan 13 13:15 pvc-0f2552a9-d434-4bfb-a7f1-b3bf734a1e35
drwxrwsrwx 2 root zzz  4096 Jan 13 13:15 pvc-42c12e65-33f5-4168-96eb-77d53644eff2

On node yyy: ls -la /mnt/data

drwxrwsrwx 3 root zzz  4096 Jan 13 13:16 pvc-bf38cd11-e8c5-4c91-bf4c-2e688f1521d0
drwxrwsrwx 3 root zzz  4096 Jan 13 13:15 pvc-cc6d1b0c-d707-4a50-80cc-131e7ea29cd5
drwxrwsrwx 2 root zzz  4096 Jan 13 13:15 pvc-f886ea88-8d73-499b-8ae9-673eb3154b08

kubectl get pvc

NAME                                                                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS     AGE
persistentvolumeclaim/elasticsearch-data-elastic-cluster-es-data-nodes-0     Bound    pvc-cc6d1b0c-d707-4a50-80cc-131e7ea29cd5   25Gi       RWO            local-hostpath   34m
persistentvolumeclaim/elasticsearch-data-elastic-cluster-es-data-nodes-1     Bound    pvc-bf38cd11-e8c5-4c91-bf4c-2e688f1521d0   25Gi       RWO            local-hostpath   34m
persistentvolumeclaim/elasticsearch-data-elastic-cluster-es-master-nodes-0   Bound    pvc-f886ea88-8d73-499b-8ae9-673eb3154b08   1Gi        RWO            local-hostpath   34m
persistentvolumeclaim/elasticsearch-data-elastic-cluster-es-master-nodes-1   Bound    pvc-42c12e65-33f5-4168-96eb-77d53644eff2   1Gi        RWO            local-hostpath   34m
persistentvolumeclaim/elasticsearch-data-elastic-cluster-es-master-nodes-2   Bound    pvc-0f2552a9-d434-4bfb-a7f1-b3bf734a1e35   1Gi        RWO            local-hostpath   34m

kubectl describe persistentvolumeclaim/elasticsearch-data-elastic-cluster-es-data-nodes-0

Name:          elasticsearch-data-elastic-cluster-es-data-nodes-0
Namespace:     default
StorageClass:  local-hostpath
Status:        Bound
Volume:        pvc-cc6d1b0c-d707-4a50-80cc-131e7ea29cd5
Labels:        common.k8s.elastic.co/type=elasticsearch
               elasticsearch.k8s.elastic.co/cluster-name=elastic-cluster
               elasticsearch.k8s.elastic.co/statefulset-name=elastic-cluster-es-data-nodes
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: openebs.io/local
               volume.kubernetes.io/selected-node: yyy
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      25Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       elastic-cluster-es-data-nodes-0
Events:
  Type    Reason                 Age                From                                                                                                Message
  ----    ------                 ----               ----                                                                                                -------
  Normal  Provisioning           43m                openebs.io/local_openebs-localpv-provisioner-6756f57d65-fbvbf_002b1b0d-c0a5-4b93-9e48-4c8bf3884ab3  External provisioner is provisioning volume for claim "default/elasticsearch-data-elastic-cluster-es-data-nodes-0"
  Normal  ProvisioningSucceeded  43m                openebs.io/local_openebs-localpv-provisioner-6756f57d65-fbvbf_002b1b0d-c0a5-4b93-9e48-4c8bf3884ab3  Successfully provisioned volume pvc-cc6d1b0c-d707-4a50-80cc-131e7ea29cd5
  Normal  WaitForFirstConsumer   43m                persistentvolume-controller                                                                         waiting for first consumer to be created before binding
  Normal  ExternalProvisioning   43m (x4 over 43m)  persistentvolume-controller                                                                         waiting for a volume to be created, either by external provisioner "openebs.io/local" or manually created by system administrator

I find this last line suspicious.

kubectl describe persistentvolumeclaim/elasticsearch-data-elastic-cluster-es-master-nodes-0

Name:          elasticsearch-data-elastic-cluster-es-master-nodes-0
Namespace:     default
StorageClass:  local-hostpath
Status:        Bound
Volume:        pvc-f886ea88-8d73-499b-8ae9-673eb3154b08
Labels:        common.k8s.elastic.co/type=elasticsearch
               elasticsearch.k8s.elastic.co/cluster-name=elastic-cluster
               elasticsearch.k8s.elastic.co/statefulset-name=elastic-cluster-es-master-nodes
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: openebs.io/local
               volume.kubernetes.io/selected-node: yyy
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       elastic-cluster-es-master-nodes-0
Events:
  Type    Reason                 Age                From                                                                                                Message
  ----    ------                 ----               ----                                                                                                -------
  Normal  Provisioning           45m                openebs.io/local_openebs-localpv-provisioner-6756f57d65-fbvbf_002b1b0d-c0a5-4b93-9e48-4c8bf3884ab3  External provisioner is provisioning volume for claim "default/elasticsearch-data-elastic-cluster-es-master-nodes-0"
  Normal  ProvisioningSucceeded  45m                openebs.io/local_openebs-localpv-provisioner-6756f57d65-fbvbf_002b1b0d-c0a5-4b93-9e48-4c8bf3884ab3  Successfully provisioned volume pvc-f886ea88-8d73-499b-8ae9-673eb3154b08
  Normal  WaitForFirstConsumer   44m                persistentvolume-controller                                                                         waiting for first consumer to be created before binding
  Normal  ExternalProvisioning   44m (x3 over 44m)  persistentvolume-controller                                                                         waiting for a volume to be created, either by external provisioner "openebs.io/local" or manually created by system administrator

I find this last line suspicious here as well.
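To rule out a provisioning problem on the OpenEBS side, I could also check the Local PV provisioner logs, roughly like this (assuming it runs in the openebs namespace; the deployment name is taken from the provisioner pod shown in the events above):

kubectl -n openebs logs deploy/openebs-localpv-provisioner --tail=100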


kubectl get pod

NAME                                    READY   STATUS             RESTARTS         AGE
pod/elastic-cluster-es-data-nodes-0     0/1     CrashLoopBackOff   9 (2m42s ago)    34m
pod/elastic-cluster-es-data-nodes-1     0/1     CrashLoopBackOff   9 (2m14s ago)    34m
pod/elastic-cluster-es-master-nodes-0   0/1     CrashLoopBackOff   11 (3m1s ago)    34m
pod/elastic-cluster-es-master-nodes-1   0/1     CrashLoopBackOff   11 (3m33s ago)   34m
pod/elastic-cluster-es-master-nodes-2   0/1     CrashLoopBackOff   11 (3m11s ago)   34m

kubectl describe pod/elastic-cluster-es-data-nodes-0

Name:         elastic-cluster-es-data-nodes-0
Namespace:    default
Priority:     0
Node:         yyy/172...
Start Time:   Thu, 13 Jan 2022 13:15:35 +0100
Labels:       common.k8s.elastic.co/type=elasticsearch
 [ … ]
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Created    47m                   kubelet            Created container elastic-internal-init-filesystem
  Normal   Started    47m                   kubelet            Started container elastic-internal-init-filesystem
  Normal   Pulled     47m                   kubelet            Container image "docker.elastic.co/elasticsearch/elasticsearch:7.16.2" already present on machine
  Normal   Pulled     47m                   kubelet            Container image "docker.elastic.co/elasticsearch/elasticsearch:7.16.2" already present on machine
  Normal   Pulled     47m                   kubelet            Container image "docker.elastic.co/elasticsearch/elasticsearch:7.16.2" already present on machine
  Normal   Created    47m                   kubelet            Created container elastic-internal-suspend
  Normal   Started    47m                   kubelet            Started container elastic-internal-suspend
  Normal   Created    47m                   kubelet            Created container sysctl
  Normal   Started    47m                   kubelet            Started container sysctl
  Normal   Started    47m                   kubelet            Started container elasticsearch
  Normal   Scheduled  47m                   default-scheduler  Successfully assigned default/elastic-cluster-es-data-nodes-0 to yyy
  Warning  Unhealthy  46m                   kubelet            Readiness probe failed: {"timestamp": "2022-01-13T12:15:56+00:00", "message": "readiness probe failed", "curl_rc": "7"}
  Warning  Unhealthy  46m                   kubelet            Readiness probe failed: {"timestamp": "2022-01-13T12:16:00+00:00", "message": "readiness probe failed", "curl_rc": "7"}
  Warning  Unhealthy  46m                   kubelet            Readiness probe failed: {"timestamp": "2022-01-13T12:16:05+00:00", "message": "readiness probe failed", "curl_rc": "7"}
  Warning  Unhealthy  46m                   kubelet            Readiness probe failed: {"timestamp": "2022-01-13T12:16:10+00:00", "message": "readiness probe failed", "curl_rc": "7"}
  Warning  Unhealthy  46m                   kubelet            Readiness probe failed: {"timestamp": "2022-01-13T12:16:15+00:00", "message": "readiness probe failed", "curl_rc": "7"}
  Warning  Unhealthy  46m                   kubelet            Readiness probe failed: {"timestamp": "2022-01-13T12:16:20+00:00", "message": "readiness probe failed", "curl_rc": "7"}
  Warning  Unhealthy  46m                   kubelet            Readiness probe failed: {"timestamp": "2022-01-13T12:16:25+00:00", "message": "readiness probe failed", "curl_rc": "7"}
  Warning  Unhealthy  46m                   kubelet            Readiness probe failed: {"timestamp": "2022-01-13T12:16:30+00:00", "message": "readiness probe failed", "curl_rc": "7"}
  Warning  Unhealthy  46m                   kubelet            Readiness probe failed: {"timestamp": "2022-01-13T12:16:35+00:00", "message": "readiness probe failed", "curl_rc": "7"}
  Normal   Created    45m (x2 over 47m)     kubelet            Created container elasticsearch
  Normal   Pulled     45m (x2 over 47m)     kubelet            Container image "docker.elastic.co/elasticsearch/elasticsearch:7.16.2" already present on machine
  Warning  Unhealthy  22m (x52 over 46m)    kubelet            (combined from similar events): Readiness probe failed: {"timestamp": "2022-01-13T12:40:40+00:00", "message": "readiness probe failed", "curl_rc": "7"}
  Warning  BackOff    2m5s (x159 over 44m)  kubelet            Back-off restarting failed container

kubectl logs pod/elastic-cluster-es-data-nodes-0

{"type": "server", "timestamp": "2022-01-13T13:11:20,703Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "version[7.16.2], pid[8], build[default/docker/2b937c44140b6559905130a8650c64dbd0879cfb/2021-12-18T19:42:46.604893745Z], OS[Linux/5.10.0-9-amd64/amd64], JVM[Eclipse Adoptium/OpenJDK 64-Bit Server VM/17.0.1/17.0.1+12]" }
{"type": "server", "timestamp": "2022-01-13T13:11:20,710Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "JVM home [/usr/share/elasticsearch/jdk], using bundled JDK [true]" }
{"type": "server", "timestamp": "2022-01-13T13:11:20,710Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "JVM arguments [-Xshare:auto, -Des.networkaddress.cache.ttl=60, -Des.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -XX:+ShowCodeDetailsInExceptionMessages, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dio.netty.allocator.numDirectArenas=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j2.formatMsgNoLookups=true, -Djava.locale.providers=SPI,COMPAT, --add-opens=java.base/java.io=ALL-UNNAMED, -XX:+UseG1GC, -Djava.io.tmpdir=/tmp/elasticsearch-15233083869961611136, -XX:+HeapDumpOnOutOfMemoryError, -XX:+ExitOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:ErrorFile=logs/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -Des.cgroups.hierarchy.override=/, -Xms1024m, -Xmx1024m, -XX:MaxDirectMemorySize=536870912, -XX:G1HeapRegionSize=4m, -XX:InitiatingHeapOccupancyPercent=30, -XX:G1ReservePercent=15, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/usr/share/elasticsearch/config, -Des.distribution.flavor=default, -Des.distribution.type=docker, -Des.bundled_jdk=true]" }
{"type": "server", "timestamp": "2022-01-13T13:11:22,598Z", "level": "INFO", "component": "o.e.p.PluginsService", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "loaded module [aggs-matrix-stats]" }
[ … ]
{"type": "server", "timestamp": "2022-01-13T13:11:29,400Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "starting ..." }
{"type": "server", "timestamp": "2022-01-13T13:11:29,415Z", "level": "INFO", "component": "o.e.x.s.c.f.PersistentCache", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "persistent cache index loaded" }
{"type": "server", "timestamp": "2022-01-13T13:11:29,416Z", "level": "INFO", "component": "o.e.x.d.l.DeprecationIndexingComponent", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "deprecation component started" }
{"type": "server", "timestamp": "2022-01-13T13:11:29,502Z", "level": "INFO", "component": "o.e.t.TransportService", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "publish_address {10.244.2.76:9300}, bound_addresses {0.0.0.0:9300}" }
{"type": "server", "timestamp": "2022-01-13T13:11:29,633Z", "level": "INFO", "component": "o.e.b.BootstrapChecks", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "bound or publishing to a non-loopback address, enforcing bootstrap checks" }
{"timestamp": "2022-01-13T13:11:30+00:00", "message": "readiness probe failed", "curl_rc": "7"}
{"timestamp": "2022-01-13T13:11:35+00:00", "message": "readiness probe failed", "curl_rc": "7"}
{"type": "server", "timestamp": "2022-01-13T13:11:39,647Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "master not discovered yet: have discovered [{elastic-cluster-es-data-nodes-0}{SGQL2ijrTJKD-ilvZyOPGA}{GHzvFzJNQk-7ZEjcSOCVxA}{10.244.2.76}{10.244.2.76:9300}{d}]; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, 10.244.1.92:9300, 10.244.1.93:9300, 10.244.2.74:9300] from hosts providers and [] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }
[…]
{"type": "server", "timestamp": "2022-01-13T13:12:19,692Z", "level": "ERROR", "component": "o.e.b.ElasticsearchUncaughtExceptionHandler", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "uncaught exception in thread [main]",
"stacktrace": ["org.elasticsearch.bootstrap.StartupException: BindTransportException[Failed to resolve publish address]; nested: UnknownHostException[elastic-cluster-es-data-nodes-0.elastic-cluster-es-data-nodes.default.svc: Temporary failure in name resolution];",
"at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:170) ~[elasticsearch-7.16.2.jar:7.16.2]",
"at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:157) ~[elasticsearch-7.16.2.jar:7.16.2]",
"at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:77) ~[elasticsearch-7.16.2.jar:7.16.2]",
"at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:112) ~[elasticsearch-cli-7.16.2.jar:7.16.2]",
"at org.elasticsearch.cli.Command.main(Command.java:77) ~[elasticsearch-cli-7.16.2.jar:7.16.2]",
"at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:122) ~[elasticsearch-7.16.2.jar:7.16.2]",
"at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:80) ~[elasticsearch-7.16.2.jar:7.16.2]",
"Caused by: org.elasticsearch.transport.BindTransportException: Failed to resolve publish address",
"at org.elasticsearch.http.AbstractHttpServerTransport.bindServer(AbstractHttpServerTransport.java:170) ~[elasticsearch-7.16.2.jar:7.16.2]",
"at org.elasticsearch.http.netty4.Netty4HttpServerTransport.doStart(Netty4HttpServerTransport.java:255) ~[?:?]",
"at org.elasticsearch.xpack.security.transport.netty4.SecurityNetty4HttpServerTransport.doStart(SecurityNetty4HttpServerTransport.java:78) ~[?:?]",
"at org.elasticsearch.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:48) ~[elasticsearch-7.16.2.jar:7.16.2]",
"at org.elasticsearch.node.Node.start(Node.java:1267) ~[elasticsearch-7.16.2.jar:7.16.2]",
"at org.elasticsearch.bootstrap.Bootstrap.start(Bootstrap.java:335) ~[elasticsearch-7.16.2.jar:7.16.2]",
"at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:443) ~[elasticsearch-7.16.2.jar:7.16.2]",
"at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:166) ~[elasticsearch-7.16.2.jar:7.16.2]",
"... 6 more",
"Caused by: java.net.UnknownHostException: elastic-cluster-es-data-nodes-0.elastic-cluster-es-data-nodes.default.svc: Temporary failure in name resolution",
[…]
For complete error details, refer to the log at /usr/share/elasticsearch/logs/elastic-cluster.log
{"type": "server", "timestamp": "2022-01-13T13:12:19,784Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "stopping ..." }
{"type": "server", "timestamp": "2022-01-13T13:12:19,787Z", "level": "INFO", "component": "o.e.x.w.WatcherService", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "stopping watch service, reason [shutdown initiated]" }
{"type": "server", "timestamp": "2022-01-13T13:12:19,788Z", "level": "INFO", "component": "o.e.x.m.p.l.CppLogMessageHandler", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "[controller/181] [Main.cc@174] ML controller exiting" }
{"type": "server", "timestamp": "2022-01-13T13:12:19,789Z", "level": "INFO", "component": "o.e.x.m.p.NativeController", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "Native controller process has stopped - no new native processes can be started" }
{"type": "server", "timestamp": "2022-01-13T13:12:19,791Z", "level": "INFO", "component": "o.e.x.w.WatcherLifeCycleService", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "watcher has stopped and shutdown" }
{"type": "server", "timestamp": "2022-01-13T13:12:20,159Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "stopped" }
{"type": "server", "timestamp": "2022-01-13T13:12:20,159Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "closing ..." }
{"type": "server", "timestamp": "2022-01-13T13:12:20,184Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "closed" }

kubectl describe pod/elastic-cluster-es-master-nodes-0

Name:         elastic-cluster-es-master-nodes-0
Namespace:    default
Priority:     0
Node:         yyy/172...
Start Time:   Thu, 13 Jan 2022 13:15:33 +0100
Labels:       common.k8s.elastic.co/type=elasticsearch
[…]
Events:
  Type     Reason       Age                    From               Message
  ----     ------       ----                   ----               -------
  Warning  FailedMount  49m                    kubelet            MountVolume.SetUp failed for volume "elastic-internal-http-certificates" : failed to sync secret cache: timed out waiting for the condition
  Normal   Pulled       49m                    kubelet            Container image "docker.elastic.co/elasticsearch/elasticsearch:7.16.2" already present on machine
  Normal   Created      49m                    kubelet            Created container elastic-internal-init-filesystem
  Normal   Started      49m                    kubelet            Started container elastic-internal-init-filesystem
  Normal   Pulled       49m                    kubelet            Container image "docker.elastic.co/elasticsearch/elasticsearch:7.16.2" already present on machine
  Normal   Created      48m                    kubelet            Created container elastic-internal-suspend
  Normal   Started      48m                    kubelet            Started container elastic-internal-suspend
  Normal   Pulled       48m                    kubelet            Container image "docker.elastic.co/elasticsearch/elasticsearch:7.16.2" already present on machine
  Normal   Created      48m                    kubelet            Created container sysctl
  Normal   Started      48m                    kubelet            Started container sysctl
  Normal   Scheduled    48m                    default-scheduler  Successfully assigned default/elastic-cluster-es-master-nodes-0 to yyy
  Normal   Pulled       47m (x4 over 48m)      kubelet            Container image "docker.elastic.co/elasticsearch/elasticsearch:7.16.2" already present on machine
  Normal   Created      47m (x4 over 48m)      kubelet            Created container elasticsearch
  Normal   Started      47m (x4 over 48m)      kubelet            Started container elasticsearch
  Warning  BackOff      3m56s (x209 over 48m)  kubelet            Back-off restarting failed container

kubectl logs pod/elastic-cluster-es-master-nodes-0
(no output)
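Since the current container prints nothing, I could also try the previous attempt's logs and the init container (its name is taken from the pod events above):

kubectl logs pod/elastic-cluster-es-master-nodes-0 --previous
kubectl logs pod/elastic-cluster-es-master-nodes-0 -c elastic-internal-init-filesystem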


kubectl get elastic

NAME                                                         HEALTH    NODES   VERSION   PHASE             AGE
elasticsearch.elasticsearch.k8s.elastic.co/elastic-cluster   unknown           7.16.2    ApplyingChanges   35m

kubectl describe elastic

[ …] 
Events:
  Type     Reason      Age                  From                      Message
  ----     ------      ----                 ----                      -------
  Warning  Unexpected  48m                  elasticsearch-controller  Could not verify license, re-queuing: elasticsearch client failed for https://elastic-cluster-es-http.default.svc:9200/_license: Get "https://elastic-cluster-es-http.default.svc:9200/_license": dial tcp: lookup elastic-cluster-es-http.default.svc on 10.96.0.10:53: read udp 10.244.2.2:57524->10.96.0.10:53: i/o timeout
  Warning  Unexpected  47m                  elasticsearch-controller  Could not verify license, re-queuing: elasticsearch client failed for https://elastic-cluster-es-http.default.svc:9200/_license: Get "https://elastic-cluster-es-http.default.svc:9200/_license": dial tcp: lookup elastic-cluster-es-http.default.svc on 10.96.0.10:53: read udp 10.244.2.2:60690->10.96.0.10:53: i/o timeout
  Warning  Unexpected  46m                  elasticsearch-controller  Could not verify license, re-queuing: elasticsearch client failed for https://elastic-cluster-es-http.default.svc:9200/_license: Get "https://elastic-cluster-es-http.default.svc:9200/_license": dial tcp: lookup elastic-cluster-es-http.default.svc on 10.96.0.10:53: read udp 10.244.2.2:41061->10.96.0.10:53: i/o timeout
  Warning  Unexpected  44m                  elasticsearch-controller  Could not verify license, re-queuing: elasticsearch client failed for https://elastic-cluster-es-http.default.svc:9200/_license: Get "https://elastic-cluster-es-http.default.svc:9200/_license": dial tcp: lookup elastic-cluster-es-http.default.svc on 10.96.0.10:53: read udp 10.244.2.2:37433->10.96.0.10:53: i/o timeout
  Warning  Unexpected  42m                  elasticsearch-controller  Could not verify license, re-queuing: elasticsearch client failed for https://elastic-cluster-es-http.default.svc:9200/_license: Get "https://elastic-cluster-es-http.default.svc:9200/_license": dial tcp: lookup elastic-cluster-es-http.default.svc on 10.96.0.10:53: read udp 10.244.2.2:37974->10.96.0.10:53: i/o timeout
  Warning  Unexpected  40m                  elasticsearch-controller  Could not verify license, re-queuing: elasticsearch client failed for https://elastic-cluster-es-http.default.svc:9200/_license: Get "https://elastic-cluster-es-http.default.svc:9200/_license": dial tcp: lookup elastic-cluster-es-http.default.svc on 10.96.0.10:53: read udp 10.244.2.2:49575->10.96.0.10:53: i/o timeout
  Warning  Unexpected  36m                  elasticsearch-controller  Could not verify license, re-queuing: elasticsearch client failed for https://elastic-cluster-es-http.default.svc:9200/_license: Get "https://elastic-cluster-es-http.default.svc:9200/_license": dial tcp: lookup elastic-cluster-es-http.default.svc on 10.96.0.10:53: read udp 10.244.2.2:52487->10.96.0.10:53: i/o timeout
  Warning  Unexpected  30m                  elasticsearch-controller  Could not verify license, re-queuing: elasticsearch client failed for https://elastic-cluster-es-http.default.svc:9200/_license: Get "https://elastic-cluster-es-http.default.svc:9200/_license": dial tcp: lookup elastic-cluster-es-http.default.svc on 10.96.0.10:53: read udp 10.244.2.2:48853->10.96.0.10:53: i/o timeout
  Warning  Unexpected  24m                  elasticsearch-controller  Could not verify license, re-queuing: elasticsearch client failed for https://elastic-cluster-es-http.default.svc:9200/_license: Get "https://elastic-cluster-es-http.default.svc:9200/_license": dial tcp: lookup elastic-cluster-es-http.default.svc on 10.96.0.10:53: read udp 10.244.2.2:39244->10.96.0.10:53: i/o timeout
  Warning  Unexpected  5m52s (x3 over 18m)  elasticsearch-controller  (combined from similar events): Could not verify license, re-queuing: elasticsearch client failed for https://elastic-cluster-es-http.default.svc:9200/_license: Get "https://elastic-cluster-es-http.default.svc:9200/_license": dial tcp: lookup elastic-cluster-es-http.default.svc on 10.96.0.10:53: read udp 10.244.2.2:42730->10.96.0.10:53: i/o timeout
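All of these lookups go to 10.96.0.10:53 and time out, so I could also check whether CoreDNS itself is up and healthy (assuming the usual k8s-app=kube-dns label):

kubectl -n kube-system get pods -l k8s-app=kube-dns
kubectl -n kube-system logs -l k8s-app=kube-dns --tail=50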

This might be noteworthy:

kubectl -n elastic-system logs -f statefulset.apps/elastic-operator

{"type": "server", "timestamp": "2022-01-13T13:11:59,648Z", "level": "WARN", "component": "o.e.n.Node", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "timed out while waiting for initial discovery state - timeout: 30s" }
{"type": "server", "timestamp": "2022-01-13T13:11:59,652Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "master not discovered yet: have discovered [{elastic-cluster-es-data-nodes-0}{SGQL2ijrTJKD-ilvZyOPGA}{GHzvFzJNQk-7ZEjcSOCVxA}{10.244.2.76}{10.244.2.76:9300}{d}]; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, 10.244.1.92:9300, 10.244.1.93:9300, 10.244.2.74:9300] from hosts providers and [] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }
{"type": "server", "timestamp": "2022-01-13T13:12:19,692Z", "level": "ERROR", "component": "o.e.b.ElasticsearchUncaughtExceptionHandler", "cluster.name": "elastic-cluster", "node.name": "elastic-cluster-es-data-nodes-0", "message": "uncaught exception in thread [main]", 
"stacktrace": ["org.elasticsearch.bootstrap.StartupException: BindTransportException[Failed to resolve publish address]; nested: UnknownHostException[elastic-cluster-es-data-nodes-0.elastic-cluster-es-data-nodes.default.svc: Temporary failure in name resolution];",
"at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:170) ~[elasticsearch-7.16.2.jar:7.16.2]",
"at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:157) ~[elasticsearch-7.16.2.jar:7.16.2]",
"at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:77) ~[elasticsearch-7.16.2.jar:7.16.2]",
"at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:112) ~[elasticsearch-cli-7.16.2.jar:7.16.2]",
"at org.elasticsearch.cli.Command.main(Command.java:77) ~[elasticsearch-cli-7.16.2.jar:7.16.2]",
"at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:122) ~[elasticsearch-7.16.2.jar:7.16.2]",
"at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:80) ~[elasticsearch-7.16.2.jar:7.16.2]",
"Caused by: org.elasticsearch.transport.BindTransportException: Failed to resolve publish address",
…