Elasticsearch ECK in CrashLoopBackOff after node failure

Hi all,

We have an Elasticsearch v8.13.2 cluster deployed with ECK, running on Kubernetes v1.29.2.

The cluster has dedicated hot and warm data nodes. We have completed the load test and are now at the stage where we want to test recovery from failure. After pulling the cord on a Kubernetes node, 3 pods became unavailable:

  • logstash
  • elasticsearch-master
  • elasticsearch-data-hot

After restarting the node, Logstash recovered but both Elasticsearch pods went into CrashLoopBackOff. From what we can tell, the pods are not recreated, and the container that is crash-looping on both pods is elastic-internal-init-filesystem. These are the logs we get from the failing container:

Starting init script
Copying /usr/share/elasticsearch/config/* to /mnt/elastic-internal/elasticsearch-config-local/
'/usr/share/elasticsearch/config/elasticsearch-plugins.example.yml' -> '/mnt/elastic-internal/elasticsearch-config-local/elasticsearch-plugins.example.yml'
'/usr/share/elasticsearch/config/elasticsearch.yml' -> '/mnt/elastic-internal/elasticsearch-config-local/elasticsearch.yml'
'/usr/share/elasticsearch/config/http-certs/..2024_05_27_10_44_08.1291635209/ca.crt' -> '/mnt/elastic-internal/elasticsearch-config-local/http-certs/..2024_05_27_10_44_08.1291635209/ca.crt'
'/usr/share/elasticsearch/config/http-certs/..2024_05_27_10_44_08.1291635209/tls.crt' -> '/mnt/elastic-internal/elasticsearch-config-local/http-certs/..2024_05_27_10_44_08.1291635209/tls.crt'
'/usr/share/elasticsearch/config/http-certs/..2024_05_27_10_44_08.1291635209/tls.key' -> '/mnt/elastic-internal/elasticsearch-config-local/http-certs/..2024_05_27_10_44_08.1291635209/tls.key'
removed '/mnt/elastic-internal/elasticsearch-config-local/http-certs/..data'
'/usr/share/elasticsearch/config/http-certs/..data' -> '/mnt/elastic-internal/elasticsearch-config-local/http-certs/..data'
removed '/mnt/elastic-internal/elasticsearch-config-local/http-certs/ca.crt'
'/usr/share/elasticsearch/config/http-certs/ca.crt' -> '/mnt/elastic-internal/elasticsearch-config-local/http-certs/ca.crt'
removed '/mnt/elastic-internal/elasticsearch-config-local/http-certs/tls.crt'
'/usr/share/elasticsearch/config/http-certs/tls.crt' -> '/mnt/elastic-internal/elasticsearch-config-local/http-certs/tls.crt'
removed '/mnt/elastic-internal/elasticsearch-config-local/http-certs/tls.key'
'/usr/share/elasticsearch/config/http-certs/tls.key' -> '/mnt/elastic-internal/elasticsearch-config-local/http-certs/tls.key'
'/usr/share/elasticsearch/config/jvm.options' -> '/mnt/elastic-internal/elasticsearch-config-local/jvm.options'
'/usr/share/elasticsearch/config/log4j2.file.properties' -> '/mnt/elastic-internal/elasticsearch-config-local/log4j2.file.properties'
'/usr/share/elasticsearch/config/log4j2.properties' -> '/mnt/elastic-internal/elasticsearch-config-local/log4j2.properties'
cp: preserving times for '/mnt/elastic-internal/elasticsearch-config-local/log4j2.properties': Operation not permitted
'/usr/share/elasticsearch/config/operator/..2024_05_27_10_44_08.3060476595/settings.json' -> '/mnt/elastic-internal/elasticsearch-config-local/operator/..2024_05_27_10_44_08.3060476595/settings.json'
removed '/mnt/elastic-internal/elasticsearch-config-local/operator/..data'
'/usr/share/elasticsearch/config/operator/..data' -> '/mnt/elastic-internal/elasticsearch-config-local/operator/..data'
removed '/mnt/elastic-internal/elasticsearch-config-local/operator/settings.json'
'/usr/share/elasticsearch/config/operator/settings.json' -> '/mnt/elastic-internal/elasticsearch-config-local/operator/settings.json'
'/usr/share/elasticsearch/config/role_mapping.yml' -> '/mnt/elastic-internal/elasticsearch-config-local/role_mapping.yml'
'/usr/share/elasticsearch/config/roles.yml' -> '/mnt/elastic-internal/elasticsearch-config-local/roles.yml'
'/usr/share/elasticsearch/config/transport-remote-certs/..2024_05_27_10_44_08.2100192238/ca.crt' -> '/mnt/elastic-internal/elasticsearch-config-local/transport-remote-certs/..2024_05_27_10_44_08.2100192238/ca.crt'
removed '/mnt/elastic-internal/elasticsearch-config-local/transport-remote-certs/..data'
'/usr/share/elasticsearch/config/transport-remote-certs/..data' -> '/mnt/elastic-internal/elasticsearch-config-local/transport-remote-certs/..data'
removed '/mnt/elastic-internal/elasticsearch-config-local/transport-remote-certs/ca.crt'
'/usr/share/elasticsearch/config/transport-remote-certs/ca.crt' -> '/mnt/elastic-internal/elasticsearch-config-local/transport-remote-certs/ca.crt'
'/usr/share/elasticsearch/config/users' -> '/mnt/elastic-internal/elasticsearch-config-local/users'
'/usr/share/elasticsearch/config/users_roles' -> '/mnt/elastic-internal/elasticsearch-config-local/users_roles'

If we delete the pods, they are recreated properly and the Elasticsearch cluster goes back into a healthy state.

Has anyone experienced something similar? Are we doing something wrong?

The following is our Elasticsearch PoC config:

---
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch
  namespace: elk-system
  labels:
    app: elasticsearch
    environment: dev
spec:
  version: 8.13.2
  volumeClaimDeletePolicy: DeleteOnScaledownOnly
  auth:
    roles:
      - secretName: logstash-kafka-role
  monitoring:
    metrics:
      elasticsearchRefs:
      - name: elasticsearch
    logs:
      elasticsearchRefs:
      - name: elasticsearch
  nodeSets:
  - name: cluster
    count: 3
    config:
      cluster.routing.allocation.disk.watermark.low: "98%"
      cluster.routing.allocation.disk.watermark.high: "99%"
      cluster.routing.allocation.disk.watermark.flood_stage: "99%"
      node.roles: ["master", "ingest"]
      xpack.ml.enabled: false
      ingest.geoip.downloader.enabled: false
    podTemplate:
      metadata:
        labels:
          app: elasticsearch
          environment: dev
      spec:
        topologySpreadConstraints:
          - maxSkew: 1
            topologyKey: kubernetes.io/zone
            whenUnsatisfiable: DoNotSchedule
            labelSelector:
              matchLabels:
                app: elasticsearch
                environment: dev
        affinity:
          nodeAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              preference:
                matchExpressions:
                - key: node-role.kubernetes.io/c8m32
                  operator: Exists
                  values: []
        tolerations:
          - key: c8m32
            effect: NoSchedule
            operator: Exists
        containers:
          - name: elasticsearch
            env:
            - name: ES_JAVA_OPTS
              value: -Xms3g -Xmx3g
            resources:
              requests:
                memory: 4500Mi
                cpu: 1
              limits:
                memory: 4500Mi
                cpu: 1
            securityContext:
              runAsUser: 2000
              runAsGroup: 3000
  - name: data-hot
    count: 2
    config:
      cluster.routing.allocation.disk.watermark.low: "98%"
      cluster.routing.allocation.disk.watermark.high: "99%"
      cluster.routing.allocation.disk.watermark.flood_stage: "99%"
      node.roles: ["data_hot", "data_content"]
      xpack.ml.enabled: false
      ingest.geoip.downloader.enabled: false
    podTemplate:
      metadata:
        labels:
          app: elasticsearch
          environment: dev
      spec:
        topologySpreadConstraints:
          - maxSkew: 1
            topologyKey: kubernetes.io/zone
            whenUnsatisfiable: DoNotSchedule
            labelSelector:
              matchLabels:
                app: elasticsearch
                environment: dev
        affinity:
          nodeAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              preference:
                matchExpressions:
                - key: node-role.kubernetes.io/c8m32
                  operator: Exists
                  values: []
        tolerations:
          - key: c8m32
            effect: NoSchedule
            operator: Exists
        containers:
          - name: elasticsearch
            env:
            - name: ES_JAVA_OPTS
              value: -Xms12g -Xmx12g
            resources:
              requests:
                memory: 16Gi
                cpu: 8
              limits:
                memory: 16Gi
                cpu: 8
            securityContext:
              runAsUser: 2000
              runAsGroup: 3000
    volumeClaimTemplates:
    - apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: elasticsearch-data
        labels:
          app: elasticsearch
          environment: dev
      spec:
        storageClassName: sc-ebs-gp3-xfs
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 300Gi
  - name: data-warm
    count: 2
    config:
      cluster.routing.allocation.disk.watermark.low: "98%"
      cluster.routing.allocation.disk.watermark.high: "99%"
      cluster.routing.allocation.disk.watermark.flood_stage: "99%"
      node.roles: ["data_warm"]
      xpack.ml.enabled: false
      ingest.geoip.downloader.enabled: false
    podTemplate:
      metadata:
        labels:
          app: elasticsearch
          environment: dev
      spec:
        topologySpreadConstraints:
          - maxSkew: 1
            topologyKey: kubernetes.io/zone
            whenUnsatisfiable: DoNotSchedule
            labelSelector:
              matchLabels:
                app: elasticsearch
                environment: dev
        containers:
          - name: elasticsearch
            env:
            - name: ES_JAVA_OPTS
              value: -Xms5g -Xmx5g
            resources:
              requests:
                memory: 6Gi
                cpu: 2
              limits:
                memory: 6Gi
                cpu: 2
            securityContext:
              runAsUser: 2000
              runAsGroup: 3000
    volumeClaimTemplates:
    - apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: elasticsearch-data
        labels:
          app: elasticsearch
          environment: dev
      spec:
        storageClassName: sc-ebs-gp3-xfs
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 600Gi

Thank you for any help/advice/docs you can give me.

I thought I'd add a sample of the Kubernetes events I get around that time:

LAST SEEN   TYPE      REASON         OBJECT                           MESSAGE
3m6s        Warning   NodeNotReady   Pod/elasticsearch-es-cluster-1   Node is not ready
3m5s        Warning   NodeNotReady   Pod/elasticsearch-es-data-hot-1   Node is not ready
3m36s (x3 over 154m)   Warning   Unhealthy      Elasticsearch/elasticsearch       Elasticsearch cluster health degraded
82s (x2 over 100m)     Warning   Unexpected     Elasticsearch/elasticsearch       Could not verify license, re-queuing: elasticsearch client failed for https://elasticsearch-es-internal-http.elk-system.svc:9200/_license: Get "https://elasticsearch-es-internal-http.elk-system.svc:9200/_license": dial tcp 172.30.121.176:9200: connect: connection timed out
3m6s (x3 over 102m)    Warning   NodeNotReady   Pod/logstash-kafka-ls-2           Node is not ready
0s                     Warning   FailedMount    Pod/elasticsearch-es-cluster-1    MountVolume.MountDevice failed for volume "pvc-85f9bcb1-761b-4cfb-9a7b-3e146fdd6cde" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name ebs.csi.aws.com not found in the list of registered CSI drivers
0s                     Warning   FailedMount    Pod/elasticsearch-es-data-hot-1   MountVolume.MountDevice failed for volume "pvc-747e9951-895f-4bc9-a25a-6b9e7864155e" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name ebs.csi.aws.com not found in the list of registered CSI drivers
0s                     Warning   BackOff        Pod/elasticsearch-es-data-hot-1   Back-off restarting failed container elastic-internal-init-filesystem in pod elasticsearch-es-data-hot-1_elk-system(7fff2f38-794f-47d4-8ce8-a7e6b0a5d845)
0s                     Warning   BackOff        Pod/elasticsearch-es-cluster-1    Back-off restarting failed container elastic-internal-init-filesystem in pod elasticsearch-es-cluster-1_elk-system(facf5ed0-cb36-4c97-a341-12f88345543b)
0s (x3 over 99m)       Warning   RecreatingFailedPod   StatefulSet/elasticsearch-es-cluster   StatefulSet elk-system/elasticsearch-es-cluster is recreating failed Pod elasticsearch-es-cluster-1
0s (x10 over 99m)      Warning   RecreatingFailedPod   StatefulSet/elasticsearch-es-data-hot   StatefulSet elk-system/elasticsearch-es-data-hot is recreating failed Pod elasticsearch-es-data-hot-1

I have been running more tests to try to narrow down the problem. It seems this only happens when the Kubernetes worker node is down for a short amount of time.

Should this be filed as a bug?

It seems that if the Kubernetes node is down only briefly, the kubelet tries to recover the existing pods in place once the node is back in a Ready state. If the node is down for a longer period, Kubernetes instead terminates the pods and recreates them when the node becomes Ready again.
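
If I understand Kubernetes' taint-based eviction correctly, that time threshold comes from the default tolerations injected into every pod, which keep it bound to a NotReady node for 5 minutes before it is evicted. Something like this in the pod spec (this is my understanding of the defaults, not something from our manifests):

tolerations:
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 300
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 300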

Best practice is to never allocate more than 50% of available RAM to the heap, as Elasticsearch relies on off-heap memory and the OS page cache for optimal performance. Several of your nodeSets go well beyond that (e.g. a 12g heap against a 16Gi limit on the hot nodes), so you should correct this.
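
For your hot data nodes that would mean something along these lines (a sketch; exact sizes depend on your workload):

containers:
  - name: elasticsearch
    env:
    - name: ES_JAVA_OPTS
      value: -Xms8g -Xmx8g  # no more than half of the container memory limit
    resources:
      requests:
        memory: 16Gi
        cpu: 8
      limits:
        memory: 16Gi
        cpu: 8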

Also note that all master-eligible nodes must have persistent storage; your cluster nodeSet does not define any volumeClaimTemplates.
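
Something along these lines under that nodeSet would do (the 10Gi size is just an illustration):

volumeClaimTemplates:
- apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: elasticsearch-data
  spec:
    storageClassName: sc-ebs-gp3-xfs
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi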

Given that you have relatively little storage on the nodes, the 98%/99%/99% disk watermarks look risky. I would not deviate from the defaults (85% low, 90% high, 95% flood stage) until you have significantly more storage per node.
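
In practice that just means removing these three overrides from each nodeSet's config block and letting Elasticsearch use its defaults:

cluster.routing.allocation.disk.watermark.low: "98%"
cluster.routing.allocation.disk.watermark.high: "99%"
cluster.routing.allocation.disk.watermark.flood_stage: "99%"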

Thanks for the advice @Christian_Dahlqvist

I will follow your recommendations and give the failure test another go. I'll post my results when I'm done.

Hi @Christian_Dahlqvist,
I have recreated the cluster with your recommendations (also had to shrink it in size).

The problem still persists. If a Kubernetes worker node goes down (I did a force shutdown of an instance in AWS) and I bring it back up soon after, Logstash recovers but the ES pods do not. I have to delete them manually for them to be properly recreated and for the cluster to become healthy again.

Would you consider this a bug, or a problem caused by Kubernetes' pod-eviction-timeout?
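
One thing I am tempted to try (not sure it is the right fix) is shortening the eviction delay for the ES pods via the podTemplate, so that after a node failure they get evicted and recreated quickly instead of waiting out the default 5 minutes:

podTemplate:
  spec:
    tolerations:
      - key: node.kubernetes.io/not-ready
        operator: Exists
        effect: NoExecute
        tolerationSeconds: 30
      - key: node.kubernetes.io/unreachable
        operator: Exists
        effect: NoExecute
        tolerationSeconds: 30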

I also noticed that while the ES nodes were unavailable, Logstash seemed to send duplicates, but that's for another time and another post.

Here is my ES setup:

---
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch
  namespace: elk-system
  labels:
    app: elasticsearch
    environment: dev
spec:
  version: 8.13.2
  volumeClaimDeletePolicy: DeleteOnScaledownOnly
  auth:
    roles:
      - secretName: logstash-kafka-role
  monitoring:
    metrics:
      elasticsearchRefs:
      - name: elasticsearch
    logs:
      elasticsearchRefs:
      - name: elasticsearch
  nodeSets:
  - name: cluster
    count: 3
    config:
      node.roles: ["master", "ingest"]
      xpack.ml.enabled: false
      ingest.geoip.downloader.enabled: false
    podTemplate:
      metadata:
        labels:
          app: elasticsearch
          environment: dev
      spec:
        topologySpreadConstraints:
          - maxSkew: 1
            topologyKey: kubernetes.io/zone
            whenUnsatisfiable: DoNotSchedule
            labelSelector:
              matchLabels:
                app: elasticsearch
                environment: dev
        affinity:
          nodeAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              preference:
                matchExpressions:
                - key: node-role.kubernetes.io/c8m32
                  operator: Exists
                  values: []
        tolerations:
          - key: c8m32
            effect: NoSchedule
            operator: Exists
        containers:
          - name: elasticsearch
            env:
            - name: ES_JAVA_OPTS
              value: -Xms2g -Xmx2g
            resources:
              requests:
                memory: 4Gi
                cpu: 1
              limits:
                memory: 4Gi
                cpu: 1
            securityContext:
              runAsUser: 2000
              runAsGroup: 3000
    volumeClaimTemplates:
    - apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: elasticsearch-data
        labels:
          app: elasticsearch
          environment: dev
      spec:
        storageClassName: sc-ebs-gp3-xfs
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
  - name: data-hot
    count: 2
    config:
      node.roles: ["data_hot", "data_content"]
      xpack.ml.enabled: false
      ingest.geoip.downloader.enabled: false
    podTemplate:
      metadata:
        labels:
          app: elasticsearch
          environment: dev
      spec:
        topologySpreadConstraints:
          - maxSkew: 1
            topologyKey: kubernetes.io/zone
            whenUnsatisfiable: DoNotSchedule
            labelSelector:
              matchLabels:
                app: elasticsearch
                environment: dev
        affinity:
          nodeAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              preference:
                matchExpressions:
                - key: node-role.kubernetes.io/c8m32
                  operator: Exists
                  values: []
        tolerations:
          - key: c8m32
            effect: NoSchedule
            operator: Exists
        containers:
          - name: elasticsearch
            env:
            - name: ES_JAVA_OPTS
              value: -Xms6g -Xmx6g
            resources:
              requests:
                memory: 12Gi
                cpu: 4
              limits:
                memory: 12Gi
                cpu: 4
            securityContext:
              runAsUser: 2000
              runAsGroup: 3000
    volumeClaimTemplates:
    - apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: elasticsearch-data
        labels:
          app: elasticsearch
          environment: dev
      spec:
        storageClassName: sc-ebs-gp3-xfs
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 300Gi
  - name: data-warm
    count: 2
    config:
      node.roles: ["data_warm"]
      xpack.ml.enabled: false
      ingest.geoip.downloader.enabled: false
    podTemplate:
      metadata:
        labels:
          app: elasticsearch
          environment: dev
      spec:
        topologySpreadConstraints:
          - maxSkew: 1
            topologyKey: kubernetes.io/zone
            whenUnsatisfiable: DoNotSchedule
            labelSelector:
              matchLabels:
                app: elasticsearch
                environment: dev
        containers:
          - name: elasticsearch
            env:
            - name: ES_JAVA_OPTS
              value: -Xms3g -Xmx3g
            resources:
              requests:
                memory: 6Gi
                cpu: 2
              limits:
                memory: 6Gi
                cpu: 2
            securityContext:
              runAsUser: 2000
              runAsGroup: 3000
    volumeClaimTemplates:
    - apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: elasticsearch-data
        labels:
          app: elasticsearch
          environment: dev
      spec:
        storageClassName: sc-ebs-gp3-xfs
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 600Gi