I'm using ECK Operator 1.0.0-beta1 running on Rancher 2.0.
I have a custom Elasticsearch image that mounts an off-cluster NFS share for snapshot backups. The mount works correctly, but when I upgrade the cluster (e.g. from 7.4.0 to 7.4.1) I see the following behavior:
1. Kubernetes tries to remove the last node in the cluster.
2. This seems to time out, which results in the pod being killed (I think).
3. Then the entire cluster detects "Readiness Probe Failed" and falls over.
4. The cluster comes back on its own, and the killed node now has the new version.
5. Repeat for every node in the cluster.
No data is lost during this, but the cluster restarts once for every pod.
The Dockerfile looks like this:
FROM docker.elastic.co/elasticsearch/elasticsearch:7.4.1
RUN yum -y install nfs-utils
RUN mkdir /mnt/snapshots
COPY ./my-start.sh /usr/local/bin/my-start.sh
ENTRYPOINT ["/usr/local/bin/my-start.sh"]
The my-start.sh script adds a mount command before sourcing the original entrypoint.
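A minimal sketch of that kind of wrapper (the NFS server and export below are placeholders, and the stock entrypoint in the official image is /usr/local/bin/docker-entrypoint.sh; the sketch exec's the stock entrypoint rather than sourcing it, so SIGTERM reaches Elasticsearch directly on shutdown):

#!/bin/bash
# Sketch only: the NFS server and export names are placeholders.
set -e

# Mount the off-cluster NFS share used for snapshot backups.
mount -t nfs -o nolock nfs-server:/volume1/search-quickstart /mnt/snapshots

# Hand off to the stock Elasticsearch entrypoint. exec keeps the Elasticsearch
# process as PID 1 so it receives termination signals directly.
exec /usr/local/bin/docker-entrypoint.sh "$@"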
You could maybe get rid of the custom Docker image by:
adding an init container that does the mount
using your preStop hook to do the umount
This way you don't have to deal with building your own image and keeping it up to date.
Why would a shutdown timeout of a single instance cause the entire cluster to flap?
This is not expected. I'd like to understand it better.
Can you share your Elasticsearch yaml manifest?
What do you mean by "the entire cluster detects 'Readiness Probe Failed' and falls over"? Do all Pods become non-ready, so the service cannot route to the cluster?
Can you share some logs of the operator and Elasticsearch while this happens?
I've been able to recreate this issue using a custom image, but the problem goes away when I use lifecycle exec commands.
To demonstrate the behavior I've made a short video which starts when I apply a change from 7.4.0 to 7.4.1: https://youtu.be/4icmwoyN8uY
(I have operator log files if you're interested in chasing this behavior down - but I think it comes down to my image not exiting cleanly.)
I have eliminated this behavior by moving my logic to lifecycle exec commands like so:
cat <<EOF | kubectl -n test apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1beta1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.4.0
  nodeSets:
  - name: default
    count: 3
    config:
      node.master: true
      node.data: true
      node.ingest: true
      path.repo: [ "/var/local" ]
      xpack.security.authc.realms:
        native:
          native1:
            order: 1
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          resources:
            limits:
              memory: 2G
              cpu: 2
          env:
          - name: ES_JAVA_OPTS
            value: "-Xms1g -Xmx1g"
          securityContext:
            capabilities:
              add:
              - SYS_ADMIN
          # Important: you must mount to a path that already exists in the image,
          # because postStart executes too late to create the mount point.
          # I used /var/local because it was empty and seemed reasonable.
          lifecycle:
            postStart:
              exec:
                command:
                - "sh"
                - "-c"
                - >
                  yum -y install nfs-utils &&
                  mount -vvv -t nfs -o nolock nfs-server:/volume1/search-quickstart /var/local
            preStop:
              exec:
                command: ["/usr/bin/umount", "/var/local"]
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
        storageClassName: local-path
EOF
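With the share mounted at /var/local and that path whitelisted via path.repo, the NFS mount can then be registered as a snapshot repository. A rough sketch, assuming ECK's default secret and service names for a cluster called quickstart and an arbitrary repository name:

# Assumptions: "nfs_snapshots" is an arbitrary repository name; the secret and
# service names below are ECK's defaults for a cluster named "quickstart".
PASSWORD=$(kubectl -n test get secret quickstart-es-elastic-user \
  -o go-template='{{.data.elastic | base64decode}}')

# In a separate shell, keep a port-forward open to the cluster's HTTP service:
#   kubectl -n test port-forward service/quickstart-es-http 9200

curl -k -u "elastic:$PASSWORD" -X PUT "https://localhost:9200/_snapshot/nfs_snapshots" \
  -H 'Content-Type: application/json' \
  -d '{ "type": "fs", "settings": { "location": "/var/local" } }'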
The one downside of this approach is that I now have to install a bunch of packages every time a pod initializes, but overall it's still cleaner than maintaining a custom image.