Issue: Elasticsearch pods go into CrashLoopBackOff because of the inject-process-manager init container

What did you do?
I freshly installed the Elastic Stack on Kubernetes by following this tutorial:
https://www.elastic.co/guide/en/cloud-on-k8s/current/index.html

So I have an Elasticsearch cluster of three nodes (test phase ;)) plus an instance of Kibana that connects to it.

Everything seems to work, since the cluster reports a "green" health state:

[root@chnkubmtr36 es-operator]# kubectl get elasticsearch
NAME         HEALTH   NODES   VERSION   PHASE         AGE
quickstart   green    3       7.1.0     Operational   26m

But all the Elasticsearch pods are in the Init:CrashLoopBackOff state.

[root@chnkubmtr36 es-operator]# kubectl get po -o wide
NAME                       READY   STATUS                  RESTARTS   AGE   IP              NODE           NOMINATED NODE   READINESS GATES
quickstart-es-55mbfz28n8   0/1     Init:CrashLoopBackOff   2          28m   10.233.66.163   chnkubnode38   <none>           <none>
quickstart-es-d789sfck4d   0/1     Init:CrashLoopBackOff   2          28m   10.233.65.56    chnkubnode37   <none>           <none>
quickstart-es-g8dx797tw5   0/1     Init:CrashLoopBackOff   1          28m   10.233.64.129   chnkubmtr36    <none>           <none>
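
To pick out the affected pods from a listing like the one above, a small filter can help. This is only a sketch: crashlooping_pods is a hypothetical helper name, and it assumes the default kubectl column order (NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: read `kubectl get po` output on stdin and print the
# names of pods whose STATUS column (third field) contains CrashLoopBackOff.
# NR > 1 skips the header row.
crashlooping_pods() {
  awk 'NR > 1 && $3 ~ /CrashLoopBackOff/ { print $1 }'
}

# Intended usage against a live cluster (not executed here):
#   kubectl get po -o wide | crashlooping_pods
# For each name printed, the init container logs of the previous attempt
# can then be requested with:
#   kubectl logs <pod-name> -c inject-process-manager --previous
```

Because the match is on the STATUS field, it catches both CrashLoopBackOff and Init:CrashLoopBackOff.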

What did you expect to see?
All the pods in the Running state without any errors.

What did you see instead? Under which circumstances?

The Elasticsearch pods are in CrashLoopBackOff. When we described the pods, we found that the inject-process-manager container is the one going into CrashLoopBackOff, and we are not able to view its logs either.
So what does the inject-process-manager container do, and under what conditions does it fail?

Environment

  1. VMware VMs
  2. CentOS Linux release 7.6.1810 (Core)
  3. Kubernetes - 1.13.5 (on premise setup)
  4. Docker - 18.09.1
$ kubectl version - 1.13.5

Hey Batchu,

Are there any restrictions on running privileged containers in the environment or on modifying host-level settings? The privileged init container sets vm.max_map_count on the host, and that step may be failing, which would cause the CrashLoopBackOff.
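
One way to rule this out is to check the value directly on each node. The sketch below assumes a Linux host where the setting is readable at /proc/sys/vm/max_map_count; max_map_count_ok is a hypothetical helper name, and 262144 is the value the tweak-os-settings init container tries to set.

```shell
# Hypothetical helper: succeed (exit status 0) if vm.max_map_count is at
# least 262144. An explicit value may be passed as the first argument;
# otherwise the current value is read from /proc on the local host.
max_map_count_ok() {
  local required=262144
  local current=${1:-$(cat /proc/sys/vm/max_map_count)}
  [ "$current" -ge "$required" ]
}

# Intended usage on a node (not executed here):
#   max_map_count_ok && echo "vm.max_map_count is high enough"
```

If the check fails on a node, the sysctl step did not take effect there.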

When you say you cannot view the logs, does kubectl describe es quickstart return anything?

Hi Anurag,

There are no restrictions on running privileged containers in the environment or modifying host level settings.

Below is the log from the inject-process-manager container, which is the one going into CrashLoopBackOff:

[root@chnkubnode37 ~]# kubectl logs -f quickstart-es-42mjswwgjj -c inject-process-manager
cp: cannot create regular file '/volume/bin/process-manager': Text file busy
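
For context: cp reports "Text file busy" (ETXTBSY) when it tries to open for writing an executable that is currently running. Judging from the pod description in this thread, the elasticsearch container runs /mnt/elastic/process-manager/process-manager from local-bin-volume, the same volume this init container copies into, so that may be what happens if the init container is re-run while the main container is still alive. A common way to sidestep this class of error is to copy to a temporary file in the same directory and rename it over the target, since a rename does not open the old file for writing. The helper below, install_binary, is a hypothetical name for illustration and is not part of ECK; it is a sketch of the general technique, not the operator's actual fix.

```shell
# Hypothetical helper: place SRC into DEST_DIR without opening an existing
# destination file for writing.
install_binary() {
  local src=$1 dest_dir=$2
  local name tmp
  name=$(basename "$src")
  # Create the temporary file inside DEST_DIR so that the final mv is a
  # rename on the same filesystem rather than a copy.
  tmp=$(mktemp "$dest_dir/.$name.XXXXXX") || return 1
  cp "$src" "$tmp" || return 1
  chmod 0755 "$tmp" || return 1
  # Swap the directory entry; a process still executing the old file keeps
  # its old copy until it exits.
  mv -f "$tmp" "$dest_dir/$name"
}

# Intended usage inside an init container (not executed here):
#   install_binary process-manager "$LOCAL_BIN"
```

The trade-off is that a running process keeps executing the old binary until it is restarted.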

Here is the kubectl describe output for one of the quickstart Elasticsearch pods:

Name:               quickstart-es-g8dx797tw5
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               chnkubmtr36/100.121.41.136
Start Time:         Wed, 12 Jun 2019 14:56:04 +0530
Labels:             app=es
                    common.k8s.elastic.co/type=elasticsearch
                    elasticsearch.k8s.elastic.co/cluster-name=quickstart
                    elasticsearch.k8s.elastic.co/node-data=true
                    elasticsearch.k8s.elastic.co/node-ingest=true
                    elasticsearch.k8s.elastic.co/node-master=true
                    elasticsearch.k8s.elastic.co/node-ml=true
                    elasticsearch.k8s.elastic.co/version=7.1.0
Annotations:        update.k8s.elastic.co/timestamp: 2019-06-13T05:58:56.065965419Z
Status:             Running
IP:                 10.233.64.203
Controlled By:      Elasticsearch/quickstart
Init Containers:
tweak-os-settings:
Container ID: docker://484e0b4f1891be33707c38a83f0cf0925b30157502031fb91b4fede9e5c90e31
Image: docker.elastic.co/elasticsearch/elasticsearch:7.1.0
Image ID: docker-pullable://docker.elastic.co/elasticsearch/elasticsearch@sha256:802b6a299260dbaf21a9c57e3a634491ff788a1ea13a51598d4cd105739509c4
Port:
Host Port:
Command:
sysctl
-w
vm.max_map_count=262144
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 13 Jun 2019 11:27:21 +0530
Finished: Thu, 13 Jun 2019 11:27:21 +0530
Ready: True
Restart Count: 2
Environment:
Mounts:
prepare-fs:

Container ID: docker://60cabece669a3526c9d9adee562089da933cd3d5acb660cea03a9107e955e4aa
Image: docker.elastic.co/elasticsearch/elasticsearch:7.1.0
Image ID: docker-pullable://docker.elastic.co/elasticsearch/elasticsearch@sha256:802b6a299260dbaf21a9c57e3a634491ff788a1ea13a51598d4cd105739509c4
Port:
Host Port:
Command:
bash
-c
#!/usr/bin/env bash -eu

    ES_DIR="/usr/share/elasticsearch"
    CONFIG_DIR=$ES_DIR/config
    PLUGIN_BIN=$ES_DIR/bin/elasticsearch-plugin
    KEYSTORE_BIN=$ES_DIR/bin/elasticsearch-keystore

    # compute time in seconds since the given start time
    function duration() {
      local start=$1
      end=$(date +%s)
      echo $((end-start))
    }

    ######################
    #        START       #
    ######################

    script_start=$(date +%s)

    echo "Starting init script"

    ######################
    #       Plugins      #
    ######################

    plugins_start=$(date +%s)
    # Install extra plugins

      echo "Installing plugin repository-s3"
      # Using --batch accepts any user prompt (y/n)
      $PLUGIN_BIN install --batch repository-s3

      echo "Installing plugin repository-gcs"
      # Using --batch accepts any user prompt (y/n)
      $PLUGIN_BIN install --batch repository-gcs


    echo "Installed plugins:"
    $PLUGIN_BIN list

    echo "Plugins installation duration: $(duration $plugins_start) sec."

    ######################
    #  Config linking    #
    ######################

    # Link individual files from their mount location into the config dir
    # to a volume, to be used by the ES container
    ln_start=$(date +%s)

      echo "Linking /mnt/elastic/secrets/users to /usr/share/elasticsearch/config/users"
      ln -sf /mnt/elastic/secrets/users /usr/share/elasticsearch/config/users

      echo "Linking /mnt/elastic/secrets/roles.yml to /usr/share/elasticsearch/config/roles.yml"
      ln -sf /mnt/elastic/secrets/roles.yml /usr/share/elasticsearch/config/roles.yml

      echo "Linking /mnt/elastic/secrets/users_roles to /usr/share/elasticsearch/config/users_roles"
      ln -sf /mnt/elastic/secrets/users_roles /usr/share/elasticsearch/config/users_roles

      echo "Linking /mnt/elastic/es-config/elasticsearch.yml to /usr/share/elasticsearch/config/elasticsearch.yml"
      ln -sf /mnt/elastic/es-config/elasticsearch.yml /usr/share/elasticsearch/config/elasticsearch.yml

      echo "Linking /mnt/elastic/unicast-hosts/unicast_hosts.txt to /usr/share/elasticsearch/config/unicast_hosts.txt"
      ln -sf /mnt/elastic/unicast-hosts/unicast_hosts.txt /usr/share/elasticsearch/config/unicast_hosts.txt

    echo "File linking duration: $(duration $ln_start) sec."


    ######################
    #  Files persistence #
    ######################

    # Persist the content of bin/, config/ and plugins/
    # to a volume, to be used by the ES container
    mv_start=$(date +%s)

      echo "Moving /usr/share/elasticsearch/config/* to /volume/config/"
      mv /usr/share/elasticsearch/config/* /volume/config/

      echo "Moving /usr/share/elasticsearch/plugins/* to /volume/plugins/"
      mv /usr/share/elasticsearch/plugins/* /volume/plugins/

      echo "Moving /usr/share/elasticsearch/bin/* to /volume/bin/"
      mv /usr/share/elasticsearch/bin/* /volume/bin/

      echo "Moving /usr/share/elasticsearch/data/* to /volume/data/"
      mv /usr/share/elasticsearch/data/* /volume/data/

      echo "Moving /usr/share/elasticsearch/logs/* to /volume/logs/"
      mv /usr/share/elasticsearch/logs/* /volume/logs/

    echo "Files copy duration: $(duration $mv_start) sec."

    ######################
    #  Volumes chown     #
    ######################

    # chown the data and logs volume to the elasticsearch user
    chown_start=$(date +%s)

      echo "chowning /volume/data to elasticsearch:elasticsearch"
      chown -v elasticsearch:elasticsearch /volume/data

      echo "chowning /volume/logs to elasticsearch:elasticsearch"
      chown -v elasticsearch:elasticsearch /volume/logs

    echo "chown duration: $(duration $chown_start) sec."

    ######################
    #         End        #
    ######################

    echo "Init script successful"
    echo "Script duration: $(duration $script_start) sec."

State:          Terminated
  Reason:       Completed
  Exit Code:    0
  Started:      Thu, 13 Jun 2019 11:27:22 +0530
  Finished:     Thu, 13 Jun 2019 11:27:42 +0530
Ready:          True
Restart Count:  0
Environment:    <none>
Mounts:
  /volume/bin from bin-volume (rw)
  /volume/config from config-volume (rw)
  /volume/data from data (rw)
  /volume/logs from logs (rw)
  /volume/plugins from plugins-volume (rw)

inject-process-manager:
Container ID: docker://ca5d27a1c42db05dd375c99d8b6a7a94cbc2ed51ca79af9f077a565fe7df2927
Image: docker.elastic.co/eck/eck-operator:0.8.0
Image ID: docker-pullable://docker.elastic.co/eck/eck-operator@sha256:1e910d2502690f9007d103f89cef92ef3c4f1115c08819edb3b7409481b291a3
Port:
Host Port:
Command:
bash
-c

      #!/usr/bin/env bash -eu
      cp process-manager $LOCAL_BIN

State:          Terminated
  Reason:       Completed
  Exit Code:    0
  Started:      Thu, 13 Jun 2019 11:27:44 +0530
  Finished:     Thu, 13 Jun 2019 11:27:44 +0530
Ready:          True
Restart Count:  0
Environment:
  LOCAL_BIN:  /volume/bin
Mounts:
  /volume/bin from local-bin-volume (rw)

cert-initializer:
Container ID: docker://b8fc1b59fe77c20a7cb71288562e71bcdc772ac0a514ebc554292a3123b90a5c
Image: docker.elastic.co/eck/eck-operator:0.8.0
Image ID: docker-pullable://docker.elastic.co/eck/eck-operator@sha256:1e910d2502690f9007d103f89cef92ef3c4f1115c08819edb3b7409481b291a3
Port: 8001/TCP
Host Port: 0/TCP
Command:
/root/cert-initializer
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 13 Jun 2019 11:27:49 +0530
Finished: Thu, 13 Jun 2019 11:27:49 +0530
Ready: True
Restart Count: 0
Environment:
Mounts:
/mnt/elastic/private-key from private-key-volume (rw)
/usr/share/elasticsearch/config/node-certs from node-certificates (ro)

Containers:
elasticsearch:
Container ID: docker://8e7e6786c32403e093f9c2b9ee8a08fd33b2024097d9466f44cba0102072192a
Image: docker.elastic.co/elasticsearch/elasticsearch:7.1.0
Image ID: docker-pullable://docker.elastic.co/elasticsearch/elasticsearch@sha256:802b6a299260dbaf21a9c57e3a634491ff788a1ea13a51598d4cd105739509c4
Ports: 9200/TCP, 9300/TCP, 8080/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Command:
/mnt/elastic/process-manager/process-manager
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: OOMKilled
Exit Code: 255
Started: Thu, 13 Jun 2019 11:27:17 +0530
Finished: Thu, 13 Jun 2019 11:27:18 +0530
Ready: False
Restart Count: 2
Limits:
cpu: 2
memory: 4Gi
Requests:
cpu: 2
memory: 4Gi
Readiness: exec [bash -c
#!/usr/bin/env bash

# Consider a node to be healthy if it responds to a simple GET on "/"

CURL_TIMEOUT=3

# setup basic auth if credentials are available

if [ -n "${PROBE_USERNAME}" ] && [ -f "${PROBE_PASSWORD_FILE}" ]; then
  PROBE_PASSWORD=$(<${PROBE_PASSWORD_FILE})
  BASIC_AUTH="-u ${PROBE_USERNAME}:${PROBE_PASSWORD}"
else
  BASIC_AUTH=''
fi

# request Elasticsearch

status=$(curl -o /dev/null -w "%{http_code}" --max-time $CURL_TIMEOUT -XGET -s -k ${BASIC_AUTH} ${READINESS_PROBE_PROTOCOL:-https}://127.0.0.1:9200)

# ready if status code 200

if [[ $status == "200" ]]; then
  exit 0
else
  exit 1
fi
] delay=10s timeout=5s period=10s #success=3 #failure=3
Environment:
POD_NAME: quickstart-es-g8dx797tw5 (v1:metadata.name)
POD_IP: (v1:status.podIP)
ES_JAVA_OPTS: -Xms2048M -Xmx2048M -Djava.security.properties=/usr/share/elasticsearch/config/managed/security.properties
READINESS_PROBE_PROTOCOL: https
PROBE_USERNAME: elastic-internal-probe
PROBE_PASSWORD_FILE: /mnt/elastic/probe-user/elastic-internal-probe
PM_PROC_NAME: es
PM_PROC_CMD: /usr/local/bin/docker-entrypoint.sh
PM_TLS: true
PM_CERT_PATH: /usr/share/elasticsearch/config/node-certs/cert.pem
PM_KEY_PATH: /usr/share/elasticsearch/config/private-key/node.key
KEYSTORE_SOURCE_DIR: /mnt/elastic/secure-settings
KEYSTORE_RELOAD_CREDENTIALS: true
KEYSTORE_ES_USERNAME: elastic-internal-reload-creds
KEYSTORE_ES_PASSWORD_FILE: /mnt/elastic/reload-creds-user/elastic-internal-reload-creds
KEYSTORE_ES_CA_CERTS_PATH: /usr/share/elasticsearch/config/node-certs/ca.pem
KEYSTORE_ES_ENDPOINT: https://127.0.0.1:9200
KEYSTORE_ES_VERSION: 7.1.0
Mounts:
/mnt/elastic/es-config from es-config (ro)
/mnt/elastic/probe-user from probe-user (ro)
/mnt/elastic/process-manager from local-bin-volume (rw)
/mnt/elastic/reload-creds-user from reload-creds-user (ro)
/mnt/elastic/secrets from users (ro)
/mnt/elastic/secure-settings from secure-settings (ro)
/mnt/elastic/unicast-hosts from quickstart-unicast-hosts (ro)
/usr/share/elasticsearch/bin from bin-volume (rw)
/usr/share/elasticsearch/config from config-volume (rw)
/usr/share/elasticsearch/config/extrafiles from extrafiles (ro)
/usr/share/elasticsearch/config/managed from quickstart (ro)
/usr/share/elasticsearch/config/node-certs from node-certificates (ro)
/usr/share/elasticsearch/config/private-key from private-key-volume (rw)
/usr/share/elasticsearch/data from data (rw)
/usr/share/elasticsearch/logs from logs (rw)
/usr/share/elasticsearch/plugins from plugins-volume (rw)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
plugins-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
bin-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
logs:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
private-key-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
local-bin-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
users:
Type: Secret (a volume populated by a Secret)
SecretName: quickstart-es-roles-users
Optional: false
quickstart:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: quickstart
Optional: false
quickstart-unicast-hosts:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: quickstart-unicast-hosts
Optional: false
probe-user:
Type: Secret (a volume populated by a Secret)
SecretName: quickstart-internal-users
Optional: false
extrafiles:
Type: Secret (a volume populated by a Secret)
SecretName: quickstart-extrafiles
Optional: false
reload-creds-user:
Type: Secret (a volume populated by a Secret)
SecretName: quickstart-internal-users
Optional: false
secure-settings:
Type: Secret (a volume populated by a Secret)
SecretName: quickstart-secure-settings
Optional: false
node-certificates:
Type: Secret (a volume populated by a Secret)
SecretName: quickstart-es-g8dx797tw5-certs
Optional: false
es-config:
Type: Secret (a volume populated by a Secret)
SecretName: quickstart-es-g8dx797tw5-config
Optional: false
QoS Class: Guaranteed
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type     Reason   Age                    From                  Message
----     ------   ----                   ----                  -------
Normal   Pulled   37m (x717 over 20h)    kubelet, chnkubmtr36  Successfully pulled image "docker.elastic.co/eck/eck-operator:0.8.0"
Warning  BackOff  17m (x3203 over 18h)   kubelet, chnkubmtr36  Back-off restarting failed container
Normal   Pulled   2m4s (x784 over 20h)   kubelet, chnkubmtr36  Container image "docker.elastic.co/eck/eck-operator:0.8.0" already present on machine

Hi Anurag,

I couldn't post the entire output in a single message because of the per-message character limit, so I split it into 3 messages and posted them in sequence.

Please let me know if you can figure out what might be the reason for this issue.

Thanks in Advance!