Init container fails

I followed the quickstart guide version 0.9.
I applied the elastic operator CRD.
I deployed the elastic cluster with one node, but its status is Init:CrashLoopBackOff.
After some debugging, I found that the init container elastic-internal-init-filesystem fails, specifically in the script prepare-fs.sh:
"chowning /usr/share/elasticsearch/data to elasticsearch:elasticsearch
chown: changing ownership of '/usr/share/elasticsearch/data': Operation not permitted"

The chown command is being run by root. Am I missing anything?

Ok, so I found out that the issue is happening because I have a default NFS storage class,
so I created a hostPath PV instead. The new PV is indeed bound, but another claim is still created from the default NFS storage class, and the init container elastic-internal-init-filesystem still uses it, so I'm hitting the same issue.
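For reference, the hostPath PV I created looks roughly like this (name, size, and path are placeholders, not my exact manifest):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: elasticsearch-data-pv        # placeholder name
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /mnt/elasticsearch-data    # placeholder path on the node
```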
Is this a bug?

Could you post the Elasticsearch resource you are using now here?

Are you specifying a custom persistent volume claim template as described here? https://www.elastic.co/guide/en/cloud-on-k8s/0.9/k8s-volume-claim-templates.html

Also, can you maybe share a bit more information about your Kubernetes environment? Which flavour of Kubernetes are you running and in which version?

It works now.
In the link you sent, it says " The name in the template must be elasticsearch-data"

As I said, I followed the quickstart guide, and there was no mention there about the name restriction :slight_smile:
https://www.elastic.co/guide/en/cloud-on-k8s/0.9/k8s-quickstart.html
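For anyone else who lands here: my working claim template ended up looking roughly like this. The storage class and sizes are examples for my environment, and the apiVersion/spec layout may differ between ECK releases; the key part is that the claim name must be exactly elasticsearch-data:

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1alpha1   # apiVersion varies by ECK release
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.2.0
  nodes:
  - nodeCount: 1
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data     # this exact name is required
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
        storageClassName: standard   # example; point this at a non-NFS class
```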

We are fixing the quickstart guide to be more explicit https://github.com/elastic/cloud-on-k8s/pull/1489

I have the same issue with a container in the Init:CrashLoopBackOff state. I use a local k8s cluster deployed by Rancher and I use https://github.com/rancher/local-path-provisioner for volumes. I tried to configure a PVC and emptyDir as described here https://www.elastic.co/guide/en/cloud-on-k8s/1.0-beta/k8s-volume-claim-templates.html and neither worked. How can I troubleshoot it further? I don't see any error messages anywhere in kubectl.

Hi Andrii,

You can retrieve the logs of the init container using this command:

kubectl logs <pod_name> -c <container_name>

Example:

# Retrieve the name of the init container
> k get pod elasticsearch-sample-es-default-0 -o json | jq '.spec.initContainers[].name'
"elastic-internal-init-filesystem"
                                         
# Get the last 2 lines logged by the init container
> k logs elasticsearch-sample-es-default-0 -c elastic-internal-init-filesystem | tail -2
Init script successful
Script duration: 1 sec.

Hi Richard, thank you for the quick response; somehow I didn't get a notification about the reply.

I didn't know about the -c option before. When I used it, I saw the following message, which is weird:

standard_init_linux.go:190: exec user process caused "exec format error"

It appeared for Filebeat as well, and only on one of the nodes in my k8s cluster. I have no idea what was wrong with that node, as all the nodes were created by Vagrant in Hyper-V. I removed and recreated the entire VM and the issue is gone, but it is still very weird, and I am not sure how I would approach it if I got the same error in our production cluster.

Hi, just to share my experience, this particular error

standard_init_linux.go:190: exec user process caused "exec format error"

also came up for me. The reason was that I had edited a file inside the Elasticsearch image (which should have LF line endings) on Windows, which caused a few lines to have CRLF line endings. It also only came up in a few pods, not all (which seems to be because imagePullPolicy was set to IfNotPresent). It went away when I re-normalized the line endings to LF and force-pulled the fixed image.
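In case it helps anyone, the re-normalization doesn't need Windows tooling; a minimal sketch (the file name is just an example):

```shell
# a script that picked up CRLF endings (simulated here with printf)
printf '#!/bin/sh\r\necho hello\r\n' > entrypoint.sh

# strip the trailing carriage returns in place (what dos2unix does)
sed -i 's/\r$//' entrypoint.sh

# confirm no carriage returns remain; prints "clean" if the file is fixed
grep -q "$(printf '\r')" entrypoint.sh || echo "clean"
```

After fixing the file, rebuild the image and make sure the node actually pulls the new one (e.g. delete the cached image or set imagePullPolicy: Always), since IfNotPresent will keep using the stale layer.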

I have a similar issue. I use https://github.com/kubernetes-incubator/external-storage/tree/master/nfs-client as my storage class.

Speed is not an issue for me because they are all VMs.

The problem is I have the same permission error.

I can actually see that it creates a new folder in my NFS directory,
like elk-logging-elasticsearch-data-elk-logging-es-masters-0-pvc-b0e84c3f-7f8d-4891-a966-4a2de08e041a

However, it is empty, and I can see the error in the log:

chowning /usr/share/elasticsearch/data to elasticsearch:elasticsearch
chown: changing ownership of '/usr/share/elasticsearch/data': Operation not permitted
failed to change ownership of '/usr/share/elasticsearch/data' from 65534:65534 to elasticsearch:elasticsearch

I don't understand why it can't chown the folder.

I spent the whole day trying to find a solution but failed. I hope you can help me.

Btw, if I use a local volume type, it works. I don't understand what the problem is.


Hi,

I am getting the same issue. Could you share how you resolved it?

Thanks!

Seeing the same thing with the same NFS-Client storage provider set as default.

How did you dig out the logs? I can only get logs from the NFS-Client pod telling me that the pvc was successfully created.

Same problem here with an NFS storage class.

kubectl logs quickstart-es-default-0 -c elastic-internal-init-filesystem

'/usr/share/elasticsearch/bin/x-pack-env' -> '/mnt/elastic-internal/elasticsearch-bin-local/x-pack-env'
'/usr/share/elasticsearch/bin/x-pack-security-env' -> '/mnt/elastic-internal/elasticsearch-bin-local/x-pack-security-env'
'/usr/share/elasticsearch/bin/x-pack-watcher-env' -> '/mnt/elastic-internal/elasticsearch-bin-local/x-pack-watcher-env'
Files copy duration: 0 sec.
chowning /usr/share/elasticsearch/data to elasticsearch:elasticsearch
chown: changing ownership of '/usr/share/elasticsearch/data': Operation not permitted
failed to change ownership of '/usr/share/elasticsearch/data' from 469779:469779 to elasticsearch:elasticsearch

I have the same error as above when using nfs as storageclass.

chowning /usr/share/elasticsearch/data to elasticsearch:elasticsearch
chown: changing ownership of '/usr/share/elasticsearch/data': Operation not permitted
failed to change ownership of '/usr/share/elasticsearch/data' from 65534:65534 to elasticsearch:elasticsearch

Is there any update / progress on this?


I think the issue is on your NFS server.

Check that root_squash is disabled in your NFS server configuration: with root squashing enabled, the client's root user is mapped to nobody (65534), so the init container's chown fails with "Operation not permitted".
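On a Linux NFS server this is typically the no_root_squash export option; a sketch of what the export line might look like (the path and subnet are placeholders for your setup):

```
# /etc/exports -- no_root_squash lets the client's root user chown files as root
/srv/nfs/elasticsearch  10.0.0.0/24(rw,sync,no_root_squash)
```

Run exportfs -ra on the server after editing to apply the change. Note that no_root_squash has security implications, so restrict the export to trusted hosts.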