After installing Filebeat from the new 8.5 Helm chart in our k8s cluster, 5 of the pods are stuck in CrashLoopBackOff:
{"log.level":"error","@timestamp":"2022-12-08T21:13:19.258Z","log.origin":{"file.name":"instance/beat.go","file.line":1057},"message":"Exiting: cannot obtain lockfile: connot start, data directory belongs to process with pid 10","service.name":"filebeat","ecs.version":"1.6.0"}
Exiting: cannot obtain lockfile: connot start, data directory belongs to process with pid 10
Any idea what could cause this issue?
When I ran the same Helm charts locally with minikube, everything was fine.
@Martin_Schimandl Did you have a previous Filebeat instance installed on the cluster?
And are you sending the logs directly to Elasticsearch or via Logstash?
When I use the 7.5 Helm chart to send to Logstash, I have no problem; it only happens when I send directly to ES.
Also interesting: if I uninstall Filebeat (via Helm), wait 2 days, and reinstall it, it works OK, but this is not fully tested.
We previously used Fluentd and sent the data directly to Elasticsearch.
We switched to Filebeat to make it easier to send the data to Logstash, as we already do for our non-Kubernetes infrastructure.
I am fairly sure this issue is not related to the destination of the logs.
My guess is that this is a race condition with the kubelet, some kernel API, or something similar, since the process ID in the error message is always quite low.
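To illustrate why a consistently low PID is suspicious, here is a simplified, hypothetical model of a PID-based lockfile check. It is not Filebeat's implementation, and the function name `lock_owner_alive` is made up for this sketch. It assumes the lockfile simply stores a PID and that "is the owner alive?" is answered by checking whether any process with that PID exists. Inside a container, PIDs are small and get reused after a restart, so a lockfile left behind by a crashed instance can point at a low PID that now belongs to some unrelated process.

```shell
#!/usr/bin/env bash
# Hypothetical, simplified PID-lockfile check (NOT Filebeat's actual code).
# Returns 0 if the PID stored in the lockfile belongs to some live process.
lock_owner_alive() {
  local lockfile="$1" pid
  pid="$(cat -- "$lockfile" 2>/dev/null)" || return 1
  # Accept only a plain positive integer (PID 0 would mean "process group").
  [[ "$pid" =~ ^[1-9][0-9]*$ ]] || return 1
  # kill -0 sends no signal; it only checks that a process with this PID
  # exists and can be signalled. It cannot tell WHICH program owns the PID,
  # so a reused PID makes a stale lock look "held".
  kill -0 "$pid" 2>/dev/null
}
```

Usage: write the PID of any live process (here the current shell, `$$`, standing in for a reused low PID) into a lockfile and the check reports the lock as held, even though the original owner is long gone:

```shell
lock="$(mktemp)"
echo "$$" > "$lock"
if lock_owner_alive "$lock"; then echo "lock looks held"; else echo "lock is stale"; fi
```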
If you're running Elastic Agent in a container, one workaround I've found is to run the following command inside the container:
find /usr/share/elastic-agent/. -type f -name "*beat*lock" -exec rm {} \;
This cleans up all of the lock files belonging to Beats running under Elastic Agent. You will need to wait a few minutes for the agent to return to a healthy state. Running the command once doesn't always work, so if it's still having issues after a few minutes, rerun the command; it should eventually get there.
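Because the command sometimes has to be rerun, it can help to see what it actually deleted each time. Below is a small sketch that wraps the same `find` match in a hypothetical function, `clean_beat_locks`, which logs each removed file to stderr and prints how many were removed. The default directory is the one from the command above; passing a different directory (for example, a standalone Filebeat data directory) is an assumption you would need to verify for your own setup.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around: find /usr/share/elastic-agent/. -type f -name "*beat*lock" -exec rm {} \;
# Usage: clean_beat_locks [DIR]   (DIR defaults to /usr/share/elastic-agent)
clean_beat_locks() {
  local dir="${1:-/usr/share/elastic-agent}"
  local count=0 f
  # Same match as the original: regular files whose name contains "beat"
  # and ends in "lock" (e.g. filebeat.lock). -print0 keeps odd names safe.
  while IFS= read -r -d '' f; do
    echo "removing $f" >&2
    rm -f -- "$f" && count=$((count + 1))
  done < <(find "$dir" -type f -name '*beat*lock' -print0)
  # Print the number of files removed, so a rerun showing 0 means nothing was left.
  echo "$count"
}
```

Running it again after a few minutes and seeing `0` tells you the lock files are gone, so any remaining unhealthy state has some other cause.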