Filebeat CrashLoopBackOff

After installing Filebeat from the new 8.5 Helm chart in our Kubernetes cluster, 5 of the pods are stuck in CrashLoopBackOff:

{"log.level":"error","@timestamp":"2022-12-08T21:13:19.258Z","log.origin":{"file.name":"instance/beat.go","file.line":1057},"message":"Exiting: cannot obtain lockfile: connot start, data directory belongs to process with pid 10","service.name":"filebeat","ecs.version":"1.6.0"}
Exiting: cannot obtain lockfile: connot start, data directory belongs to process with pid 10

Any idea what can cause this issue?

When I ran the same Helm charts locally with minikube, everything was fine.

Hi!
Same problem here as well.
We are running K8s version 1.23.
We have installed filebeat via DaemonSet as documented here: Run Filebeat on Kubernetes | Filebeat Reference [8.5] | Elastic

@Martin_Schimandl Did you have a previous filebeat instance installed on the cluster?
And are you sending it directly to elasticsearch or via logstash?

When I use the 7.5 Helm chart to send to Logstash I have no problem; it only happens when I send directly to Elasticsearch.
Also interesting: if I uninstall Filebeat (Helm), wait 2 days and re-install it, it works OK, but this is not fully tested.

I had a similar problem. Deleting the filebeat.lock file fixed it for me.

On the node: /var/lib/filebeat-{..}/filebeat.lock
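
Roughly, assuming the chart mounts Filebeat's data directory from the host under /var/lib (the path, namespace and pod name below are placeholders, adjust them to your own setup):

# On the node that hosts the crashing pod (path is an assumption):
sudo rm -f /var/lib/filebeat-*/filebeat.lock
# Then delete the filebeat pod on that node so the DaemonSet recreates it:
kubectl delete pod <filebeat-pod-on-that-node> -n logging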


We previously used fluentd and sent the data directly to elasticsearch.
We switched to filebeat in order to make it easier to send the data to logstash, like we do for our non-Kubernetes infrastructure.

I am quite sure this issue is not related to the destination of the logs.
My guess is that this is a race condition with the kubelet, some kernel API, or something like that, since the process ID in the error message is always pretty low.

My current workaround to reduce the chance of hitting this issue was to switch from the container image docker.elastic.co/beats/filebeat:8.5.3 to docker.elastic.co/beats/filebeat:8.4.3, because it looks like this made the error appear less often (see the command below).
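
If you are installing from the elastic/filebeat chart, something like this should pin the older image; imageTag is what that chart's values call it as far as I can tell, and the release name and namespace are just placeholders for your own:

helm upgrade --install filebeat elastic/filebeat \
  --namespace logging \
  --set imageTag=8.4.3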

@aaszxc good idea!
Will try that when we get the crash loop again

This does not work. The file appears again after a few minutes and the pod crashes again.

Try analyzing resource consumption, or increase the resources assigned to the Filebeat pods.
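
For example (the namespace and label selector are assumptions, and the limits are placeholders, not recommendations):

# Check what the filebeat pods are actually using (requires metrics-server):
kubectl top pods -n logging -l app=filebeat
# Then raise the limits via the chart's resources value:
helm upgrade filebeat elastic/filebeat --namespace logging \
  --set resources.limits.memory=400Mi \
  --set resources.limits.cpu=1000m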

If you're running in a container, a solution I've found is running the following command in the container:

find /usr/share/elastic-agent/. -type f -name "*beat*lock" -exec rm {} \;

This will clean up all of the lock files related to Beats running under the Elastic Agent. You will need to wait a few minutes for the agent to return to a healthy state. I've noticed that running the command the first time doesn't always work, so if you give it a few minutes and it's still having issues, you can rerun the command and it should hopefully get there eventually.
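
If you are running Filebeat as its own DaemonSet rather than under Elastic Agent, the equivalent lock file lives in Filebeat's data path (by default /usr/share/filebeat/data in the official image), so the same idea would look roughly like this (namespace and label selector are assumptions):

# Remove the stale lock file in every filebeat pod; if a container is not
# running because of the crash loop, remove the file on the node instead,
# since the data directory is usually a hostPath mount:
for pod in $(kubectl get pods -n logging -l app=filebeat -o name); do
  kubectl exec -n logging "$pod" -- rm -f /usr/share/filebeat/data/filebeat.lock
done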

This appears to be a bug which is being worked on (Refactor beats lockfile to use timeout, retry by fearful-symmetry · Pull Request #34194 · elastic/beats · GitHub)

