Need some help understanding filebeat init scripts

Hi all. I have been tasked with managing a filebeat installation on a not-so-recent machine - we're talking about CentOS 6.4 with filebeat 7.12, installed from the rpm. This is not (yet) a systemd-enabled distro, so the filebeat service is controlled with the "service filebeat start/stop" syntax (or the equivalent /etc/rc.d/init.d/filebeat stop/start). I've noticed that the "start" section of the init script launches 2 distinct processes: one named "filebeat-god" and a second one that is the "normal" filebeat process. The "stop" part of the script, when invoked, kills the filebeat-god process but leaves the other one running. I haven't found a way to do a clean restart without manually killing the second process. This behavior looks very strange to me and it seems unlikely that it is intended. Can anyone help me understand what's wrong with this installation?
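To be concrete, this is how I'm spotting the survivor - shown here against a dummy long-running process, since a snippet can't reproduce my actual box (on the real machine I run `pgrep -f filebeat` right after the stop):

```shell
# Dummy demo: a long-lived process stands in for the leftover filebeat,
# and pgrep -f finds it by matching against the full command line.
sleep 31047 &            # the "survivor"
pid=$!
pgrep -f 'sleep 31047'   # prints its PID, i.e. it is still running
kill "$pid"              # manual cleanup, like I have to do with filebeat
```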
Thanks in advance, bye!

Hi @Hermooz,

Which problem exactly are you facing? Have you tried to run Filebeat and start/stop/restart it? Or did you just read the init script and are trying to understand how it works?

I did a quick test on CentOS6 and did not find any issues starting/stopping/restarting Filebeat installed from the rpm package.

This filebeat-god process you're seeing is an instance of go-daemon, which is used to make Filebeat behave like a daemon; once it is stopped, it also stops Filebeat.
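For context, a SysV-style `stop` typically reads the daemon's PID from a pidfile, signals it, and waits for it to exit. This is a minimal sketch of that pattern - it is NOT the actual Filebeat init script, and the `PIDFILE` path is a placeholder:

```shell
#!/bin/sh
# Minimal sketch of the SysV stop pattern: read the PID from a pidfile,
# send SIGTERM, then wait a bounded time for the process to exit.
PIDFILE=/var/run/filebeat.pid   # placeholder path

stop_daemon() {
    pidfile=$1
    [ -f "$pidfile" ] || return 0              # nothing to stop
    pid=$(cat "$pidfile")
    if ! kill -TERM "$pid" 2>/dev/null; then
        rm -f "$pidfile"                       # stale pidfile, process already gone
        return 0
    fi
    i=0
    while kill -0 "$pid" 2>/dev/null; do       # signal 0 = "does it still exist?"
        i=$((i + 1))
        [ "$i" -ge 10 ] && return 1            # give up after ~10 seconds
        sleep 1
    done
    rm -f "$pidfile"
    return 0
}
```

If the stop logic only signals filebeat-god and the signal never reaches the child, you'd see exactly the symptom you describe - so it's worth checking whether the filebeat child reacts to a SIGTERM sent to the god process.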

Also, I'd suggest updating Filebeat to v7.17 which is compatible with CentOS6, and is still receiving bug fixes.

Sorry for not making myself clear: yes, I did try to stop and restart filebeat, and the problem is that in the stop phase only the "daemon" process is stopped. The normal filebeat process stays alive and prevents further starts until I kill it manually. If filebeat-god is supposed to stop filebeat, then for some reason it's unable to do so. I will surely update to v7.17 as you suggest and see if this fixes the situation. Thanks!

I've updated the filebeat package to version 7.17, but it still behaves the same way: apparently "service filebeat stop" is unable to stop the service cleanly. Any further hints for troubleshooting this issue?

Thanks, bye
Fabrizio

How long are you waiting for the process to stop? Filebeat's shutdown might take a while if it still has data to send to Elasticsearch. It can take several minutes, depending on your throughput/load.
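To tell a slow shutdown from a stuck one, you can poll the PID for a bounded time right after `service filebeat stop`. A sketch (the timeout value is arbitrary):

```shell
# Poll a PID until it exits or a timeout (in seconds) elapses.
# Returns 0 if the process went away, 1 if it is still running.
wait_for_exit() {
    pid=$1
    timeout=$2
    i=0
    while kill -0 "$pid" 2>/dev/null; do   # signal 0 = existence check only
        i=$((i + 1))
        [ "$i" -ge "$timeout" ] && return 1
        sleep 1
    done
    return 0
}
```

On the box itself, something like `wait_for_exit "$(cat /var/run/filebeat.pid)" 300` would give the shutdown five minutes before you conclude it's hung - the pidfile path is an assumption, check your init script for the real one.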

Have you tried to start/stop a Filebeat running an empty configuration? If not, please try it to see how long it takes, or whether it works at all.

If you look at the logs, you can see the beginning of the shutdown process, as well as the end.

When Filebeat starts its shutdown process it logs:

{"log.level":"info","@timestamp":"2022-06-15T10:04:21.507Z","log.origin":{"file.name":"beater/filebeat.go","file.line":425},"message":"Stopping filebeat","service.name":"filebeat","ecs.version":"1.6.0"}

Then there are several log lines corresponding to the inputs/modules running, registry flushing, etc.

Then you get one last log line stating the shutdown process has finished:

{"log.level":"info","@timestamp":"2022-06-15T10:04:21.512Z","log.origin":{"file.name":"instance/beat.go","file.line":467},"message":"filebeat stopped.","service.name":"filebeat","ecs.version":"1.6.0"}

Here is an example of the whole shutdown process of a Filebeat running an empty config (no inputs/modules configured).

{"log.level":"info","@timestamp":"2022-06-15T10:04:21.507Z","log.origin":{"file.name":"beater/filebeat.go","file.line":425},"message":"Stopping filebeat","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-06-15T10:04:21.507Z","log.origin":{"file.name":"beater/crawler.go","file.line":155},"message":"Stopping Crawler","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-06-15T10:04:21.507Z","log.origin":{"file.name":"beater/crawler.go","file.line":165},"message":"Stopping 0 inputs","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-06-15T10:04:21.507Z","log.origin":{"file.name":"cfgfile/reload.go","file.line":227},"message":"Dynamic config reloader stopped","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-06-15T10:04:21.507Z","log.origin":{"file.name":"beater/crawler.go","file.line":185},"message":"Crawler stopped","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-06-15T10:04:21.507Z","log.logger":"registrar","log.origin":{"file.name":"registrar/registrar.go","file.line":132},"message":"Stopping Registrar","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-06-15T10:04:21.507Z","log.logger":"registrar","log.origin":{"file.name":"registrar/registrar.go","file.line":166},"message":"Ending Registrar","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-06-15T10:04:21.507Z","log.logger":"registrar","log.origin":{"file.name":"registrar/registrar.go","file.line":137},"message":"Registrar stopped","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-06-15T10:04:21.511Z","log.logger":"monitoring","log.origin":{"file.name":"log/log.go","file.line":192},"message":"Total metrics","service.name":"filebeat","monitoring":{"metrics":{"beat":{"cpu":{"system":{"ticks":50,"time":{"ms":50}},"total":{"ticks":120,"time":{"ms":120},"value":0},"user":{"ticks":70,"time":{"ms":70}}},"handles":{"limit":{"hard":4096,"soft":1024},"open":8},"info":{"ephemeral_id":"f043d0da-22fe-4f0c-9860-fd6dc8e4d23c","uptime":{"ms":88687},"version":"8.2.3"},"memstats":{"gc_next":31405272,"memory_alloc":16507496,"memory_sys":33375240,"memory_total":51633192,"rss":92033024},"runtime":{"goroutines":15}},"filebeat":{"events":{"active":0,"added":0,"done":0},"harvester":{"closed":0,"open_files":0,"running":0,"skipped":0,"started":0},"input":{"log":{"files":{"renamed":0,"truncated":0}},"netflow":{"flows":0,"packets":{"dropped":0,"received":0}}}},"libbeat":{"config":{"module":{"running":0,"starts":0,"stops":0},"reloads":1,"scans":1},"output":{"events":{"acked":0,"active":0,"batches":0,"dropped":0,"duplicates":0,"failed":0,"toomany":0,"total":0},"read":{"bytes":0,"errors":0},"type":"elasticsearch","write":{"bytes":0,"errors":0}},"pipeline":{"clients":0,"events":{"active":0,"dropped":0,"failed":0,"filtered":0,"published":0,"retry":0,"total":0},"queue":{"acked":0,"max_events":4096}}},"registrar":{"states":{"cleanup":0,"current":0,"update":0},"writes":{"fail":0,"success":0,"total":0}},"system":{"cpu":{"cores":4},"load":{"1":0.1,"15":0.01,"5":0.04,"norm":{"1":0.025,"15":0.0025,"5":0.01}}}},"ecs.version":"1.6.0"}}
{"log.level":"info","@timestamp":"2022-06-15T10:04:21.511Z","log.logger":"monitoring","log.origin":{"file.name":"log/log.go","file.line":193},"message":"Uptime: 1m28.687356475s","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-06-15T10:04:21.511Z","log.logger":"monitoring","log.origin":{"file.name":"log/log.go","file.line":160},"message":"Stopping metrics logging.","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-06-15T10:04:21.512Z","log.origin":{"file.name":"instance/beat.go","file.line":467},"message":"filebeat stopped.","service.name":"filebeat","ecs.version":"1.6.0"}
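To check for those two markers on your own system, you can grep Filebeat's log. The sample file below just stands in for your real log, whose path depends on your installation:

```shell
# Stand-in log file with the two shutdown markers; on a real system you
# would grep Filebeat's own log file instead (path varies by install).
cat > /tmp/filebeat-sample.log <<'EOF'
{"log.level":"info","message":"Stopping filebeat"}
{"log.level":"info","message":"filebeat stopped."}
EOF
grep -E '"Stopping filebeat"|"filebeat stopped\."' /tmp/filebeat-sample.log
```

If you see "Stopping filebeat" but never "filebeat stopped.", the shutdown started and got stuck somewhere in between, and the lines in between should tell you where.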

All those logs are from testing the latest RPM release, v8.2.3 from our website, in a CentOS 6.10 VM, where it all worked fine.

Hi Tiago, and thanks again for your help. By digging deeper we have been able to pinpoint the problem as a performance issue, tied to the number of files being harvested. By running filebeat with a lighter load we've confirmed that the issue does not occur. We are now trying to tune filebeat to work on only a subset of files at a time (using close_eof, ignore_older, etc.) and this seems promising.
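For the record, the kind of input settings we're experimenting with looks like this - paths and values are illustrative, not our actual config:

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/app/*.log   # illustrative path, not our real one
    close_eof: true          # release the harvester once it reaches EOF
    ignore_older: 24h        # skip files not modified in the last 24h
```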

Best regards!
