Registry 60MB and increasing causes high cpu and stops sending events

Hi,

I have many servers using filebeat 1.2.3 on windows. It's monitoring about 16 directories with rotating logs in each. Some logs can be rotated within a few seconds others can take longer. The issue I face is that the filebeat process starts to consume 100% cpu but doesn't actually start shipping any new files.

close_older: 30m
ignore_older: 1h

Running with -v -e -d "*" it appears to be stuck on the prospector checking files over and over and not actually shipping anything.

I have to delete the registry to return everything to normal.

How many files do you have in the directories in total?

Hi,

984 files currently, older files are removed after 48 hours. On a day it can create 800 new log files.

We are currently introducing some new options to clean up the registry file so it doesn't grow over time. I would assume clean_older could help in your case: https://github.com/elastic/beats/issues/1600#issuecomment-228670068 Any chance for you to try out the nightly builds and see if it helps?

Great, I'll test it on a few servers.

Thanks

The registry file seems a lot better, I've however noticed that filebeat process just dies after a day.

This is logged in the Windows Event Viewer

The filebeat service terminated unexpectedly.  It has done this 2 time(s).

The log file ends with:

2016-07-07T13:00:42-04:00 INFO Run prospector
2016-07-07T13:00:46-04:00 INFO Events sent: 40
2016-07-07T13:00:46-04:00 INFO Registry file updated. 892 states written.

I currently have filebeat sending to both logstash and kafka. I'll remove the kafka output and see if the problem persists.

@Zfs That is strange. As there are not "shut down" messages or panics it seems like the filebeat got killed. Thanks for trying without kafka.

@steffens Pinging you in case kafka could be the issue :slight_smile:

yeah, super strange. Can you run filebeat with debug log enabled?

With kafka turned off, the problem reoccurred. I'll enable debugging.

I have the following extra parameters for file cleaning etc:

ignore_older: 1h
close_older: 30m
clean_older: 36h
clean_removed: 24h

From my last log. I'll see if Debug shows more info.

2016-07-11T13:51:32-04:00 INFO Read line error: No more bytes
2016-07-11T13:51:32-04:00 INFO Crawler stopped
2016-07-11T13:51:32-04:00 INFO Stopping spooler
2016-07-11T13:51:32-04:00 INFO Stopping Registrar
2016-07-11T13:51:32-04:00 INFO Ending Registrar
2016-07-11T13:51:32-04:00 INFO Registry file updated. 4410 states written.
2016-07-11T13:51:32-04:00 INFO filebeat cleanup

@Zfs The above log message is somehow strange is it seems to show a normal filebeat shut down. Could it be that you have any security tool on your machine that kills long running processes? Is there some pattern on when this happens?

Unfortunately no, I had filebeat 1.2 running for much longer on this same server without a restart. I still have 1.2 running on about 50 servers without any issue other than having to cleanup the registry manually when it becomes too big.

I'll continue to change my config and see if any thing changes.

As a workaround I could have the service restart itself on all failures but I'll hold off on that for now.

Not much info from debug logs it just ends, now I'm wondering if I actually uploaded the correct log yesterday. I'll test a few other options.

2016-07-12T02:19:16-04:00 DBG  Prospector states cleaned up.
2016-07-12T02:19:16-04:00 DBG  End of file reached: C:\Progra~1\xxxxx-3.07.12.2016-02.19.09.log; Backoff now.
2016-07-12T02:19:16-04:00 DBG  End of file reached: C:\Progra~1\xxxxx-22.07.12.2016-02.18.47.log; Backoff now.
2016-07-12T02:19:16-04:00 DBG  try read: 6
2016-07-12T02:19:16-04:00 DBG  End of file reached: C:\Progra~1\xxxx-10.07.12.2016-01.26.38.log; Backoff now.
2016-07-12T02:19:16-04:00 DBG  End of file reached: C:\Progra~1\xxxx-15.07.12.2016-01.35.02.log; Backoff now.
2016-07-12T02:19:16-04:00 DBG  End of file reached: C:\Progra~1\xxxx-18.07.12.2016-02.19.07.log; Backoff now.
2016-07-12T02:19:16-04:00 DBG  Drop line as it does match one of the exclude patterns07/12/16 02:19:05.941 INFO: [xxxxx] 
2016-07-12T02:19:16-04:00 DBG  End of file reached: C:\Progra~1\xxxx-9.07.12.2016-02.18.22.log; Backoff now.
2016-07-12T02:19:17-04:00 DBG  End of file reached: C:\Progra~1\xxxx-3.07.12.2016-02.19.09.log; Backoff now.
2016-07-12T02:19:17-04:00 DBG  End of file reached: C:\Progra~1\xxxx-20.07.12.2016-01.31.38.log; Backoff now.
2016-07-12T02:19:17-04:00 DBG  Drop line as it does match one of the exclude patterns07/12/16 02:19:11.764 INFO: [xxxx] 
2016-07-12T02:19:17-04:00 DBG  End of file reached: C:\Progra~1\xxx-17.07.12.2016-02.19.02.log; Backoff now.
2016-07-12T02:19:17-04:00 DBG  try read: 6
2016-07-12T02:19:17-04:00 DBG  End of file reached: C:\Progra~1\x-27.07.12.2016-01.31.19.log; Backoff now.
2016-07-12T02:19:18-04:00 DBG  End of file reached: C:\Progra~1\xxx-12.07.12.2016-01.50.46.log; Backoff now.
2016-07-12T02:19:18-04:00 DBG  End of file reached: C:\Progra~1\x-22.07.12.2016-02.18.47.log; Backoff now.
2016-07-12T02:19:18-04:00 DBG  End of file reached: C:\Progra~1\x-6.07.12.2016-01.19.30.log; Backoff now.
2016-07-12T02:19:18-04:00 DBG  End of file reached: C:\Progra~1\x-5.07.12.2016-02.18.02.log; Backoff now.
2016-07-12T02:19:18-04:00 DBG  End of file reached: C:\Progra~1\x-16.07.12.2016-01.31.02.log; Backoff now.
2016-07-12T02:19:18-04:00 DBG  End of file reached: C:\Progra~1\x-18.07.12.2016-02.19.07.log; Backoff now.
2016-07-12T02:19:18-04:00 DBG  End of file reached: C:\Progra~1\x-24.07.12.2016-01.35.14.log; Backoff now.
2016-07-12T02:19:18-04:00 DBG  End of file reached: C:\Progra~1\x-17.07.12.2016-02.19.02.log; Backoff now.
2016-07-12T02:19:18-04:00 DBG  End of file reached: C:\Progra~1\x-19.07.11.2016-16.13.45.log; Backoff now.
2016-07-12T02:19:18-04:00 DBG  try read: 6
2016-07-12T02:19:18-04:00 DBG  End of file reached: C:\Progra~1\x-3.07.12.2016-01.39.39.log; Backoff now.
2016-07-12T02:19:18-04:00 DBG  Flushing spooler because of timeout. Events flushed: 658
2016-07-12T02:19:19-04:00 DBG  End of file reached: C:\Progra~1\x-3.07.12.2016-02.19.09.log; Backoff now.
2016-07-12T02:19:19-04:00 DBG  End of file reached: C:\Progra~1\x-11.07.12.2016-01.27.05.log; Backoff now.

This topic was automatically closed after 21 days. New replies are no longer allowed.