I have many servers using filebeat 1.2.3 on windows. It's monitoring about 16 directories with rotating logs in each. Some logs can be rotated within a few seconds others can take longer. The issue I face is that the filebeat process starts to consume 100% cpu but doesn't actually start shipping any new files.
close_older: 30m
ignore_older: 1h
Running with -v -e -d "*" it appears to be stuck on the prospector checking files over and over and not actually shipping anything.
We are currently introducing some new options to clean up the registry file so it doesn't grow over time. I would assume clean_older could help in your case: https://github.com/elastic/beats/issues/1600#issuecomment-228670068 Any chance for you to try out the nightly builds and see if it helps?
The registry file seems a lot better, I've however noticed that filebeat process just dies after a day.
This is logged in the Windows Event Viewer
The filebeat service terminated unexpectedly. It has done this 2 time(s).
The log file ends with:
2016-07-07T13:00:42-04:00 INFO Run prospector
2016-07-07T13:00:46-04:00 INFO Events sent: 40
2016-07-07T13:00:46-04:00 INFO Registry file updated. 892 states written.
I currently have filebeat sending to both logstash and kafka. I'll remove the kafka output and see if the problem persists.
From my last log. I'll see if Debug shows more info.
2016-07-11T13:51:32-04:00 INFO Read line error: No more bytes
2016-07-11T13:51:32-04:00 INFO Crawler stopped
2016-07-11T13:51:32-04:00 INFO Stopping spooler
2016-07-11T13:51:32-04:00 INFO Stopping Registrar
2016-07-11T13:51:32-04:00 INFO Ending Registrar
2016-07-11T13:51:32-04:00 INFO Registry file updated. 4410 states written.
2016-07-11T13:51:32-04:00 INFO filebeat cleanup
@Zfs The above log message is somehow strange is it seems to show a normal filebeat shut down. Could it be that you have any security tool on your machine that kills long running processes? Is there some pattern on when this happens?
Unfortunately no, I had filebeat 1.2 running for much longer on this same server without a restart. I still have 1.2 running on about 50 servers without any issue other than having to cleanup the registry manually when it becomes too big.
I'll continue to change my config and see if any thing changes.
As a workaround I could have the service restart itself on all failures but I'll hold off on that for now.
Not much info from debug logs it just ends, now I'm wondering if I actually uploaded the correct log yesterday. I'll test a few other options.
2016-07-12T02:19:16-04:00 DBG Prospector states cleaned up.
2016-07-12T02:19:16-04:00 DBG End of file reached: C:\Progra~1\xxxxx-3.07.12.2016-02.19.09.log; Backoff now.
2016-07-12T02:19:16-04:00 DBG End of file reached: C:\Progra~1\xxxxx-22.07.12.2016-02.18.47.log; Backoff now.
2016-07-12T02:19:16-04:00 DBG try read: 6
2016-07-12T02:19:16-04:00 DBG End of file reached: C:\Progra~1\xxxx-10.07.12.2016-01.26.38.log; Backoff now.
2016-07-12T02:19:16-04:00 DBG End of file reached: C:\Progra~1\xxxx-15.07.12.2016-01.35.02.log; Backoff now.
2016-07-12T02:19:16-04:00 DBG End of file reached: C:\Progra~1\xxxx-18.07.12.2016-02.19.07.log; Backoff now.
2016-07-12T02:19:16-04:00 DBG Drop line as it does match one of the exclude patterns07/12/16 02:19:05.941 INFO: [xxxxx]
2016-07-12T02:19:16-04:00 DBG End of file reached: C:\Progra~1\xxxx-9.07.12.2016-02.18.22.log; Backoff now.
2016-07-12T02:19:17-04:00 DBG End of file reached: C:\Progra~1\xxxx-3.07.12.2016-02.19.09.log; Backoff now.
2016-07-12T02:19:17-04:00 DBG End of file reached: C:\Progra~1\xxxx-20.07.12.2016-01.31.38.log; Backoff now.
2016-07-12T02:19:17-04:00 DBG Drop line as it does match one of the exclude patterns07/12/16 02:19:11.764 INFO: [xxxx]
2016-07-12T02:19:17-04:00 DBG End of file reached: C:\Progra~1\xxx-17.07.12.2016-02.19.02.log; Backoff now.
2016-07-12T02:19:17-04:00 DBG try read: 6
2016-07-12T02:19:17-04:00 DBG End of file reached: C:\Progra~1\x-27.07.12.2016-01.31.19.log; Backoff now.
2016-07-12T02:19:18-04:00 DBG End of file reached: C:\Progra~1\xxx-12.07.12.2016-01.50.46.log; Backoff now.
2016-07-12T02:19:18-04:00 DBG End of file reached: C:\Progra~1\x-22.07.12.2016-02.18.47.log; Backoff now.
2016-07-12T02:19:18-04:00 DBG End of file reached: C:\Progra~1\x-6.07.12.2016-01.19.30.log; Backoff now.
2016-07-12T02:19:18-04:00 DBG End of file reached: C:\Progra~1\x-5.07.12.2016-02.18.02.log; Backoff now.
2016-07-12T02:19:18-04:00 DBG End of file reached: C:\Progra~1\x-16.07.12.2016-01.31.02.log; Backoff now.
2016-07-12T02:19:18-04:00 DBG End of file reached: C:\Progra~1\x-18.07.12.2016-02.19.07.log; Backoff now.
2016-07-12T02:19:18-04:00 DBG End of file reached: C:\Progra~1\x-24.07.12.2016-01.35.14.log; Backoff now.
2016-07-12T02:19:18-04:00 DBG End of file reached: C:\Progra~1\x-17.07.12.2016-02.19.02.log; Backoff now.
2016-07-12T02:19:18-04:00 DBG End of file reached: C:\Progra~1\x-19.07.11.2016-16.13.45.log; Backoff now.
2016-07-12T02:19:18-04:00 DBG try read: 6
2016-07-12T02:19:18-04:00 DBG End of file reached: C:\Progra~1\x-3.07.12.2016-01.39.39.log; Backoff now.
2016-07-12T02:19:18-04:00 DBG Flushing spooler because of timeout. Events flushed: 658
2016-07-12T02:19:19-04:00 DBG End of file reached: C:\Progra~1\x-3.07.12.2016-02.19.09.log; Backoff now.
2016-07-12T02:19:19-04:00 DBG End of file reached: C:\Progra~1\x-11.07.12.2016-01.27.05.log; Backoff now.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.