Can Filebeat correctly handle log rotation and network disruption/failure modes?

filebeater · May 27, 2017, 1:13am

Hello there,
In our log system, local log files are rotated automatically, for example every hour or every midnight: the current log file (e.g., myapp.log) is renamed (and possibly also compressed) to myapp.log.day1.gz, and a new log file is started under the same name, myapp.log.
I am wondering whether Filebeat can handle this log rotation correctly. From reading the documentation, when the existing log file is renamed, the harvester holds on to the file descriptor and can probably continue to process the remaining log lines. However, can it handle the new log lines written to the newly created file with the same file name? Will the prospector create a new harvester for this new log file, even though it has the same name as the original one?

In addition to handling log rotation under normal conditions, can it handle log rotation in failure modes? For example:

Filebeat crashes for some reason, and after the crash the original myapp.log file is renamed to myapp.log.day2.
Filebeat is configured to output to Kafka, and Kafka becomes unavailable. The log files are rotated while Kafka is unavailable.

Any pointer is appreciated!
yan

tudor · May 29, 2017, 8:14am

Generally yes, Filebeat is designed to handle these cases correctly. For log rotation, note that Filebeat doesn't read .gz files, so it's better not to compress rotated files immediately after rotation, if possible. Otherwise, that sounds like standard log rotation, which Filebeat should handle well.
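To make the rename case concrete, here is a small Python sketch of what rename-based rotation looks like at the OS level. It is only an illustration on a POSIX filesystem, not Filebeat code: the `rotate_demo` function and the file contents are made up for the example, and the open `reader` handle merely stands in for a harvester.

```python
import os
import tempfile


def rotate_demo():
    """Show what a rename-based rotation looks like to a process holding the file open."""
    with tempfile.TemporaryDirectory() as d:
        live = os.path.join(d, "myapp.log")
        rotated = os.path.join(d, "myapp.log.day1")

        with open(live, "w") as f:
            f.write("line 1\nline 2\n")

        reader = open(live, "r")        # stands in for a harvester's open handle
        first = reader.readline()       # consume only the first line
        inode_before = os.stat(live).st_ino

        os.rename(live, rotated)        # rotation: same inode, new name
        with open(live, "w") as f:      # the application starts a fresh myapp.log
            f.write("line 3\n")

        rest = reader.read()            # the old handle still sees the remaining data
        reader.close()

        return {
            "first": first,
            "rest": rest,
            "rotated_keeps_inode": os.stat(rotated).st_ino == inode_before,
            "new_file_is_different": os.stat(live).st_ino != inode_before,
        }


if __name__ == "__main__":
    print(rotate_demo())
```

The point of the sketch is that the renamed file keeps its inode while the new myapp.log gets a different one, so a tool that tracks files by inode rather than by name can tell the two apart.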

For temporary network or Kafka failures, Filebeat will automatically stop advancing through your log files until the output is available again. This usually means that no log lines are lost, but if you compress the files too quickly (from Filebeat's point of view, that's like removing them), Filebeat could miss some lines. Does that make sense?
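The "stop until the output is back" behaviour can be pictured with a conceptual Python sketch. This is not how Filebeat is implemented (it tracks byte offsets and sends batches); `ship_lines` and the `send` callback are hypothetical names used only to show the idea of advancing the read position after an acknowledgement and never before.

```python
def ship_lines(lines, offset, send):
    """Send lines starting at `offset` and return the new offset.

    `send` is a hypothetical callable that returns True once the output
    (for example Kafka) has acknowledged the line, and False if the
    output is currently unavailable.
    """
    while offset < len(lines):
        if not send(lines[offset]):
            break          # output is down: stop here and keep the offset
        offset += 1        # advance only after an acknowledgement
    return offset


if __name__ == "__main__":
    output_up = {"value": False}
    delivered = []

    def send(line):
        if not output_up["value"]:
            return False
        delivered.append(line)
        return True

    lines = ["a", "b", "c"]
    offset = ship_lines(lines, 0, send)       # output down: nothing is consumed
    output_up["value"] = True
    offset = ship_lines(lines, offset, send)  # output back: resumes where it stopped
    print(offset, delivered)
```

Because the offset only moves after a successful send, an outage delays delivery rather than dropping lines, as long as the file is still present and readable when the output comes back.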

filebeater · May 30, 2017, 8:26pm

@tudor Great, thanks, Tudor. You answered my question about whether Filebeat can handle temporary network interruptions. I am still wondering whether Filebeat can recover from its own crash, i.e.:
What happens if Filebeat is down and is restarted after a while?
Suppose Filebeat crashes for some reason without our knowledge, and we do not get a chance to restart it until two days later. Specifically, consider the following scenario:

Day 1. Filebeat was running and crashed. After Filebeat crashed, the original .log file was renamed to myapp.log.day.1.

Day 2. Another log file was started, written to as .log, and renamed to myapp.log.day.2 at the end of the day.

Day 3. A new log file was started. We found that Filebeat had crashed and manually restarted it on day 3. At this point, will Filebeat miss part of myapp.log.day.1 and all of myapp.log.day.2? If not, how does it know which log file(s) to catch up on since it last crashed?

Does my question make sense?
Or would you recommend that we monitor Filebeat and make sure it is automatically restarted after it crashes?

thanks,
yan

tudor · May 31, 2017, 8:40am

Filebeat stores its state in a registry file, which it writes to disk. This registry file contains the inode info, so file renames while Filebeat is down should be OK. However, having it down for that long is not really expected, so there might have been some issues. Are you running the default configuration? You can inspect the registry file (if you run from packages, it's /var/lib/filebeat/registry) to get some hints.
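As a rough way to inspect the registry, the Python sketch below matches registry entries against the files currently on disk by inode. It assumes the registry is a JSON array whose entries carry `source`, `offset`, and a `FileStateOS` object with `inode` and `device`; that layout is an assumption about the Filebeat version in use, so check your own registry file first. `load_registry` and `find_tracked_files` are hypothetical helper names.

```python
import json
import os


def load_registry(path):
    """Read the registry, assumed here to be a JSON array of file states."""
    with open(path) as f:
        return json.load(f)


def find_tracked_files(registry_entries, log_dir):
    """Map each registry entry to the file in `log_dir` that currently has its inode."""
    by_id = {}
    for name in os.listdir(log_dir):
        path = os.path.join(log_dir, name)
        try:
            st = os.stat(path)
        except OSError:
            continue  # e.g. the file vanished or is a broken symlink
        by_id[(st.st_dev, st.st_ino)] = (path, st.st_size)

    report = []
    for entry in registry_entries:
        ids = entry["FileStateOS"]  # assumed field name, see the note above
        match = by_id.get((ids["device"], ids["inode"]))
        if match is None:
            # The tracked file is no longer present (deleted or compressed away).
            report.append({"source": entry["source"],
                           "current_path": None,
                           "unread_bytes": None})
        else:
            path, size = match
            report.append({"source": entry["source"],
                           "current_path": path,
                           "unread_bytes": max(size - entry["offset"], 0)})
    return report
```

For example, `find_tracked_files(load_registry("/var/lib/filebeat/registry"), "/var/log/myapp")` (the log directory is made up here) would show whether a file recorded as myapp.log now lives under a rotated name and how many bytes past the stored offset are still unread.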

filebeater · May 31, 2017, 7:46pm

@tudor Thanks for the explanation. We are evaluating whether we can use Filebeat for our service or need to write our own. I was asking about hypothetical scenarios that could happen in a production environment. It seems we do need to set up monitoring and alerting to make sure Filebeat does not die silently. Just curious: is there a typical setup for this? Thanks!

system · June 28, 2017, 7:47pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.