Hi,
I run filebeat to send Bro logs from a machine (the sensor) to my logstash server. Over the weekend, a power cycle caused all our infrastructure to reboot. The sensor started up correctly, but the logstash server had an issue and failed to start. Filebeat on the sensor therefore opened the log files and started the harvesters, but could not push the data to logstash. About 24 hours later, logstash was started successfully, and I observed that filebeat did not send the older data (the 24 hours of backlog) to logstash. Instead, it sent only the data for the last hour (the "current" file and the file opened before "current").
A little background on Bro: it writes log files that are rotated every hour. Rotated files are gzipped and moved to a separate directory.
My understanding of Filebeat is that we define which files to watch. Filebeat keeps track of each file using its inode and offset. If the inode changes, filebeat opens the "new" file at the beginning while continuing to read the "old", already opened file until EOF is reached. The close_inactive and clean_inactive settings dictate when an opened file is closed and when its state is removed from the registry, respectively. If an open file gets "rotated" (i.e. the original file has been deleted), it will not be removed from disk as long as filebeat has it open. When filebeat closes the handle, the inode is freed and filebeat can't reopen this file again.
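To illustrate the behaviour I am describing, here is a minimal Python sketch of the underlying filesystem semantics. It is not Filebeat code. It assumes a POSIX system, and the function name `demo_rotation` and the file contents are made up for the example. It shows that a reader holding an open handle can still read a deleted file to EOF, and that a new file created at the same path gets a different inode.

```python
import os
import tempfile


def demo_rotation():
    """Simulate a harvester-like reader across a delete-and-recreate rotation.

    Assumes POSIX semantics: an unlinked file stays readable through any
    handle that was opened before the unlink.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "dns.log")  # hypothetical log file
        with open(path, "wb") as f:
            f.write(b"old line 1\nold line 2\n")

        # The reader opens the file and remembers its inode and offset.
        reader = open(path, "rb")
        old_inode = os.fstat(reader.fileno()).st_ino
        reader.readline()
        offset = reader.tell()

        # "Rotation": the original file is deleted and a new one is created
        # at the same path. The old inode is still in use by the reader,
        # so the new file cannot be given the same inode.
        os.remove(path)
        with open(path, "wb") as f:
            f.write(b"new line 1\n")
        new_inode = os.stat(path).st_ino

        # The old handle can still be read from the saved offset to EOF.
        reader.seek(offset)
        remaining = reader.read()

        # Once this handle is closed, the deleted file is gone for good.
        reader.close()

        return {
            "old_inode": old_inode,
            "new_inode": new_inode,
            "remaining": remaining,
        }


if __name__ == "__main__":
    print(demo_rotation())
```

Based on this, I expected filebeat to hold on to every rotated file it had already opened until the output became available again.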
Logs are too big to post here, but I have uploaded them on pastebin.
One of my input configs looks something like this:
- type: log
  paths:
    - /var/log/bro/logs/current/dns.log
  exclude_lines: ['^#', '\.microsoft\.com\s', '\sntp\.ubuntu\.com\s', '\stouch\.kaspersky\.com\s', '\sgemdaq\s' ]
  fields:
    type: bro_dns
  fields_under_root: true
  clean_removed: False
  close_removed: False
  clean_inactive: 3h
  ignore_older: 2h
  close_inactive: 30m
  #tail_files: True
I am looking to understand why filebeat did not keep all the older "unread" files open, and instead sent data from only two files.