Trying to understand why filebeat dropped events during logstash failure

(Dheeraj Gupta) #1


I run filebeat to send Bro logs from a machine(sensor) to my logstash server. During the weekend, a power cycle caused all our infrastructure to reboot. While the sensor started up correctly, logstash server had some issue and failed to start. The filebeat running on sensor thus opened the log files and started the harvester but could not push the data to logstash. After around 24 hours, logstash was also successfully started and it was observed that filebeat did not send old data (24 hours) to logstash. Instead only the data for last 1 hour ("current" file and the file opened before "current").

A little bit of background about Bro - It writes log files that are rotated every hour. Rotated files are gzipped and moved to a separate directory.

My understanding of Filebeat is we define what file to watch. Filebeat keeps track of the file using inode and offset. If the inode changes, filebeat opens the "new" file at the beginning while continuing to parse the "old" already opened file till EOF is reached. The close_inactive and clean_inactive settings dictate when the opened file is closed and removed from registry respectively. If an open file gets "rotated"(i.e. the original file has been deleted), it will not be removed from disk till filebeat has it open. When filebeat closes the handle, the inode will be marked as free and filebeat can't reopen this file again.

Logs are too big to post here, but I have uploaded them on pastebin.

One of my input config looks something like this

- type: log
    - /var/log/bro/logs/current/dns.log
  exclude_lines: ['^#', '\.microsoft\.com\s', '\sntp\.ubuntu\.com\s', '\stouch\.kaspersky\.com\s', '\sgemdaq\s' ]
    type: bro_dns
  fields_under_root: true
  clean_removed: False
  close_removed: False
  clean_inactive: 3h
  ignore_older: 2h
  close_inactive: 30m
  #tail_files: True

I am looking to understand why filebeat did not keep all older "unread" files open and only sent the data from two files.

(Pier-Hugues Pellerin) #2

@Dheeraj_Gupta Looking at your configuration it should have kept the FD open, did you restart Filebeat when Logstash was not reachable?

(Dheeraj Gupta) #3

@pierhugues No I did not restart it. From the logs as well, there is no stop/start event. I also did not find any "Opening file" or "Closing File" entries during the entire 30 hour period logstash was unreachable.

Also the logs I posted are slightly messed up due to copy paste. The first two lines have the wrong timestamp. The startup also occurred at approx. 2018-09-09T01:38:00.

(Pier-Hugues Pellerin) #4

I am still looking throught the code, I wonder if close_inactive will be triggered in that case, any 'File is inactive' message in the log?

(Dheeraj Gupta) #5

@pierhugues I could not find any such messages in the log for the entire time-period when logstash is unreachable. Only the connection refused and retryer messages are there. The first "close inactive" message comes around 30 mins AFTER the logstash was restarted.

I have uploaded the entire gzipped log here. Unfortunately I did not have debug logging enabled so only INFO and above messages are there. The logfile begins with the server restart at around 09-Sep-2018 01:37 and logstash is restored around 10-Sep-2018 10:15

(system) #6

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.