When does Filebeat lose data?

Filebeat 5.6, feeding into Logstash. I'm beginning to suspect there are scenarios in which lines written to log files never get passed on to Logstash, so I'd like to check what the expected behaviour is, please.

(1) Filebeat is started when there are already lines in the configured log files. Should those existing lines be read, or should Filebeat read only newly added lines?

(2) Filebeat fails to contact Logstash (my most recent case was a DNS failure, so Filebeat couldn't resolve the Logstash host name). Once the problem is resolved (eg the DNS server is fixed), should Filebeat ship the records it previously failed to send, or should it resume shipping with newly written records only, the failed ones being lost forever?

From How Filebeat Works:

When Filebeat is restarted, data from the registry file is used to rebuild the state, and Filebeat continues each harvester at the last known position.

So Filebeat only reads newly added lines.

Also from How Filebeat Works:

If the output, such as Elasticsearch or Logstash, is not reachable, Filebeat keeps track of the last lines sent and will continue reading the files as soon as the output becomes available again.

Yes, I know what the documentation says, but I don't think it matches what I'm seeing. So perhaps I should have been clearer: I'm asking whether Filebeat really does what the documentation says (version 5.6).

By (1) I didn't mean lines newly added after ones that Filebeat had already read; I meant lines newly added after Filebeat starts up. What I think I am seeing is:

(a) a newly installed application starts up and writes lines to new log files,
(b) a newly installed Filebeat starts up and doesn't ship those lines,
(c) the application writes additional lines to the log files,
(d) Filebeat does ship these new lines.

That behavior should only happen when tail_files is enabled for a prospector. Otherwise Filebeat will begin reading new files from the beginning.
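
For reference, this is roughly what the relevant prospector section of a 5.x filebeat.yml looks like, with tail_files shown explicitly at its default (the log path here is just a placeholder):

filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/myapp/*.log    # placeholder path, adjust to your setup
  # tail_files defaults to false: new files are read from the beginning.
  # If set to true, Filebeat starts at the end of each file, so lines that
  # already exist when it first sees the file are skipped.
  tail_files: false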

Files are tracked based on their inode. So a file is new when its inode value does not exist in the registry file. If the file's inode already existed in the registry for some reason, then Filebeat would resume from the offset recorded in the registry (see the FAQ: Inode reuse causes Filebeat to skip lines?).
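
To see what Filebeat currently knows about a file you can look at the registry itself. It's a JSON file (typically data/registry, or /var/lib/filebeat/registry for the packaged install), and each tracked file has an entry along these lines; the exact fields vary slightly between versions, but the offset and the inode/device pair are the key ones:

{"source": "/var/log/myapp/app.log", "offset": 1234, "FileStateOS": {"inode": 524301, "device": 2049}}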

I suggest that you enable debug logging for Filebeat to get a better understanding of what is happening. Filebeat should log information about when it detects a new file and starts reading it. In your config use:

logging.level: debug
logging.selectors: [registrar, prospector, harvester]
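
If it helps to keep that output around for later searching, file logging can be enabled alongside it; the path below is just an example (the deb/rpm packages already default to something similar):

logging.to_files: true
logging.files:
  path: /var/log/filebeat    # example path
  name: filebeat
  keepfiles: 7

With the debug selectors above, the registrar, prospector and harvester messages should show which files are detected and from which offset reading starts.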

That was my understanding from the documentation too. I haven't specified tail_files, and I thought I'd read that that was OK because it defaults to off.

I'll try the debug logging if/when I get back to this, thanks.
