Question: what happens when Logstash is unavailable for long?

I have a pipeline Filebeat --> Logstash --> Elasticsearch. I want to know what Filebeat does when it cannot reach Logstash for an extended period.

The application being monitored logs to a file. The current file is always named example.log and is rotated every hour; older versions are renamed to example.log.2020-02-11-11, example.log.2020-02-11-10 and so on. After 4 hours they are also compressed.

Now assume that Logstash is unavailable for a long time. My worry is that Filebeat would keep the file harvesters open for as long as it has not received an acknowledgment from Logstash, and that this could lead to high memory/disk usage on the monitored machine.

I have found the option ignore_older in Filebeat's log input configuration. Is it enough to set this option, assuming close_inactive keeps its default value of 5m?

Typically in this case Filebeat will continue to read incoming events until its internal queue is full (see the queue settings; the default size is 2048 events), then pause harvesting until its backend is available again, at which point it will resume. Logs that are deleted, or rotated outside of Filebeat's target path pattern, in the meantime will be skipped.
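
To make the point about the target pattern concrete, here is a minimal sketch of a log input whose glob also covers the rotated files, so that they can still be picked up once Logstash is back. The path is hypothetical, the .gz suffix for the compressed files is an assumption, and the 4h value is only an illustration based on the rotation schedule described in the question.

```yaml
filebeat.inputs:
  - type: log
    paths:
      # Hypothetical path; the glob also matches rotated files
      # such as example.log.2020-02-11-11
      - /var/log/example/example.log*
    # Skip the compressed rotations (assumes they end in .gz)
    exclude_files: ['\.gz$']
    # Do not start harvesting files last modified more than 4 hours ago
    ignore_older: 4h
    # Close the file handle after 5 minutes without new lines (the default)
    close_inactive: 5m
```

Note that ignore_older has to be greater than close_inactive, which holds here.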

Memory is probably only a concern if you're expecting very large events (which is possible but uncommon in plaintext logs), in which case you could lower the queue size. On the other hand, if the log entries are comparatively small, you could increase the queue size to reduce the amount of data lost while Logstash is unavailable.
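
As a rough sketch, the queue size is set in filebeat.yml under queue.mem. The numbers and the Logstash host below are illustrative assumptions rather than recommendations; flush.min_events is lowered along with events because it should not exceed the queue size.

```yaml
queue.mem:
  # Maximum number of events held in memory (illustrative value)
  events: 1024
  # Minimum batch size before publishing; keep it below `events`
  flush.min_events: 512
  # Publish a smaller batch anyway after this long
  flush.timeout: 1s

output.logstash:
  # Hypothetical host name
  hosts: ["logstash.example.internal:5044"]
```

For small log lines you would raise events instead of lowering it, at the cost of more memory held while the output is down.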
