Intentionally delaying log harvesting at startup


#1

I have a use case where I need to delay log harvesting. From what I've read, my best option would be scan_frequency, but I'm not sure whether it applies to the first start, or how Filebeat keeps track of this information (I suspect in memory).
The issue is that when I recreate a container from a snapshot, I can't stop Filebeat soon enough, and a couple thousand documents get added to my cluster. In a way that's really good and fast, but in this case I have to delay the start until I can remove the logs.
Can I achieve my goal by raising the default scan_frequency value (10s) to a higher one? Is there any other way without adding more services?
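For reference, this is the kind of input configuration I mean (a sketch; the log path is just an example):

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/app/*.log   # hypothetical path
    # Default is 10s. My question is whether raising this also
    # delays the very first scan, or only subsequent re-scans.
    scan_frequency: 60s
```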


(Pier-Hugues Pellerin) #2

@YvorL scan_frequency is only effective after the first scan, so I don't think it will help.

Are you using Filebeat's autodiscover to manage configuration when new containers appear or disappear?


#3

No, I'm not using that. I'm running an instance of Filebeat inside the container itself. When a new container is created from the snapshot, all processes launch, including Filebeat. Unfortunately, that happens before the logs are removed. You can think of it as restoring a backup from a snapshot. While the registry file contains information about the processed logs, I believe that since the environment changed (e.g., the hostname), Filebeat will re-read the logs.


(ruflin) #4

I wonder if the tail_files option could help you? https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-log.html#_literal_tail_files_literal Be aware that it can also mean that some events are skipped for new logs.
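If it helps, the setting would look roughly like this (a sketch; the path is just an example):

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/app/*.log   # hypothetical path
    # Start reading newly discovered files at the end rather than
    # the beginning. Lines written to a new file before Filebeat
    # opens it will be skipped, which is the trade-off mentioned above.
    tail_files: true
```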


#5

Unfortunately, it seems it would cause more issues than it would solve. This would work if I could add the setting before the snapshot is taken and remove it immediately afterwards, but that's far from optimal and would slow down the process unnecessarily. :disappointed:
I need either an option that delays the start of scanning & harvesting on the first run, or one that makes an environment change (e.g., hostname) not invalidate the record of which logs have already been processed.


(ruflin) #6

I can't think of a good workaround here at the moment :frowning: One thing we discussed in the past is having a unique ID for each log line, so that if events are sent twice they would not be duplicated, but we are not there yet.
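The unique-ID idea can be approximated today on the indexing side: derive a deterministic document ID from the event's identifying fields, so that re-sending the same log line overwrites the existing document instead of creating a duplicate. A minimal sketch (the field names here are assumptions, not something Filebeat produces out of the box):

```python
import hashlib
import json

def doc_id(event: dict) -> str:
    """Derive a stable document ID from an event's identifying fields.

    Re-sending the same log line yields the same ID, so indexing it
    again replaces the existing document rather than duplicating it.
    """
    # Serialize the identifying fields deterministically (sorted keys),
    # then hash the result. The chosen fields are illustrative.
    key = json.dumps(
        {k: event.get(k) for k in ("source", "offset", "message")},
        sort_keys=True,
    )
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

event = {"source": "/var/log/app/app.log", "offset": 123, "message": "started"}
# The same event always maps to the same ID.
assert doc_id(event) == doc_id(dict(event))
```

This trades a small amount of compute per event for idempotent indexing, which is exactly the property needed when a snapshot restore replays already-shipped logs.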


#7

I see. Thank you for the info!


(system) #8

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.