Intentionally delaying log harvesting at startup

I have a use case where I need to delay log harvesting. From what I've read, my best option would be scan_frequency, but I'm not sure whether that applies to the first start, or how Filebeat keeps track of this information (I suspect in memory).
The issue is that when I recreate a container from a snapshot, I can't stop Filebeat soon enough, and a couple thousand documents get added to my cluster. In a way that's impressively fast, but in this case I need to delay the start until I can remove the logs.
Can I achieve this by raising the default scan_frequency value (10s)? Is there any other way to do it without adding more services?
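For context, this is the setting I mean. A minimal filebeat.yml sketch; the path is just a placeholder for my actual log location:

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/myapp/*.log   # placeholder for the real log path
    # Default is 10s; I'd raise it to buy time to clean up the snapshot's logs.
    scan_frequency: 300s
```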

@YvorL scan_frequency only takes effect after the first scan, so I don't think it will help.

Are you using Filebeat's autodiscover to manage configuration as new containers appear and disappear?

No, I'm not using that. I'm running an instance of Filebeat inside the container itself. When a new container is created from the snapshot, all processes launch, including Filebeat, and unfortunately that happens before the logs are removed. You can think of it as restoring a backup from a snapshot. While the registry file contains the information about which logs have been processed, I believe that since the environment changed (e.g., the hostname), Filebeat will re-read the logs.

I wonder if the tail_files option could help you? https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-log.html#_literal_tail_files_literal Be aware that it can also mean some events from new logs are skipped.
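Roughly like this (the path is a placeholder):

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/myapp/*.log   # placeholder path
    # Start reading new files at the end instead of the beginning, so
    # content that already exists when a file is picked up is skipped.
    tail_files: true
```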

Unfortunately, it seems it would cause more issues than it would solve. It would work if I could add the setting before the snapshot is taken and remove it immediately afterwards, but that's far from optimal and would slow down the process unnecessarily. :disappointed:
I'd need either an option that delays the start of scanning & harvesting on the first run, or one that tells Filebeat that an environment change (e.g., hostname) doesn't invalidate the record of already-processed logs.
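The closest thing I can picture is a crude wrapper that sleeps before exec'ing Filebeat. A hypothetical sketch (the command, config path, and delay are assumptions about my setup), and exactly the kind of extra moving part I'd like to avoid:

```python
#!/usr/bin/env python3
"""Hypothetical wrapper: hold Filebeat back at container start so the
snapshot's stale logs can be removed first. The command, config path,
and delay below are assumptions, not anything Filebeat provides."""
import os
import time

STARTUP_DELAY_SECONDS = 60  # window for the cleanup job to delete old logs

time.sleep(STARTUP_DELAY_SECONDS)
# Replace this process with Filebeat so PID and signal handling stay normal.
os.execvp("filebeat", ["filebeat", "-c", "/etc/filebeat/filebeat.yml"])
```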

I can't think of a good workaround here at the moment :frowning: One thing we discussed in the past is having a unique id for each log line, so that if events are sent twice they would not be duplicated, but we are not there yet.
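For the curious, the rough idea is to derive a deterministic id from the event's origin and use it as the Elasticsearch document _id, so a re-sent event overwrites the existing document instead of duplicating it. A hypothetical sketch, not how Filebeat works today:

```python
import hashlib

def event_id(source_path: str, offset: int) -> str:
    """Deterministic id: the same line from the same file always hashes
    the same, so re-ingesting it updates the existing document rather
    than creating a duplicate."""
    return hashlib.sha256(f"{source_path}:{offset}".encode()).hexdigest()

# Example: this value would be passed as the document _id at index time.
print(event_id("/var/log/myapp/app.log", 4096))
```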

I see. Thank you for the info!
