Delaying log harvesting at startup


#1

@pierhugues and @ruflin, since you were the ones who replied to my earlier post.
I was setting up Filebeat on a new server, and it seems that if I have external configurations, the reload.period will affect the first harvest time too.
When I set it to 1 minute:

...
2018-11-05T16:13:21.428Z INFO cfgfile/reload.go:141 Config reloader started
2018-11-05T16:13:51.429Z INFO [monitoring] log/log.go:141 Non-zero metrics in the last 30s ...
2018-11-05T16:14:21.428Z INFO [monitoring] log/log.go:141 Non-zero metrics in the last 30s ...
2018-11-05T16:14:21.429Z INFO log/input.go:138 Configured paths: ...
2018-11-05T16:14:21.429Z INFO input/input.go:114 Starting input of type: log; ID: ...
2018-11-05T16:14:21.429Z INFO log/harvester.go:251 Harvester started for file: ...
...

So it seems that after starting Filebeat, it won't load the external files automatically, but waits for the reload period. That's fine for me, but I'm concerned that this behavior could be changed, because I think people expect the configurations to be loaded at startup and rescanned after the specified period.
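
For readers reproducing this, the setup described above might look roughly like the sketch below. It assumes the `filebeat.config.inputs` form of external input loading (older releases used a different key), and the `path` glob is a hypothetical placeholder, not the actual layout from this post.

```yaml
# filebeat.yml (sketch; the path below is a placeholder)
filebeat.config.inputs:
  enabled: true
  path: ${path.config}/inputs.d/*.yml   # hypothetical location of the external input files
  reload.enabled: true
  reload.period: 1m   # in the log above, the first input started one period after the reloader
```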

Slightly related to this: I also noticed that new entries in log files that don't receive many entries sometimes aren't harvested immediately, but a couple of minutes later. Does that depend on the scan_frequency setting? I thought it only affects scanning for new files.

Thank you!


(Pier-Hugues Pellerin) #2

So it seems that after starting Filebeat, it won't load the external files automatically, but waits for the reload period. That's fine for me, but I'm concerned that this behavior could be changed, because I think people expect the configurations to be loaded at startup and rescanned after the specified period.

I was not aware of that behavior. I would also have expected scan-then-sleep instead of sleep-then-scan. If that behavior changes, it would be a breaking change and marked as such in the changelog.

Concerning scan_frequency, it only affects the discovery of new files, not files currently being read by Filebeat.

Are you using any settings that could close a file? Are any settings that start with close_* configured on the harvester?


#3

I see.

Yeah, I use close_renamed because I'm rotating the logs daily. Also, I saw that there is a close_inactive default setting:

INFO log/harvester.go:276 File is inactive: XXXXXX. Closing because close_inactive of 5m0s reached.

I see what's happening by reading this, though I'm not sure if I can set close_inactive separately in the external configuration files. This doc suggests that it's feasible:

If there are log files with very different update rates, you can use multiple configurations with different values.

Let's say I have scan_frequency set to 10 minutes, reload.period set to 3 minutes, and close_inactive left at the 5-minute default (the log file is empty and stays empty).
I start Filebeat at 00:00. The harvester starts at 00:03 but closes after 5 minutes (00:08). Does the next scan trigger at:

  • 00:10
  • 00:13
  • 00:18

@pierhugues Do you know if the external configurations can have a separate close_inactive setting? Also, anything on the scan frequency?

Thank you!


(Pier-Hugues Pellerin) #4

@YvorL Concerning scan frequency, this is a good question. It's best effort, so it should trigger every 10 minutes.
IIRC it's done in a separate goroutine.

Yes, external configurations can have completely separate settings.
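
As an illustration of that answer, an external input file could give each input its own close options. This is only a sketch under assumptions: the file name, paths, and the longer `close_inactive` value are hypothetical placeholders, not taken from this thread. Only the option names (`close_inactive`, `close_renamed`, `scan_frequency`) come from the discussion above.

```yaml
# inputs.d/app.yml (hypothetical external file picked up by the config reloader)
- type: log
  paths:
    - /var/log/app/busy.log      # placeholder path
  close_inactive: 5m             # the default value mentioned earlier in the thread
  close_renamed: true            # close the file handle when the log is rotated by rename

- type: log
  paths:
    - /var/log/app/quiet.log     # placeholder path
  close_inactive: 1h             # hypothetical longer value for a rarely updated file
  scan_frequency: 10m            # the value used in the example question above
```

Keeping one input per update rate follows the documentation quote above about using multiple configurations with different values.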


#5

Thank you!