Harvester not started for new files in configured paths


I have a problem that I've verified in filebeat 6.8.3, 7.5.1, and 7.16.3.

My filebeat.yml has a single entry under filebeat.inputs, of type log, with three configured paths that contain ** in the middle, something like

  - /var/log/a/b/**/ONE
  - /var/log/a/b/**/TWO
  - /var/log/a/b/**/THREE

On filebeat startup, when multiple matching files already exist (e.g. the "**" portion above matches C/D, C/E, and C/F), filebeat will establish an input_id for the input and start harvesters for all files present.

However, NEW files that appear after startup (e.g. a new intermediate directory C/G with its own files ONE, TWO, and THREE) will rarely have harvesters started for them. Occasionally a harvester will start for a single file, but this is very rare.

If no files matching the input's configured paths exist on startup, filebeat properly notices the three files appearing and starts harvesters for them.

I confirmed the above behavior both via filebeat -d 'input' and via lsof on the process.

My read of the documentation and previous discussions (e.g. Filebeats not harvesting new file - #2 by pierhugues) is that filebeat should always find and harvest new files that match the configured paths.

I've tried modifying scan_frequency to no effect.
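
For reference, the input is roughly equivalent to the sketch below. The paths are the same placeholders as above, not my real ones, and the 5s scan_frequency is only an illustrative value, not necessarily one I tested.

```yaml
# Sketch only: placeholder paths; the scan_frequency value is illustrative.
filebeat.inputs:
  - type: log
    paths:
      - /var/log/a/b/**/ONE
      - /var/log/a/b/**/TWO
      - /var/log/a/b/**/THREE
    # How often filebeat checks the configured paths for new files (default: 10s).
    scan_frequency: 5s
```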

I'd appreciate any ideas to debug/solve, or confirmation to file a github issue.

Thank you.

Restarting filebeat results in harvesters being started for all files present (including the ones missed by the previous process).

No change with symlinks: true (on my system, /var/log -> /mnt/var/log).

On 7.16.3, splitting into three separate inputs with one configured path each increases the probability that a harvester will start, but it will not get all new files. IIRC, splitting into separate inputs on 7.5.1 does not improve the situation.
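
The split into separate inputs looks roughly like this (again a sketch with the placeholder paths from above; any other per-input options are omitted):

```yaml
# Sketch only: one log input per configured path instead of one input with three paths.
filebeat.inputs:
  - type: log
    paths:
      - /var/log/a/b/**/ONE
  - type: log
    paths:
      - /var/log/a/b/**/TWO
  - type: log
    paths:
      - /var/log/a/b/**/THREE
```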

Anybody have any ideas?
I realize this works for most people; for me it seems to work reliably in some environments but never in others.