We are encountering a strange behaviour with filebeat.
Let me explain our use case, as it is not the typical use of the filebeat log prospector:
We use filebeat with 5 log prospectors.
Each prospector observes one or more directories using the * wildcard in both the directory path and the filename: C:\root\sub_*\*.csv for instance
Each directory contains from 100 to 43,000 files, with a total of 125,000 files
2 prospectors are harvesting CSV files containing only one row
The 3 other prospectors are reading XML or custom format files using multiline. For these files, one file = one event
The CSV files are by far the most numerous
Each prospector has its ignore_older property set
We are on a Windows Server 2012 R2 server
The problem occurs when we have to change the filebeat config and restart the service.
Last time, I started filebeat at 9 AM, and in the log file I can see that it only finished loading the prospectors and started harvesting at 11 PM!
During the whole "loading" time, I only see lines like these:
2017-05-17T18:24:50+02:00 INFO No non-zero metrics in the last 30s
2017-05-17T18:25:20+02:00 INFO Non-zero metrics in the last 30s: publish.events=3 registrar.states.update=3 registrar.writes=3
2017-05-17T18:25:50+02:00 INFO Non-zero metrics in the last 30s: registrar.states.update=4 registrar.writes=4 publish.events=4
The problem seems to be related to loading the filebeat registry data. If I delete the registry data and let filebeat consume all data (only the last 24h, due to the configured ignore_older), harvesting starts very quickly.
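To get an idea of how large the registry has grown, something like the following Python script could be run against a copy of the registry file. This is only a rough sketch. It assumes the registry is a single JSON array of state objects, each with a "source" field holding the full file path (which is what I understand the format to be in this filebeat version). The function names summarize_registry and load_registry are made up for this example.

```python
import json
import ntpath
import sys
from collections import Counter


def summarize_registry(states):
    """Count registry entries per parent directory of each 'source' path."""
    counts = Counter()
    for state in states:
        # Assumption: 'source' holds the full Windows path of the harvested file.
        source = state.get("source", "")
        # ntpath splits Windows-style paths correctly on any platform.
        counts[ntpath.dirname(source)] += 1
    return counts


def load_registry(path):
    """Load the registry, assumed here to be one JSON array of states."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python registry_summary.py <path to a copy of the registry file>
    registry_states = load_registry(sys.argv[1])
    print("total entries:", len(registry_states))
    for directory, count in summarize_registry(registry_states).most_common(10):
        print(count, directory)
```

If the total is far larger than the number of files from the last 24h, that would support the idea that the time is spent on old registry entries rather than on the files themselves.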
Can you explain this behaviour and tell me whether there is a way to reduce the loading time?