We are seeing strange behaviour with Filebeat.
Let me explain our use case, as it is not the typical use of the Filebeat log prospector:
We use filebeat with 5 log prospectors.
Each prospector observes one or more directories, using the * wildcard in both the directory path and the filename, for instance C:\root\sub_*\*.csv
Each directory contains from 100 to 43,000 files, with a total of 125,000 files
2 prospectors are harvesting CSV files containing only one row
The 3 other prospectors are reading XML or custom format files using multiline. For these files, one file = one event
The CSV files are by far the most numerous
Each prospector has its ignore_older property set to 24h
We are on a Windows Server 2012 R2 server
The problem appears when we have to change the Filebeat config and restart the service.
Last time, I started Filebeat at 9 AM, and the log file shows that it only finished loading the prospectors and started harvesting at 11 PM!
During the whole "loading" time, I only see lines like these:
2017-05-17T18:24:50+02:00 INFO No non-zero metrics in the last 30s
2017-05-17T18:25:20+02:00 INFO Non-zero metrics in the last 30s: publish.events=3 registrar.states.update=3 registrar.writes=3
2017-05-17T18:25:50+02:00 INFO Non-zero metrics in the last 30s: registrar.states.update=4 registrar.writes=4 publish.events=4
The problem seems to be related to Filebeat loading its registry data. If I delete the registry data and let Filebeat consume all data (only the last 24h, due to the configured ignore_older), harvesting starts very quickly.
Can you explain this behaviour, and is there a way to reduce the loading time?
What is the size of your registry file? Do you have spinning disks or reasonably fast SSDs? How many entries do you have in the registry file? I assume this could become an issue at some point.
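One possible way to collect those numbers is a small Python script along these lines. It is only a sketch: it assumes the registry is a single JSON array of state objects, each with a "source" field, which may not hold for every Filebeat version. The function name registry_stats is made up for illustration, and the registry path depends on your installation.

```python
import json
import os
import sys


def registry_stats(path):
    """Return basic size figures for a Filebeat registry file.

    Assumption: the file is one JSON array of state objects and each
    object carries a "source" key holding the harvested file path.
    """
    size_bytes = os.path.getsize(path)
    with open(path, "r", encoding="utf-8") as fh:
        states = json.load(fh)
    # Several states can point at the same path, so count distinct paths too.
    distinct_sources = {state.get("source") for state in states}
    return {
        "size_bytes": size_bytes,
        "entries": len(states),
        "distinct_sources": len(distinct_sources),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the path of your own registry file as the first argument.
    print(registry_stats(sys.argv[1]))
```

Run it against a copy of the registry rather than the live file, so it does not interfere with the running service. If "entries" is far larger than the number of files modified in the last 24h, that would suggest old states are being kept around and loaded on every restart.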
@ruflin This topic (86625) is related to another topic I created (86782). Could you please delete both topics? I want to create a new one with more details, based on my recent tests.
Thank you