Shipping logs from an unmanageable folder structure using Filebeat

luka.klaric · November 11, 2021, 6:46pm

I am trying to ship log files to Logstash using Filebeat. The folder structure under the input path is currently very deeply branched and very large.
We have the following structure:
<LOG_PATH>/<STREAM>/<WORKFLOW>/<TASK>/<EXECUTION_DATE>/<TRY_NUMBER>.log

The individual logs are not that big, but the application produces an average of 300,000 new log files every day, depending on which workflows and tasks are running. Every newly created log gets its own new execution_date folder containing the log file, so we end up with an unmanageable structure. A 'find' through the folder can take a week or more, and an 'ls' in some workflow folders can take hours. I even wrote a Python script to count the files and task folders using glob, but the execution was terminated after some time.
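For anyone who wants to reproduce the count, here is a minimal sketch that walks the tree with os.scandir instead of glob, so it keeps only counters and a stack of pending directories rather than a list of every match. It assumes the layout shown above, with task folders three levels below <LOG_PATH>; the function name count_logs and the task_depth parameter are made up for illustration, and it will not make a slow shared volume any faster.

```python
import os


def count_logs(root, task_depth=3):
    """Count .log files and task folders below root.

    Assumed layout: root/<STREAM>/<WORKFLOW>/<TASK>/<EXECUTION_DATE>/<TRY_NUMBER>.log,
    so task folders sit task_depth levels below root.
    """
    log_files = 0
    task_dirs = 0
    # Explicit stack of (path, depth) pairs instead of recursion.
    stack = [(root, 0)]
    while stack:
        path, depth = stack.pop()
        try:
            # os.scandir yields entries lazily, one directory at a time.
            with os.scandir(path) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        if depth + 1 == task_depth:
                            task_dirs += 1
                        stack.append((entry.path, depth + 1))
                    elif entry.name.endswith(".log"):
                        log_files += 1
        except OSError:
            # Skip folders that vanish or cannot be read during the walk.
            continue
    return log_files, task_dirs
```

Calling count_logs on the log root returns a (log_files, task_dirs) tuple. Printing the counters every few thousand folders would at least show whether the walk is still making progress.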

As far as I could find out, Filebeat uses filepath/glob, so my questions are: How does Filebeat "walk" through the input? Is it comparable to Python's glob, so that it just terminates after some time without logging any error? Can I assume that Filebeat simply can't handle this number of files on a shared volume?
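To make the comparison with Python's glob concrete, here is a small sketch of the pattern that corresponds to the layout above, with one wildcard per folder level. It uses glob.iglob, which yields matches one by one instead of returning a complete list, and stops after a handful of hits. The helper name sample_logs and the limit parameter are hypothetical, and this only shows the Python side; it says nothing about how Filebeat's own globbing behaves on the same tree.

```python
import glob
import itertools
import os


def sample_logs(log_path, limit=10):
    """Return up to `limit` log paths matching the assumed layout.

    One "*" per level: <STREAM>/<WORKFLOW>/<TASK>/<EXECUTION_DATE>/<TRY_NUMBER>.log
    """
    pattern = os.path.join(log_path, "*", "*", "*", "*", "*.log")
    # iglob is an iterator, so islice can stop the scan early.
    return list(itertools.islice(glob.iglob(pattern), limit))
```

If even a small sample like this takes very long to return, the bottleneck is more likely the directory listing on the volume than the amount of memory the glob needs.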

I previously posted a related question: link

system · December 9, 2021, 8:47pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
