I am trying to ship log files to Logstash using Filebeat. The folder structure of the input path is currently very deeply nested and very large.
We have the following structure:
<LOG_PATH>/<STREAM>/<WORKFLOW>/<TASK>/<EXECUTION_DATE>/<TRY_NUMBER>.log
The logs themselves are not that big, but the application produces on average 300,000 new log files per day, depending on which workflows and tasks are running. Every new log run creates a new execution_date folder containing the log file, so over time we end up with an unmanageable structure. A 'find' through the tree can take a week or more, and an 'ls' in some workflow folders can take hours. I even wrote a Python script that uses glob to count the files and task folders, but it was terminated after some time.
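For illustration, a minimal sketch of that kind of glob-based count (LOG_PATH here is just a placeholder for the real base path, and the pattern simply mirrors the structure above):

```python
import glob

# Placeholder for the real base path.
LOG_PATH = "/data/logs"

# One wildcard per level: stream/workflow/task/execution_date/try_number.log
pattern = f"{LOG_PATH}/*/*/*/*/*.log"

# glob.glob materializes the complete match list in memory before returning,
# so with hundreds of thousands of files this call can run for a very long time.
log_files = glob.glob(pattern)
print(f"log files: {len(log_files)}")

# Counting the task folders the same way (a trailing slash matches directories only).
task_dirs = glob.glob(f"{LOG_PATH}/*/*/*/")
print(f"task folders: {len(task_dirs)}")
```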
As far as I could find out, Filebeat uses filepath/glob, so my questions are: How does Filebeat "walk" through the input? Is it comparable to Python's glob, so that it just gives up after some time without any error log? Can I assume that Filebeat simply can't handle this number of files on a shared volume?
I once created a related question: link