I have about 2M log files to process for initial ingestion.
These CSV files are spread across 4 folders.
I have set up a Filebeat pipeline and set harvester_limit: 0.
I have also tried running 4 Filebeat instances and raising the ulimit to 500000 (I don't seem to be able to go higher), but I still hit the same issue.
After harvesting for a bit (maybe 20s), Filebeat throws an error saying too many files are open.
How would you work around that?
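For reference, here is a sketch of roughly how the input is set up (paths are placeholders, and I'm assuming the classic log input type):

filebeat.inputs:
  - type: log                       # classic log input (assumed)
    paths:                          # one glob per folder, placeholder paths
      - /var/logs/folder1/*.csv
      - /var/logs/folder2/*.csv
      - /var/logs/folder3/*.csv
      - /var/logs/folder4/*.csv
    harvester_limit: 0              # 0 = no cap on concurrent harvesters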
EDIT: although this problem concerns the initial load, it may come up again later. Log generation runs at around 30k files per hour.
I am setting:
clean_removed: true
scan_frequency: 300s
ignore_older: 350s
clean_inactive: 800s
but even within these 800s I might accumulate a significant number of log files.
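For clarity, those options sit on the same input, roughly like this (again just a sketch with placeholder paths):

filebeat.inputs:
  - type: log
    paths:
      - /var/logs/folder*/*.csv   # placeholder glob covering the 4 folders
    clean_removed: true           # drop registry state when a file is deleted
    scan_frequency: 300s          # look for new files every 5 minutes
    ignore_older: 350s            # skip files not modified in the last 350s
    clean_inactive: 800s          # must stay > ignore_older + scan_frequency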