Filebeat: initial ingestion of 2M files

I have about 2M log files to process for initial ingestion.
These CSV files are spread across 4 folders.
I have set up a Filebeat pipeline and set harvester_limit: 0.

I have also tried setting up 4 Filebeat instances and raising the ulimit to 500000 (I don't seem to be able to go higher), but I still hit the same issue.

After harvesting for a bit (maybe 20s), Filebeat throws an error that too many files are open.
How would you get around that?
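
Roughly, the input looks like this (the folder paths below are just placeholders for the 4 real folders):

  filebeat.inputs:
    - type: log
      # placeholder paths, one glob per folder of CSV files
      paths:
        - /data/csv/folder1/*.csv
        - /data/csv/folder2/*.csv
        - /data/csv/folder3/*.csv
        - /data/csv/folder4/*.csv
      # 0 means no limit on the number of concurrent harvesters
      harvester_limit: 0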

EDIT: though this problem is for the initial load, it may come up again later. Log generation is around 30k files per hour.
I am setting:

  clean_removed: true
  scan_frequency: 300s
  ignore_older: 350s
  clean_inactive: 800s

but even within those 800s I might accumulate a significant number of log files.
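
For completeness, a sketch of how those settings sit in the full input, with the same placeholder path as above (note that clean_inactive has to be greater than ignore_older + scan_frequency, which 800s > 350s + 300s satisfies):

  filebeat.inputs:
    - type: log
      paths:
        - /data/csv/*/*.csv   # placeholder path
      harvester_limit: 0
      clean_removed: true
      scan_frequency: 300s
      ignore_older: 350s
      # must stay greater than ignore_older + scan_frequency (350s + 300s = 650s)
      clean_inactive: 800s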

Set close_eof: true (Log input | Filebeat Reference [7.14] | Elastic). That way each harvester closes as soon as it reaches the end of a file and moves on to the next one instead of staying open indefinitely.
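
As a sketch (path is a placeholder), that would look something like:

  filebeat.inputs:
    - type: log
      paths:
        - /data/csv/*/*.csv   # placeholder path
      harvester_limit: 0
      # release the file handle as soon as the harvester reaches end-of-file
      close_eof: true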


This totally makes sense. Most of the files have very little content inside; it is working just fine now!
