Filebeat: initial ingestion of 2M files

I have about 2M log files to process for initial ingestion.
These CSV files are spread across 4 folders.
I have set up a Filebeat pipeline and set harvester_limit: 0.

I have also tried setting up 4 Filebeat instances and raising the ulimit to 500000 (I don't seem to be able to go higher), but I still hit the same issue.

After harvesting for a bit (maybe 20s), Filebeat throws an error that too many files are open.
How would you get around that?
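
Roughly, the input looks like this (the folder paths below are just placeholders for the 4 real folders):

  filebeat.inputs:
    - type: log
      # placeholder paths, one glob per folder of CSV files
      paths:
        - /data/csv/folder1/*.csv
        - /data/csv/folder2/*.csv
        - /data/csv/folder3/*.csv
        - /data/csv/folder4/*.csv
      # 0 means no limit on the number of concurrent harvesters
      harvester_limit: 0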

EDIT: though this problem is for the initial load, it may come up again later. Log generation is around 30k files per hour.
I am setting:

  clean_removed: true
  scan_frequency: 300s
  ignore_older: 350s
  clean_inactive: 800s

but even within those 800s I might accumulate a significant number of log files.
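
For completeness, a sketch of how those settings sit in the full input, with the same placeholder path as above (note that clean_inactive has to be greater than ignore_older + scan_frequency, which 800s > 350s + 300s satisfies):

  filebeat.inputs:
    - type: log
      paths:
        - /data/csv/*/*.csv   # placeholder path
      harvester_limit: 0
      clean_removed: true
      scan_frequency: 300s
      ignore_older: 350s
      # must stay greater than ignore_older + scan_frequency (350s + 300s = 650s)
      clean_inactive: 800s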

Set close_eof: true (Log input | Filebeat Reference [7.14] | Elastic). That way each harvester closes as soon as it reaches the end of a file and moves on to the next one instead of staying open indefinitely.
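
As a sketch (path is a placeholder), that would look something like:

  filebeat.inputs:
    - type: log
      paths:
        - /data/csv/*/*.csv   # placeholder path
      harvester_limit: 0
      # release the file handle as soon as the harvester reaches end-of-file
      close_eof: true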


This totally makes sense. Most of the files have very little content inside; it is working just fine now!
