From the harvester's point of view it's an error, because the harvester cannot be started due to the configured limit. From the user's point of view I would say it's rather a warning: you can choose to act on it and increase harvester_limit if needed. But if the limit is intentional, it might be a bit annoying to see Filebeat log these error messages.
Files which cannot get a harvester due to the limit are not read. By specifying the limit you tell Filebeat to read at most 400,000 files in parallel. If one of those files is read completely and closed, a new harvester can be started for a new file. Filebeat scans the directory for unread files periodically, so the answer to your second question is yes. The frequency can be set using scan_frequency. See more on this option here: https://www.elastic.co/guide/en/beats/filebeat/current/configuration-filebeat-options.html#scan-frequency
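For reference, a minimal filebeat.yml sketch with both options set might look like this (the paths value is just a placeholder; on older Filebeat versions the section is called filebeat.prospectors instead of filebeat.inputs):

```yaml
filebeat.inputs:
- type: log
  paths:
    - /var/log/app/*.log   # placeholder, adjust to your setup
  # Start at most 400,000 harvesters for this input in parallel.
  # 0 (the default) means no limit.
  harvester_limit: 400000
  # How often Filebeat re-scans the paths for new or unread files
  # (default: 10s).
  scan_frequency: 10s
```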
In theory I can imagine a situation where you have 400,000 log files that are updated all the time, Filebeat cannot keep up, and those files are never closed, so the other 600,000 log files never get read. But in real life I don't think the log flow is that fast.
Also, to avoid keeping log files open for too long, you can set close_inactive. See more on this option here: https://www.elastic.co/guide/en/beats/filebeat/current/configuration-filebeat-options.html#close-inactive
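For example, adding something like this to the same input section would close a harvester after five minutes without new lines, freeing a slot for one of the waiting files (5m happens to be the default, shown here just to make the behaviour explicit):

```yaml
  # Close the harvester if the file has not been updated for 5 minutes,
  # so a waiting file can get a harvester instead.
  close_inactive: 5m
```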