File Input from directory with 100K files

Hey, first time posting here and looking for some understanding.

I have a Logstash config that is trying to read a directory containing over 100,000 files. I've run trace logging, and even with sincedb_path set to /dev/null none of the files get processed. After every file I see "sincedbcollection - associate: unmatched".

Is there a limit to how many files Logstash can handle without a sincedb?

Unfortunately I cannot share any configs or logs, as it is work related.

Hi,

If you cannot share the configs it is hard to help you. Can you anonymize the fields which contain paths and other data that might point to your company or are otherwise confidential?
Are the files written completely, or are they still being written to?

You might want to check the following settings (a minimal sketch follows the list):

  • mode
  • start_position - should be set to beginning if mode is unset or explicitly set to tail, so that all existing data is read instead of only new data.
  • ignore_older - maybe this setting forces Logstash to ignore your files?
  • path - have you checked that the path is correct and that Logstash is able to read the files?
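
A minimal sketch of a file input covering these settings, assuming a tail-mode pipeline; the path and values below are placeholders, not your actual config:

    input {
      file {
        path => "/var/data/incoming/*.log"    # placeholder - check that the pattern matches and Logstash can read the files
        mode => "tail"                        # or "read" to consume complete files once
        start_position => "beginning"         # in tail mode, start at the top of existing files instead of the end
        sincedb_path => "/dev/null"           # do not persist read positions between runs
        # ignore_older => 86400               # if set, files last modified more than this many seconds ago are skipped
      }
    }

Note that if ignore_older is set anywhere, files older than that cutoff are silently skipped, which can look exactly like nothing being processed.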

Best regards
Wolfram

Thanks for the reply. The current mode is set to read, start_position is beginning, and I've tried ignore_older, but it does not work. I know the path is correct, as a smaller subset of the data works with no issues. I am also using an XML filter; not sure if this could be the bottleneck?
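
Roughly, the shape of the pipeline is this (all paths and field names are anonymized placeholders, not the real config):

    input {
      file {
        path => "/data/xml/*.xml"          # placeholder path
        mode => "read"                     # consume each file once, in full
        start_position => "beginning"      # harmless here; read mode always starts at the beginning of a file
        sincedb_path => "/dev/null"
      }
    }

    filter {
      xml {
        source => "message"                # parse the raw event body as XML
        target => "doc"                    # placeholder field that receives the parsed document
        store_xml => true
      }
    }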

Do you have monitoring enabled for Logstash? If so, you can check the throughput in Kibana under Stack Monitoring -> Pipelines -> your pipeline; the pipeline viewer shows per-plugin throughput.

If a filter limits performance, you would see the input plugin emitting more events per second than the XML filter.
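
If Stack Monitoring is not available, the Logstash node stats API exposes similar per-plugin counters. Something along these lines should show events in and out for the file input versus the xml filter (this assumes the default API host and port):

    # Per-plugin event counts and timings for every pipeline
    curl -s 'http://localhost:9600/_node/stats/pipelines?pretty'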

Unfortunately I don't have Stack Monitoring enabled for the pipelines.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.