Hey, first time posting here and looking for some understanding.
I have a Logstash config that is trying to read a directory containing over 100,000 files. I've run trace logs, and even with `sincedb_path` set to `/dev/null`, none of the files get processed. After every file I see `sincedbcollection - associate: unmatched`.
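In case it matters, I enabled the trace logs with a generic setting like the one below (nothing environment-specific in it):

```
# logstash.yml -- raise the log level so filewatch/sincedb activity shows up
log.level: trace
```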
Is there a limit to how many files Logstash can handle without a sincedb?
Unfortunately I cannot share any configs or logs, as they are work related.
If you cannot share the configs, it is hard to help you. Can you anonymize the fields that contain paths or other data that might point to your company or is otherwise confidential?
Are the files completely written, or are they still being written to?
Thanks for the reply. The current `mode` is set to `read`, `start_position` is `beginning`, and I've tried `ignore_older`, but it does not help. I know the path is correct, since it works with no issues on a smaller subset of the data. I am also using an XML filter; not sure if that could be the bottleneck?
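Here is an anonymized sketch of roughly what the pipeline looks like; the path, field names, and output are placeholders, but the options match what I described:

```
input {
  file {
    path => "/data/incoming/*.xml"   # placeholder; the real directory holds 100k+ files
    mode => "read"
    start_position => "beginning"
    sincedb_path => "/dev/null"      # no sincedb persistence
    ignore_older => 86400            # tried with and without this; value is a placeholder
  }
}

filter {
  xml {
    source => "message"
    target => "doc"                  # placeholder target field
  }
}

output {
  stdout { codec => dots }           # placeholder output used while testing
}
```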
Do you have monitoring enabled for Logstash? If so, you can check the throughput in Kibana under Stack Monitoring -> Pipelines -> your pipeline.
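If it is not enabled yet, one way (depending on your version) is the legacy self-monitoring collector in logstash.yml; the Elasticsearch host below is just a placeholder:

```
# logstash.yml -- legacy self-monitoring; ships pipeline metrics to Elasticsearch
xpack.monitoring.enabled: true
xpack.monitoring.elasticsearch.hosts: ["http://localhost:9200"]
```

Once metrics are flowing, the Pipelines page shows events in/out per stage, so you can see whether the XML filter is the slow part.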