@Christian_Dahlqvist - I know I'm resurrecting an old thread here but I'd love to confirm a point you'd raised. I hope I can get input from anyone with knowledge on this issue.
For context, I'm currently supporting a deployment on 6.8.x. I have a single Logstash instance with 8 cores and 24 GB of memory which has been struggling to keep up with the incoming volume of data. CPU and memory usage are acceptable; nothing is overloaded on that front. The instance has one pipeline which takes incoming Beats events and dumps them to files (for backups), and then processes those files from the host's file system using various input-file -> elasticsearch pipelines.
If each pipeline using the logstash-input-file plugin is configured to have multiple worker threads, will Logstash spawn separate threads, each with its own instance of the plugin and each processing single-threaded?
Each input stage in the Logstash pipeline runs in its own thread. Inputs write events to a central queue that is either in memory (default) or on disk.
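To make that model concrete, here is a toy sketch in Python of the shape described above: one thread per input, a shared in-memory queue, and a separate pool of workers draining it. This is only an illustration, not Logstash's actual implementation; `run_pipeline` and all names in it are made up for this post, and real Logstash workers pull events in batches rather than one at a time.

```python
import queue
import threading

def run_pipeline(sources, n_workers):
    """Toy model: one thread per input, one shared queue, n_workers consumers."""
    q = queue.Queue()          # stands in for the central (in-memory) queue
    processed = []
    lock = threading.Lock()
    stop = object()            # sentinel telling a worker to exit

    def input_stage(events):
        # Each input runs single-threaded and only writes to the queue.
        for event in events:
            q.put(event)

    def worker():
        # Workers are independent of the inputs; they only read from the queue.
        while True:
            event = q.get()
            if event is stop:
                break
            with lock:
                processed.append(event)

    inputs = [threading.Thread(target=input_stage, args=(s,), name=f"input-{i}")
              for i, s in enumerate(sources)]
    workers = [threading.Thread(target=worker, name=f"worker-{i}")
               for i in range(n_workers)]
    for t in inputs + workers:
        t.start()
    for t in inputs:
        t.join()
    for _ in workers:
        q.put(stop)            # one sentinel per worker
    for t in workers:
        t.join()
    return len(inputs), len(workers), processed
```

The point of the sketch is that the number of input threads is set by the number of inputs, while the worker count is a separate setting, so raising the worker count does not add more readers per input.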
If we have 22 pipelines, 21 of which use the input-file plugin, will the overarching Logstash process have 21 single-threaded instances of the input-file plugin running?
If you have 21 file inputs configured, there will be 21 single-threaded instances running. You can check by getting a thread dump. In a Logstash instance with five workers and three file inputs the dump will include
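For anyone wanting to run the same check on their own instance, below is a rough Python sketch that counts input and worker threads in a saved thread dump (for example one captured with `jstack <pid>`). It assumes the thread names follow the pattern `[pipeline_id]<plugin` for inputs and `[pipeline_id]>workerN` for pipeline workers; that naming is an assumption here and should be verified against your own dump. The function name `count_threads` is made up for illustration.

```python
import re
from collections import Counter

# Assumed naming convention (verify against a real dump):
#   "[main]<file"     -> input thread for the file plugin in pipeline "main"
#   "[main]>worker0"  -> pipeline worker thread 0 in pipeline "main"
INPUT_RE = re.compile(r"\[(?P<pipeline>[^\]]+)\]<(?P<plugin>[\w-]+)")
WORKER_RE = re.compile(r"\[(?P<pipeline>[^\]]+)\]>worker\d+")

def count_threads(dump_text):
    """Return (inputs, workers) counters built from thread names in the dump."""
    inputs = Counter()   # keyed by (pipeline, plugin)
    workers = Counter()  # keyed by pipeline
    for line in dump_text.splitlines():
        m = INPUT_RE.search(line)
        if m:
            inputs[(m.group("pipeline"), m.group("plugin"))] += 1
            continue
        m = WORKER_RE.search(line)
        if m:
            workers[m.group("pipeline")] += 1
    return inputs, workers
```

Usage would be along the lines of `inputs, workers = count_threads(open("dump.txt").read())`, then comparing `sum(inputs.values())` with the number of file inputs configured across the pipelines.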