I noticed that in the logstash.yml file there is a field called pipeline.workers where you can specify the number of workers that will run in parallel. It defaults to the number of CPU cores on the host.
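For reference, this is the setting I mean; the value here is just an example, since by default it matches the host's core count:

    # config/logstash.yml
    pipeline.workers: 4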
My question is whether this can control CPU usage. Currently I am running the stack on my machine, and when Logstash is parsing it hogs all of the CPU it can get its hands on. If I set the number of pipeline workers to half of the machine's cores, will it then use only 50% of the CPU?
Is there a better way to control the CPU used by Logstash?
There's nothing built in, but I suppose you could use the cpulimit program or run Logstash in a cgroup (e.g. via a Docker container) that you lock down. That said, it's not clear why you can't just lower the priority of the Logstash process. If the machine has spare cycles, why not allow Logstash to use them?
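For example, something along these lines (the PID lookup, image tag, and pipeline file name are just placeholders):

    # Cap Logstash at roughly 50% of a single core with cpulimit
    cpulimit --pid $(pgrep -f logstash) --limit 50

    # Or run the official image under a Docker CPU quota, here two cores' worth
    docker run --cpus=2 docker.elastic.co/logstash/logstash:8.13.0

    # Or simply start Logstash at the lowest scheduling priority
    nice -n 19 bin/logstash -f my-pipeline.conf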
Sometimes I just want Logstash to run unhindered and use as much CPU as it needs. But in other cases I want Logstash to parse in the background while I work on other things in the foreground. The problem right now is that when I run Logstash and try to run any other software, my computer grinds to a halt until Logstash is done parsing.
If you're worried about CPU/core usage, then you have to configure the hosting JVM accordingly.
The following are good places to start in anyone's quest to limit Logstash's CPU usage:
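For instance, the processor count the JVM assumes and the garbage-collector thread counts can be lowered in Logstash's config/jvm.options file; the values below are only illustrations, not recommendations:

    # config/jvm.options (excerpt)
    # Make the JVM size its internal thread pools as if only 4 CPUs were available
    -XX:ActiveProcessorCount=4
    # Limit the number of garbage-collection threads
    -XX:ParallelGCThreads=2
    -XX:ConcGCThreads=1
    # A smaller, fixed-size heap reduces GC work (at the cost of throughput)
    -Xms1g
    -Xmx1g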