We have 2 Logstash instances, each with 32 GB of heap and 6 CPUs per node.
We have 5 pipeline IDs on both of these nodes. For one pipeline ID we have set 6 workers, for the second pipeline ID 3, and for the remaining ones 3 (third and fourth pipeline). Is this correct?
For pipeline ID 1, which has 6 workers, we have set 20 consumer threads, as we are using the Kafka input plugin. In Stack Monitoring we observed that the batch size is 125, although we have set 1000 for one of the pipeline IDs. We also observed that CPU utilisation is between 50% and 60%.
So with 6 CPUs, how many workers can we set when we have multiple pipelines, and how many threads can we assign per worker?
32 GB of heap for Logstash seems excessive. What is the total event rate on those nodes?
The batch size of 125 is expected. The value shown in Stack Monitoring is the default for the whole Logstash process. Changing it for one or more pipelines in pipelines.yml makes no difference to what is displayed; unless you change it for the entire Logstash process in logstash.yml, it will show the default, which is 125.
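As a rough sketch of the process-wide setting mentioned above, this is what the change in logstash.yml could look like. The value 1000 is only an assumption taken from the question; pipelines that set their own value in pipelines.yml keep that value.

```yaml
# logstash.yml (process-wide settings)
# Assumption: 1000 is the batch size wanted as the default for every
# pipeline that does not override it in pipelines.yml.
pipeline.batch.size: 1000
```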
There is no right or wrong here; it depends on your use case and the event rate of each pipeline. If you know a pipeline will have a low event rate, you may set a lower number of workers for it, for example.
I recommend not setting pipeline.workers at all, and only changing it if you are having performance issues or know that a certain pipeline works fine with 1 or 2 workers, for example.
The default for pipeline.workers will always be the number of CPU cores.
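To illustrate, here is a minimal pipelines.yml sketch under those recommendations. The pipeline IDs and paths are hypothetical, not taken from your setup: the busy pipeline leaves pipeline.workers unset, and only the pipeline known to be low volume gets an explicit lower value.

```yaml
# pipelines.yml
# Pipeline IDs and paths below are hypothetical placeholders.
- pipeline.id: kafka-main
  path.config: "/etc/logstash/conf.d/kafka-main.conf"
  # pipeline.workers is omitted on purpose: it defaults to the
  # number of CPU cores (6 on the nodes described in the question).
  pipeline.batch.size: 1000

- pipeline.id: low-volume
  path.config: "/etc/logstash/conf.d/low-volume.conf"
  # Explicit lower value for a pipeline known to have a low event rate.
  pipeline.workers: 1
```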
consumer_threads is a Kafka input setting. It relates to the number of partitions of your topic. Per the documentation:
Ideally you should have as many threads as the number of partitions for a perfect balance — more threads than partitions means that some threads will be idle
So, if your topic has fewer than 20 partitions, some threads will be idle.
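As a sketch of how the thread count could be matched to the partitions: the example below assumes a hypothetical topic with 6 partitions and that both Logstash nodes use the same group_id, so Kafka spreads the partitions across the threads of both nodes. The broker address, topic name, and group ID are placeholders. The pipeline is written inline with config.string only to keep the example in one file; normally it would live in its own .conf file.

```yaml
# pipelines.yml
# Hypothetical values: broker address, topic name, group ID,
# and a topic with 6 partitions.
- pipeline.id: kafka-main
  config.string: |
    input {
      kafka {
        bootstrap_servers => "kafka1:9092"
        topics => ["events"]
        group_id => "logstash"
        # 2 Logstash nodes in the same consumer group x 3 threads each
        # = 6 consumers, one per partition, so no thread sits idle.
        consumer_threads => 3
      }
    }
    output { stdout {} }
```

With 20 threads per node on such a topic, most of the threads would never be assigned a partition.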
There is no single answer to how many workers or threads to set; you need to adjust them as needed, following the tuning documentation linked.
Are you having any issues with ingestion? It is not clear.