Let's say I've got a cluster of 10 machines with 12 cores each, and I see that data is not being collected fast enough, so I need to raise my worker threads, which are already at 12.
Is it better to keep the same pipeline and raise the threads to 16 or even 24, or to build another pipeline that collects the same data, so that I have 2 pipelines with 12 threads each?
This is not the kind of question that can easily be addressed in a forum like this. You need to identify what the bottleneck is and then address it. How to address it will depend on what the bottleneck is. Remember that the bottleneck may not be in Logstash; it could be in whatever Logstash is reading from or writing to.
Well, the bottleneck is definitely Logstash, as it's pulling data out of one Redis DB and then writing to another one. Both have enough resources to handle the requests, so...
I've tried many different scenarios, ranging from a single pipeline with 12 to 36 workers to 2 or 3 pipelines with 12 workers each. As I'm using 10 GB of heap, I multiplied the inflight count by 2.5 (because that's 2.5x the standard heap) and then set the batch size so that I stay barely below that threshold.
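For reference, the arithmetic behind that sizing can be sketched roughly as below. This is only an illustration under stated assumptions: the inflight count is taken as `pipeline.workers` times `pipeline.batch.size`, the baseline is 12 workers at the default batch size of 125, and the helper name `max_batch_size` is made up for this example.

```python
# Rough sketch of the inflight sizing described above (assumed numbers, not measured values).
DEFAULT_BATCH_SIZE = 125   # Logstash default for pipeline.batch.size
BASELINE_WORKERS = 12      # assumption: one worker per core on a 12-core machine
HEAP_FACTOR = 2.5          # the scaling factor mentioned in the post


def max_batch_size(workers: int, inflight_budget: int) -> int:
    """Largest batch size that keeps workers * batch_size within the budget."""
    return inflight_budget // workers


# Inflight events = pipeline.workers * pipeline.batch.size
baseline_inflight = BASELINE_WORKERS * DEFAULT_BATCH_SIZE
inflight_budget = int(baseline_inflight * HEAP_FACTOR)

for workers in (12, 24, 36):
    print(workers, max_batch_size(workers, inflight_budget))
```

The more workers a single pipeline has, the smaller each batch must be to stay within the same inflight budget, which is worth keeping in mind when comparing one large pipeline against several smaller ones.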
As I said in the previous post, what changes specifically did you make in the redis input and output?
What does your redis input look like? Did you change the `threads` and `batch_count` settings in the redis input? Changing those settings may help a lot with ingestion.
Changing pipeline.workers and pipeline.batch.size mainly affects the filter and output blocks, but makes little to no difference to the input, which seems to be your issue.
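As an illustration of the input-side settings mentioned above, a redis input with more threads and a larger batch could look roughly like this. The host, port, key, and the numbers are placeholders, not recommendations; they would need to be tuned against your own setup.

```
input {
  redis {
    host        => "127.0.0.1"   # placeholder host
    port        => 6379          # placeholder port
    data_type   => "list"        # batch_count only applies to list mode
    key         => "logstash"    # placeholder key name
    threads     => 4             # input threads; the plugin default is 1
    batch_count => 250           # events fetched per request; the plugin default is 125
  }
}
```

These settings belong to the input plugin itself, so they are independent of pipeline.workers and pipeline.batch.size.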
Please share your Logstash pipeline configuration, including the input, along with your logstash.yml and pipelines.yml.