In a multi-pipeline setup, I understand that the canonical way to handle performance is to manually configure each pipeline, e.g. to use N pipeline workers, or to have a batch size of M.
Is it possible to let Logstash handle this? For example, configure the size of a "pool" of workers and let Logstash (or the JVM, really) assign them to each pipeline as needed?
If not, are there any recommendations on how to handle this, considering it isn't trivial to predict exactly which pipelines need the most resources, when, etc.?
If you do not set pipeline.workers, Logstash will use one worker per CPU core.
What I do in such cases is to configure the number of workers according to the number of events per second of each pipeline.
Pipelines with a high e/s rate will use one worker per CPU core; pipelines with a low e/s rate will have fewer workers.
For example, there is no reason to run a pipeline handling 50 e/s with 8 workers on an 8-CPU machine, but it makes sense for a pipeline handling 5000 e/s. It all depends on
But this is just what I do in my use case. You would need to test with your own pipelines: let them run without changing the config for a couple of days, then start changing it to see what works best.
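As a rough illustration of sizing workers by event rate, a pipelines.yml along these lines could be a starting point. This is only a sketch: the pipeline ids, config paths and worker counts are made up for an 8-core machine and are not recommendations.

```yaml
# pipelines.yml (hypothetical ids and paths, assuming an 8-core host)

# High-volume pipeline (thousands of e/s): one worker per CPU core.
- pipeline.id: high-volume
  path.config: "/etc/logstash/conf.d/high-volume.conf"
  pipeline.workers: 8
  pipeline.batch.size: 125   # 125 is the Logstash default batch size

# Low-volume pipeline (tens of e/s): a single worker is likely enough.
- pipeline.id: low-volume
  path.config: "/etc/logstash/conf.d/low-volume.conf"
  pipeline.workers: 1
```

Any pipeline that omits pipeline.workers falls back to one worker per CPU core, so only the pipelines you want to restrict need the setting.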
The thing is, this might not be constant. One pipeline might have 2000 e/s for 5 minutes every 2 hours and 17 e/s for the other 1h55, for an average of about 100 e/s, while another one might have a constant 200 e/s (just making up random numbers here). It might also not be very predictable: there might be bursts of incoming events for whatever reason. Hence, a static, manual configuration of the number of workers seems pretty difficult to optimize for multi-pipeline situations.
There is not much you can do with pipeline.workers: you can either set it to a fixed number, or leave it unset and use one worker per CPU core.
Are you having any performance issues on your Logstash node? If not, I see no reason to change the number of workers.