I have read (but can't find where or what exactly) that under some conditions pipeline.workers should be set to 1, or else events are not processed in the right order.
My configuration in conf.d is split across several files: one for the input, one for the output, and several for the filters (one file per filter type, plus a final one that applies to everything; see the sketch after this list):
1000_input.conf
2001_filter_type1.conf
2002_filter_type2.conf
.........
200n_filter_typen.conf
3000_filter_general.conf
5000_output.conf
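For reference, here is roughly what my input and output files look like (the port, hosts, and index name below are simplified/illustrative, not my exact values):

    # 1000_input.conf - receive events from Filebeat
    input {
      beats {
        port => 5044
      }
    }

    # 5000_output.conf - ship processed events to Elasticsearch
    output {
      elasticsearch {
        hosts => ["http://localhost:9200"]
        index => "filebeat-%{+YYYY.MM.dd}"
      }
    }

Logstash concatenates all files in conf.d in lexical order into a single pipeline, which is why the numeric prefixes matter.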
Input comes from Filebeat on several different servers, and I use the default configuration of just one pipeline. Not knowing about pipeline.workers, I had set it to 1, but when several servers are sending information it doesn't process fast enough, and I would like to increase the number of workers.
Increasing pipeline workers will increase the number of events processed, provided the system resources exist (i.e., it may not help on a 1-core VM).
I did some benchmarking when we first started ingesting some large Exchange logs from about 20 servers. We were going to process 72 hours of prior logs, so basically unlimited demand until filebeat caught up to the current time.
We had 6 Logstash servers in the Filebeat output, and this pipeline had 4 workers on each. I don't remember the exact numbers, but say we could initially process 10K events/sec; increasing to 8 workers took that to about 15K, and 12 workers to about 20K. Our normal daily peak was about 4K events/sec, so I reduced the number of workers back to 4 and it has been fine since.
For pipelines that can handle their workload with 1 or 2 workers, that is a good setting. In your case, I'd add workers.
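As a sketch, for the default single pipeline you can raise the worker count in logstash.yml, or per pipeline in pipelines.yml; the values and path below are illustrative:

    # logstash.yml - applies to the default pipeline
    pipeline.workers: 4

    # pipelines.yml - per-pipeline override (path is hypothetical)
    # - pipeline.id: main
    #   path.config: "/etc/logstash/conf.d/*.conf"
    #   pipeline.workers: 4

If left unset, pipeline.workers defaults to the number of CPU cores on the host.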
You can also configure Filebeat to round-robin its output across multiple Logstash servers for even more capacity.
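In Filebeat that is the loadbalance option on the Logstash output; a minimal sketch with hypothetical hostnames:

    # filebeat.yml - distribute events across several Logstash servers
    output.logstash:
      hosts: ["logstash1:5044", "logstash2:5044", "logstash3:5044"]
      # without this, Filebeat sticks to one host at a time
      loadbalance: true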
Thanks for the reply. In my case I have just one Logstash server with 4 CPUs, so I could run more than one pipeline worker. If needed, I can even add some CPUs to the system.
The main question is whether having multiple pipeline workers might affect how the information is processed and produce the wrong processing order. I have read that in some cases you should set workers to 1, but I can't find where I read it.
If you are using an aggregate filter then you have to set pipeline.workers to 1. You should also be aware of this bug. Generally the order of events is not preserved.
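For context, a minimal sketch of why aggregate needs a single worker: it accumulates state in a map shared by all events with the same task id, so with several workers the events of one task could be split between them. The field names here (transaction_id, time_taken) are hypothetical:

    filter {
      aggregate {
        # correlate all events sharing the same transaction_id
        task_id => "%{transaction_id}"
        # accumulate a running total in the map shared by this task
        code => "map['total_time'] ||= 0; map['total_time'] += event.get('time_taken')"
        # flush the task if no matching event arrives for 2 minutes
        timeout => 120
      }
    }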
Thanks, I'm not using the aggregate filter. The only ones I can think of that could cause any trouble are split and dissect. I reviewed the documentation of both filters, but neither mentions anything about workers.
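For what it's worth, both of those operate on one event at a time and keep no state between events, so the worker count should not affect their correctness, only the relative order of events as noted above. A minimal sketch with a hypothetical log format and field name:

    filter {
      dissect {
        # parse a hypothetical "timestamp level message" line; dissect
        # only looks at the current event, no state is carried over
        mapping => { "message" => "%{ts} %{level} %{msg}" }
      }
      split {
        # turn a hypothetical array field into one event per element;
        # the resulting events are produced within the same worker
        field => "items"
      }
    }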