When should we set pipeline.workers to 1?

mtudisco · March 4, 2020, 12:27pm

Hi,

I have read (but can't find where) that in some conditions pipeline.workers should be set to 1, or else the information is not processed in the right order.

My configuration in conf.d is split across several files: one for the input, one for the output, and several for filters, with one file per filter type plus a final one that applies to all types:
1000_input.conf
2001_filter_type1.conf
2002_filter_type2.conf
.........
200n_filter_typen.conf
3000_filter_general.conf
5000_output.conf

Input comes from Filebeat on several servers, and I use the default configuration of a single pipeline. Because I didn't know enough about pipeline.workers, I set it to 1, but when several servers are sending data it is not processing fast enough, and I would like to increase the number of workers.

Can anyone help?

thanks

rugenl · March 4, 2020, 1:19pm

Increasing pipeline workers will increase the number of events processed, provided system resources are available (i.e. it may not help on a 1-core VM).

I did some benchmarking when we first started ingesting some large Exchange logs from about 20 servers. We were going to process 72 hours of prior logs, so basically unlimited demand until filebeat caught up to the current time.

We had 6 Logstash servers in the Filebeat output, and this pipeline had 4 workers on each. I don't remember the exact numbers, but say we could process 10K events/sec initially. I increased workers to 8 and this went to about 15K; 12 workers took it to about 20K. Our normal daily peak was about 4K events/sec, so I reduced the number of workers back to 4 and it's been fine since.

For pipelines that can handle their workload with 1 or 2 workers, that is a good setting. In your case, I'd add workers.
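As a rough sketch, the worker count can be set in logstash.yml. The value 4 below is only an assumption for a 4-core host, not a recommendation:

```yaml
# logstash.yml (sketch; the value is an assumed starting point, tune it against your own stats)
# If this setting is left out, pipeline.workers defaults to the number of CPU cores on the host.
pipeline.workers: 4
```

The same setting can be given on the command line with -w (--pipeline.workers).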

You can configure filebeat to round robin output to multiple logstash servers for even more capacity.
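Something along these lines in filebeat.yml should spread events across several Logstash hosts. The host names are made up for illustration:

```yaml
# filebeat.yml (sketch; host names are hypothetical)
output.logstash:
  hosts: ["logstash1.example.com:5044", "logstash2.example.com:5044"]
  # Without loadbalance, Filebeat sends to one of the listed hosts and only
  # switches to another when that host becomes unreachable.
  loadbalance: true
```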

mtudisco · March 4, 2020, 1:43pm

Thanks for the reply. In my case I have just one Logstash instance with 4 CPUs, so I could use more than one pipeline worker. If needed, I can even add CPUs to the system.

My main question is whether having multiple pipeline workers might affect how the information is processed and lead to it being processed in the wrong order. I have read that in some cases you should set workers to 1, but I can't find where I read it.

thanks

rugenl · March 4, 2020, 2:57pm

I think it may have been in a discussion of multi-line events, but those are better handled at the Filebeat layer.

You can probably run more than 1 pipeline worker per CPU; just watch the stats.
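For reference, a minimal sketch of multiline handling on the Filebeat side, using the log input. The path and the timestamp pattern are assumptions; adjust them to your own log format:

```yaml
# filebeat.yml (sketch; path and pattern are hypothetical)
filebeat.inputs:
  - type: log
    paths:
      - /var/log/myapp/*.log
    # Treat a line starting with a date (e.g. 2020-03-04) as the start of a new event.
    multiline.pattern: '^\d{4}-\d{2}-\d{2}'
    # negate: true + match: after means lines that do NOT match the pattern
    # are appended to the preceding line that did match.
    multiline.negate: true
    multiline.match: after
```

With this, each multi-line event reaches Logstash already assembled as a single event, so the number of pipeline workers does not affect how its lines are joined.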

mtudisco · March 4, 2020, 3:06pm

Thanks. I have multiline, but on the Filebeat side, so I will try increasing the workers.

Badger · March 4, 2020, 5:37pm

If you are using an aggregate filter then you have to set pipeline.workers to 1. You should also be aware of this bug. Generally the order of events is not preserved.
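If only part of the processing needs an aggregate filter, one option is to keep it in its own pipeline so the rest can still use several workers. A sketch in pipelines.yml, where the pipeline ids, paths and worker counts are all hypothetical:

```yaml
# pipelines.yml (sketch; ids, paths and counts are hypothetical)
# Pipeline whose config uses the aggregate filter: a single worker,
# so related events are handled by one thread.
- pipeline.id: aggregate-logs
  path.config: "/etc/logstash/aggregate.d/*.conf"
  pipeline.workers: 1
# Pipeline without aggregate: free to use several workers.
- pipeline.id: main
  path.config: "/etc/logstash/conf.d/*.conf"
  pipeline.workers: 4
```

Each pipeline needs its own input (for example, a separate beats port), so this only fits if the events that need aggregation can be sent separately.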

mtudisco · March 4, 2020, 6:33pm

Thanks, I'm not using the aggregate filter. The only ones I think could cause any trouble are split and dissect. I reviewed the documentation of both filters, but neither mentions anything about workers.

Badger · March 4, 2020, 6:45pm

Neither split nor dissect cares about the number of worker threads.

