Logstash output load balancing and workers settings

steffens · April 25, 2016, 11:11am

Hi,

best advice is measure, measure, measure. This post contains a python script + instructions how you can get throughput info right from filebeat. For testing have a log file prepared (I often use NASA HTTP logs) and delete registry files between runs. (optionsl) In addition use the null (or stdout with dotted codec and pv tool) output plugin in logstash to not generate any back-pressure from logstash.

The worker: ... config is really per host. The default value is 1. That is if H=# of hosts and W=# worker, then H*W workers doing output will be spawned. In your sample config it means you're spawning like 16 workers pushing data in total.

There are 2 options to try in your case:

set filebeat.publish_async: true. This will push batches as soon as batches are ready into the publisher pipeline. In this case I'd set spool_size between [bulk_max_size, bulk_max_size * worker * (# of hosts)]. If one logstash instance is not responding (or slowing down), filebeat continues publishing events using the other workers (due to async/pipelines publishing).
set filebeat.spool_size = output.logstash.bulk_max * output.logstash.worker * len(output.logstash.hosts). This will split bathces into output.logstash.worker * len(output.logstash.hosts) batches when publishing, so every host gets it's share. Drawback is, if one logstash instance slows down it first takes some timeout to detect it's not responding and transmitting the sub-batch via another logstash instance, basically blocking output until all sub-batches have been ACKed.

My gut feeling tells me option 1 would have better throughput (despite gut feelings being often wrong), but in the end it's up to you to run experiments and measure your setup to figure some good configurations matching your requirements. And don't forget, the higher your throughput, the more resources will be required by filebeat and logstash to process your data.

Topic		Replies	Views
Can I add loadbalance to filebeat config for hosts in logstash output? Beats filebeat	6	2195	July 11, 2016
Filebeat logstash output loadbalance not working properly (not random) Beats filebeat	4	1251	June 22, 2016
How many workers on filebeat? Beats filebeat	1	463	September 21, 2021
How to output Filebeat logs to multiple hosts without load balancing? Beats filebeat	7	8270	July 5, 2017
Advanced Configuring- Filebeat Loadbalance Beats filebeat	3	267	October 2, 2020

Logstash output load balancing and workers settings

Related topics