Logstash output load balancing and workers settings

Hi,

The best advice is: measure, measure, measure. This post contains a Python script plus instructions on how to get throughput info right from filebeat. For testing, have a log file prepared (I often use the NASA HTTP logs) and delete the registry file between runs. Optionally, use the null output plugin in logstash (or stdout with the dots codec plus the pv tool) so logstash itself doesn't generate any back-pressure.
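
For reference, a minimal Logstash test pipeline along those lines could look like this (the port is a placeholder, adjust to your setup):

```
input {
  beats {
    port => 5044
  }
}

output {
  # Drop every event, so Logstash adds no back-pressure of its own:
  null {}

  # Alternative: print one dot per event, then pipe Logstash's stdout
  # through pv to watch the rate (1 byte == 1 event):
  # stdout { codec => dots }
}
```

With the dots codec you'd run something like `bin/logstash -f test.conf | pv -abt > /dev/null` and read the events/s straight off pv.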

The `worker: ...` setting is really per host, with a default of 1. That is, if H = number of hosts and W = workers per host, then H*W workers doing output will be spawned. With your sample config that means you're spawning 16 workers pushing data in total.
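
To illustrate the math with a made-up four-host config (matching the 16 workers above; host names and the loadbalance flag are placeholders for your actual setup):

```yaml
output:
  logstash:
    hosts: ["ls1:5044", "ls2:5044", "ls3:5044", "ls4:5044"]  # H = 4
    loadbalance: true
    worker: 4     # W = 4, per host
    # total output workers spawned: H * W = 4 * 4 = 16
```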

There are 2 options to try in your case:

  1. Set `filebeat.publish_async: true`. This pushes batches into the publisher pipeline as soon as they are ready. In this case I'd set spool_size somewhere in the range [bulk_max_size, bulk_max_size * worker * (# of hosts)]. If one logstash instance is not responding (or slowing down), filebeat continues publishing events via the other workers (thanks to asynchronous, pipelined publishing). See the config sketch after this list.

  2. Set `filebeat.spool_size = output.logstash.bulk_max_size * output.logstash.worker * len(output.logstash.hosts)`. On publish, the spooled batch is split into output.logstash.worker * len(output.logstash.hosts) sub-batches, so every host gets its share. The drawback is that if one logstash instance slows down, it takes some timeout to detect that it's not responding before the sub-batch is retransmitted via another logstash instance, basically blocking the output until all sub-batches have been ACKed. See the config sketch after this list.
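
For concreteness, here are minimal sketches of both options, reusing the made-up four hosts from above and an assumed bulk_max_size of 1024 (tune every number against your own measurements):

```yaml
# Option 1: async publishing; spool_size anywhere in
# [bulk_max_size, bulk_max_size * worker * #hosts] = [1024, 16384]
filebeat:
  publish_async: true
  spool_size: 4096
output:
  logstash:
    hosts: ["ls1:5044", "ls2:5044", "ls3:5044", "ls4:5044"]
    loadbalance: true
    worker: 4
    bulk_max_size: 1024
```

```yaml
# Option 2: spool_size = bulk_max_size * worker * len(hosts) = 1024 * 4 * 4,
# so each spooled batch splits into exactly 16 sub-batches on publish
filebeat:
  spool_size: 16384
output:
  logstash:
    hosts: ["ls1:5044", "ls2:5044", "ls3:5044", "ls4:5044"]
    loadbalance: true
    worker: 4
    bulk_max_size: 1024
```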

My gut feeling tells me option 1 would have better throughput (despite gut feelings often being wrong), but in the end it's up to you to run experiments and measure your setup to find a configuration matching your requirements. And don't forget: the higher your throughput, the more resources filebeat and logstash will need to process your data.
