I'm seeing poor performance with the Kafka output plugin when I increase the linger_ms and batch_size settings. This is the opposite of the behavior I would expect after reading the Kafka producer documentation: increasing batch sizes should improve throughput, not reduce it! I have not seen this issue with other Kafka producer implementations, so it appears to be specific to Logstash.
My pipeline is very simple: read from a TCP input, apply no filters, and send everything to a Kafka output (a config sketch follows the list below). Other notable settings:
- persistent queue enabled (`queue.type: persistent`)
- 8 pipeline worker threads
- running on a VM with 4 CPUs and 8 GB RAM
Here are some sample results from tweaking only the linger_ms setting, with everything else held fixed:
- low linger_ms: output events per second = 14000
- high linger_ms: output events per second = 1200
I have enabled JMX metrics on my Logstash instance to inspect the Kafka producer statistics. From these I can see that increasing linger_ms does result in larger batches being sent to Kafka, and that records sit in the queue for roughly linger_ms before being sent. Yet throughput is still greatly reduced despite the improved batching.
How can I find the bottleneck in this pipeline? Are there any known issues with the Kafka output plugin?