Increasing Logstash Throughput When the Codec is the Bottleneck

Thanks again for the recommendations. I saw roughly a 10-12% throughput gain from adjusting the nice level of the Logstash process, and the sysctl optimizations added about another 5%.
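For anyone reading along, the sysctl changes were along these lines; the values below are illustrative rather than my exact production settings:

```sh
# Illustrative UDP-ingest sysctl tuning; values are examples, not my
# exact production settings.

# Raise the maximum and default socket receive buffers so the Logstash
# UDP input can request a large SO_RCVBUF without being capped.
sysctl -w net.core.rmem_max=33554432
sysctl -w net.core.rmem_default=8388608

# Allow more packets to queue on the receive side before the kernel
# starts dropping them.
sysctl -w net.core.netdev_max_backlog=10000
```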

I've horizontally scaled my Logstash instances inside my container, using Nginx as a UDP load balancer, as seen here:
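The relevant Nginx piece is a `stream` block along these lines (the ports and the number of upstream Logstash instances are illustrative placeholders, not my exact config):

```nginx
# Illustrative UDP load balancing across Logstash instances; ports and
# instance count are placeholders.
stream {
    upstream logstash_sflow {
        # One entry per Logstash instance, each on its own UDP port.
        server 127.0.0.1:6343;
        server 127.0.0.1:6344;
        server 127.0.0.1:6345;
    }

    server {
        listen 6340 udp;            # port Samplicator forwards sFlow to
        proxy_pass logstash_sflow;  # spread datagrams across instances
        proxy_responses 0;          # sFlow is one-way; expect no replies
    }
}
```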

I'm curious about your recommendations for tuning this setup. I know performance is highly dependent on the specific environment; that said, given this hardware and the fact that I'm primarily trying to process sFlow, I'd appreciate any recommendations on the knobs I have available:

  • How many instances of Logstash would you deploy per physical machine?
    • I can allocate more RAM to the container running these instances if that would help.
  • How many workers per instance of Logstash?
  • What batch size per instance? (A sketch of where these settings live follows this list.)
  • Are there other parameters I'm overlooking?
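To make the question concrete, here's where those knobs live; the values are illustrative placeholders, not my current settings:

```yaml
# logstash.yml -- per-instance pipeline tuning (illustrative values).
pipeline.workers: 4        # defaults to the number of CPU cores
pipeline.batch.size: 250   # events each worker collects before filters/outputs run
pipeline.batch.delay: 50   # ms to wait for a full batch before flushing
```

And on the input side, where the sFlow codec (the suspected bottleneck) sits:

```conf
# Pipeline config -- UDP input knobs for the sFlow listener (illustrative).
input {
  udp {
    port                 => 6343
    workers              => 4        # threads pulling datagrams off the socket
    queue_size           => 10000    # buffer of unprocessed packets
    receive_buffer_bytes => 8388608  # requests a large SO_RCVBUF from the kernel
    codec                => sflow
  }
}
```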

One thing to note: I take roughly a 5 ms penalty on the Kafka output for every document.
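For reference, the output side looks roughly like this. The broker address, topic, and values are placeholders; the producer batching knobs (`linger_ms`, `batch_size`) are what I suspect could amortize that per-document cost instead of paying it on every send:

```conf
# Kafka output with producer batching (illustrative values).
output {
  kafka {
    bootstrap_servers => "kafka:9092"   # placeholder broker address
    topic_id          => "sflow"        # placeholder topic
    batch_size        => 16384          # bytes per producer batch
    linger_ms         => 50             # wait up to 50 ms to fill a batch
    compression_type  => "lz4"
    acks              => "1"
  }
}
```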

I went with Nginx as the load balancer after finding that HAProxy doesn't support UDP. I may swap it for Traefik built into the image at some point, but Traefik's UDP support doesn't seem fully mature either.

I've ruled out Samplicator as the bottleneck: another service it fans data out to confirms that it receives all of the packets, and when I write Samplicator's output straight to a file, I can verify that no data is lost, whereas packets do go missing when Logstash is on the receiving end.
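The file check was essentially a packet capture on the port Samplicator forwards to, something like this (the port is a placeholder):

```sh
# Capture everything Samplicator forwards toward the Logstash port, then
# compare packet counts against what the exporter sent. Port 6340 is a
# placeholder for the actual forwarding port.
tcpdump -i any -nn -w /tmp/sflow.pcap udp port 6340

# Count the captured packets afterwards.
tcpdump -nn -r /tmp/sflow.pcap | wc -l
```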