Kafka Input Performance Problems

I've been working to convert an ELK setup over to using Logstash as a parsing engine, from something home-grown (don't ask). I'm running into a problem with the performance of the Kafka input.

Versions: Everything 7.6.x

Logs go from rsyslog to a 6-node Kafka setup. I've played with different partition counts, but right now there are 24 partitions for the topic I care about. Replication factor is 2.

Each of the 6 nodes also runs a copy of Logstash. The nodes have 15 cores each, and Logstash is configured with 30 workers. The worker count was originally the default, but I raised it in an attempt to increase performance.

The Kafka input has consumer_threads set to 4.
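For reference, here is a quick sanity check on the consumer math. It is only a sketch, and it assumes all six Logstash instances join the same consumer group and that Kafka spreads partitions evenly across them. Within one group a partition is read by at most one consumer, so any threads beyond the partition count would sit idle. The function name `consumer_assignment` is made up for illustration.

```python
def consumer_assignment(partitions: int, nodes: int, threads_per_node: int) -> dict:
    """Summarize how a topic's partitions spread over consumer threads.

    Assumes every thread belongs to the same consumer group and that the
    assignment is balanced (an assumption, not something I have verified).
    """
    consumers = nodes * threads_per_node
    # A partition is consumed by at most one member of a group,
    # so no more than `partitions` consumers can be doing work.
    active = min(consumers, partitions)
    idle = consumers - active
    # Ceiling division: the busiest consumer's share of partitions.
    busiest = -(-partitions // active) if active else 0
    return {
        "consumers": consumers,
        "active": active,
        "idle": idle,
        "max_partitions_per_consumer": busiest,
    }


# My setup: 24 partitions, 6 Logstash nodes, consumer_threads => 4
print(consumer_assignment(partitions=24, nodes=6, threads_per_node=4))
```

If those assumptions hold, the 24 threads map one-to-one onto the 24 partitions, so raising consumer_threads alone would only add idle consumers unless the partition count goes up too.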

My basic problem is I cannot pull from Kafka fast enough. If I use kafka-consumer-groups.sh to watch partition lag, it just goes up and up over time. I'm pushing between 30k-40k messages into Kafka in prod.
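To put a rough number on how fast the lag grows, here is a back-of-the-envelope sketch. It rests on two assumptions: that the 30k-40k figure is per second, and that Logstash drains Kafka at roughly the ~20k per second I see Elasticsearch ingest when Logstash is in the path. The function name is hypothetical.

```python
def lag_growth_per_second(produce_rate: float, consume_rate: float) -> float:
    """Messages per second by which total partition lag grows.

    Returns 0 when the consumers keep up with the producers.
    """
    return max(produce_rate - consume_rate, 0.0)


CONSUME_RATE = 20_000  # assumed: ES ingest rate observed with Logstash in the path

for produce_rate in (30_000, 40_000):  # assumed to be messages per second
    growth = lag_growth_per_second(produce_rate, CONSUME_RATE)
    backlog_after_hour = growth * 3600
    print(f"produce={produce_rate}/s -> lag grows {growth:.0f}/s, "
          f"{backlog_after_hour:.0f} messages behind after one hour")
```

Under those assumptions the backlog grows by tens of millions of messages per hour, which matches the steadily climbing lag I see in kafka-consumer-groups.sh.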

Early on in this project my CPUs were pegged, and I traced that to some bad groks. Now my CPUs run at maybe 40% on average, with most of my Logstash threads idle.

The problem is not:

  • My filters. I've checked them extensively and CPU usage is fine.
  • The Elasticsearch back end I'm sending to. With Logstash in the path I can watch my ES setup ingest ~20k per second, but without Logstash I have seen this system ingest well over 100k per second in tests.

I've read so much conflicting information on having Logstash pull from Kafka. I need more throughput and am not sure how to get it. Some sources say to add more partitions, while others say that's unlikely to help. I have idle CPU and want to put it to work.

What is the current recommendation for maximizing Kafka pull performance?
