The Elastic stack that I support was recently upgraded from Logstash 2.4 to 6.4. I've been troubleshooting a decrease in performance and found an interesting difference in how the two versions read from Kafka partitions. Under Logstash 2.4, consumers would split the partitions of a given Kafka topic evenly among themselves, and each would only process logs from the partitions assigned to it. Since upgrading these same consumers to Logstash 6.4, that is no longer happening: I now see individual consumers processing logs from almost all partitions instead of splitting them up among themselves. This may not fully explain the performance loss we're seeing, but I think it could be contributing.
I haven't been able to find any new settings for the Kafka input plugin that would explain this change in behavior. I am using the same plugin configuration for both consumer versions:
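The input block is essentially this (sanitized: the brokers, topic, group ID, and thread count below are placeholders rather than my exact values):

```
input {
  kafka {
    # placeholder values, not my real brokers/topic/group
    bootstrap_servers => "kafka01:9092,kafka02:9092,kafka03:9092"
    topics => ["example-topic"]
    group_id => "logstash-example-group"
    consumer_threads => 4
    codec => "json"
  }
}
```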
So I guess my question is: is it possible to get the newer Kafka input plugin bundled with Logstash 6.x to split Kafka partitions among consumers the way the older version did?
I found a setting that looked promising: partition.assignment.strategy. It looks to default to org.apache.kafka.clients.consumer.RangeAssignor in newer versions, but you can set it to org.apache.kafka.clients.consumer.RoundRobinAssignor, which looks like it will do exactly what I want. However, I get a peculiar error whenever I add it to Logstash; on startup I receive this message:
```
Exception in thread "Ruby-0-Thread-69: :1" org.apache.kafka.common.errors.InconsistentGroupProtocolException: The group member's supported protocols are incompatible with those of existing members or first group member tried to join with empty protocol type or empty protocol list.
```
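For reference, I'm adding the setting in the kafka input roughly like this (same placeholder values as above; note that the Logstash plugin option uses underscores rather than dots):

```
input {
  kafka {
    bootstrap_servers => "kafka01:9092,kafka02:9092,kafka03:9092"
    topics => ["example-topic"]
    group_id => "logstash-example-group"
    consumer_threads => 4
    codec => "json"
    # override the default RangeAssignor with RoundRobinAssignor
    partition_assignment_strategy => "org.apache.kafka.clients.consumer.RoundRobinAssignor"
  }
}
```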
I thought perhaps this was due to me reusing the same consumer group; however, I still get the same error when I try to start Logstash with a brand new group as well.
Any ideas? I'm really banging my head on my desk with this one.
It seems that the simplest solutions are often overlooked. The fact that I was performing a rolling upgrade of these Logstash containers is what was causing the error above: during the rollout, consumers with the old configuration (default RangeAssignor) and consumers with the new RoundRobinAssignor setting were joining the same consumer group, and Kafka requires all members of a group to agree on a partition assignment strategy, hence the InconsistentGroupProtocolException. I completely deleted the stack and rebuilt it using the updated version, the one with the RoundRobinAssignor setting. The stack started up and the Logstash consumers are now splitting up Kafka partitions in a more efficient manner. Whew!
Hopefully this helps someone else who runs into a similar issue. Thank you all.