First thing, put the thread number down to 1. Kafka is already making
threads per partition, so there is no parallelism to be gained; we should
remove that option from the plugin. Also set your consumer group name,
otherwise it'll assign a random one each time, which is why it isn't resuming.
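Something along these lines, assuming the option names from the logstash kafka input plugin (topic and group names here are placeholders):

```
input {
  kafka {
    topic_id         => "my_topic"    # hypothetical topic name
    group_id         => "my_group"    # explicit group so offsets resume
    consumer_threads => 1             # partitions are already read in parallel
  }
}
```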
You know, I was wrong about the group: it is in fact logstash by default, so it is strange that it wouldn't resume. Do you have other logstash consumers running with the same group?
Yes, I have multiple consumers in the same group but for different topics.
Sometimes I have seen the logstash consumers fail to make an offset entry in ZooKeeper.
If you look at both outputs, in the first case all the owner suffixes are the same (0); in the second, the suffixes run from 1 to 10. It looks like logstash is not doing multithreading by default.
This is getting a little off topic, but the tl;dr is that the underlying
jruby-kafka consumer thread isn't doing anything but multiplexing from the
reader (which creates a consumer thread per partition) into the queue that
logstash passes in. So yes, there are more Kafka streams, but each one isn't
adding anything. Check out the discussion here:
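To illustrate the pattern described above (a sketch, not jruby-kafka's actual code): per-partition reader threads all push into the single queue logstash hands in, and the one consumer thread just multiplexes, so extra consumer threads add nothing.

```ruby
require 'thread'

# Hypothetical numbers for illustration: 3 partitions, 2 messages each.
PARTITIONS = 3
logstash_queue = Queue.new  # the queue logstash passes in

# One reader thread per partition, each pushing its messages.
readers = PARTITIONS.times.map do |p|
  Thread.new do
    2.times { |i| logstash_queue << "partition-#{p} msg-#{i}" }
  end
end
readers.each(&:join)

# The single "consumer" side drains everything into one stream;
# adding more threads here would not speed up the readers upstream.
events = []
events << logstash_queue.pop until logstash_queue.empty?
puts events.size  # 6 messages from 3 partitions
```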
When the latest major version of jruby-kafka is released, the
logstash-kafka-input will need to change to pass a process into the Kafka
stream, which could add parallelism. Even then it probably won't help
much, because logstash has a serialized processing chain, i.e. input
queue > filter queue > output queue. If logstash were more like Spark and
maintained parallel threads for each chain, then bumping the number of Kafka
streams would help.
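A minimal sketch of that serialized chain (event names and counts are made up): every event funnels through one filter stage and one output stage, so adding Kafka streams just queues more work at the same bottleneck.

```ruby
require 'thread'

input_q, filter_q = Queue.new, Queue.new
output = []

# Pretend 5 events arrived from however many Kafka streams.
5.times { |i| input_q << "event-#{i}" }

# The lone filter stage: input queue -> filter queue.
filter = Thread.new do
  5.times { filter_q << input_q.pop.upcase }
end

# The lone output stage: filter queue -> output.
out = Thread.new do
  5.times { output << filter_q.pop }
end

[filter, out].each(&:join)
puts output.size  # all 5 events passed through one serialized path
```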