Kafka Consumer cluster that can read dynamic topics and bulk index the data


(Ramky) #1

I need to build a scalable Kafka consumer cluster that reads data from dynamically added Kafka topics and bulk indexes it into Elasticsearch. At present, I have developed a Java consumer client jar that reads data from a given list of Kafka topics and feeds it to the ES cluster using bulk indexing. I also tried Spark, but its performance was 60% lower than running the jar from the command line.

New topics will be added to the Kafka cluster on the fly, so I need a scalable cluster that consumes data from various Kafka topics and feeds it to ES.

Please suggest a better way to build a scalable Kafka consumer and bulk index the data.


(Mark Walkom) #2

Instead of building that jar you could have just used Logstash - https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html

Not sure what you can do around the dynamic aspect.


(Tin Le) #3

Kafka 0.8.2 or newer supports whitelist and blacklist regex for topics. That is most likely what you are looking for.

I run KCC (kafka-console-consumer) from the command line and pipe its output into my ingest process to load Kafka data. KCC supports the --whitelist and --blacklist flags.
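
For anyone who wants the same behaviour inside a custom Java consumer, here is a rough sketch of how regex-based topic selection works. It uses only java.util.regex and does not touch Kafka itself. The class name TopicFilter, the topic names, and the idea of applying a whitelist and a blacklist together are all my own assumptions for illustration. In a real client you would feed it the topic list you fetch from the cluster, or hand a Pattern to the newer Java consumer's subscribe(Pattern, ConsumerRebalanceListener) method instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Kafka): decides which topic names
// a consumer should read, based on a whitelist and an optional blacklist regex.
class TopicFilter {
    private final Pattern whitelist;
    private final Pattern blacklist; // null means "no blacklist"

    TopicFilter(String whitelistRegex, String blacklistRegex) {
        this.whitelist = Pattern.compile(whitelistRegex);
        this.blacklist = blacklistRegex == null ? null : Pattern.compile(blacklistRegex);
    }

    // A topic is accepted when the whole name matches the whitelist
    // and does not match the blacklist.
    boolean accepts(String topic) {
        if (!whitelist.matcher(topic).matches()) {
            return false;
        }
        return blacklist == null || !blacklist.matcher(topic).matches();
    }

    // Returns the accepted topics, preserving the input order.
    List<String> select(List<String> topics) {
        List<String> selected = new ArrayList<>();
        for (String topic : topics) {
            if (accepts(topic)) {
                selected.add(topic);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Made-up topic names; in practice this list would come from the cluster.
        List<String> topics = Arrays.asList(
                "logs-app1", "logs-internal", "metrics-cpu", "logs-app2");
        TopicFilter filter = new TopicFilter("logs-.*", "logs-internal");
        System.out.println(filter.select(topics)); // prints [logs-app1, logs-app2]
    }
}
```

Re-running select whenever the topic list is refreshed would pick up newly created topics that match the whitelist, which is the dynamic part of the original question.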


(Ramky) #4

Tinle,
Thank you for your response. I will try your suggestion.


(Ramky) #5

Mark,
Thanks for your suggestion.

