I need to build a scalable Kafka consumer cluster that reads data from dynamically added Kafka topics and bulk indexes it into Elasticsearch. At present, I have developed a Java consumer client (packaged as a jar) that reads data from a given list of Kafka topics and feeds it to the ES cluster using bulk indexing. I also tried Spark, but its performance was 60% lower than running the jar from the command line.
New topics will be added to the Kafka cluster on the fly, so I need a scalable cluster that consumes data from various Kafka topics and feeds it to ES.
Please suggest a better way to build a scalable Kafka consumer and bulk index the data.
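To make the question concrete, the batching half of such a consumer can be sketched roughly as below. This is only an illustration under stated assumptions, not the actual client: `BulkBuffer`, `flushSize`, `pendingCount`, `matchesTopic`, and the `sink` callback are hypothetical names, the `sink` stands in for an Elasticsearch bulk request instead of calling any real client, and the regex check only mimics how a pattern-based subscription might select newly added topics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: collects documents and hands them to a sink in batches.
 * The sink is a placeholder for a bulk indexing call; no Kafka or
 * Elasticsearch library is used here.
 */
public class BulkBuffer {
    private final int flushSize;                  // assumed batch size threshold
    private final Consumer<List<String>> sink;    // placeholder for the bulk request
    private final List<String> pending = new ArrayList<>();

    public BulkBuffer(int flushSize, Consumer<List<String>> sink) {
        if (flushSize <= 0) {
            throw new IllegalArgumentException("flushSize must be positive");
        }
        this.flushSize = flushSize;
        this.sink = sink;
    }

    /** Buffers one document and flushes once the threshold is reached. */
    public void add(String doc) {
        pending.add(doc);
        if (pending.size() >= flushSize) {
            flush();
        }
    }

    /** Sends a copy of the buffered documents to the sink, then clears the buffer. */
    public void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending));
        pending.clear();
    }

    /** Number of documents buffered but not yet sent. */
    public int pendingCount() {
        return pending.size();
    }

    /** True if the whole topic name matches the pattern, e.g. "logs-.*". */
    public static boolean matchesTopic(Pattern pattern, String topic) {
        return pattern.matcher(topic).matches();
    }
}
```

In a real consumer loop, each polled record would be passed to `add`, and `flush` would also be called on a timer or before committing offsets so that a partially filled batch is not held indefinitely.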