Kafka -> ES/HDFS with Logstash or Kafka Streams/Connect

Hi,

I use Kafka as a message queue and for processing. My question is about performance and best practice. I will run my own performance tests, but maybe someone already has results or experience to share.

The raw data sits in a Kafka (0.10) topic, and I want to transfer it in structured form to Elasticsearch (ES) and HDFS.

I see two possibilities:

  • Logstash: Kafka input plugin, grok filter for parsing, ES and webhdfs output plugins
  • Kafka Streams for parsing, plus Kafka Connect with an ES sink and an HDFS sink
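
To make the "parsing" step concrete: in both options each raw message has to become a structured record before it reaches a sink. Here is a minimal sketch in plain Python, with no Kafka or Logstash involved. The raw line format (timestamp, level, free-text message) and the names parse_line and to_json are assumptions for illustration only, not taken from the actual topic.

```python
import json
import re

# Hypothetical raw format: "<timestamp> <LEVEL> <free-text message>"
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$"
)


def parse_line(raw):
    """Return the structured record as a dict, or None if the line does not match."""
    match = LINE_PATTERN.match(raw.strip())
    if match is None:
        return None
    return match.groupdict()


def to_json(raw):
    """Serialize the parsed record; flag unparseable input instead of dropping it."""
    record = parse_line(raw)
    if record is None:
        record = {"message": raw.strip(), "parse_error": True}
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    print(to_json("2016-11-02T10:15:30Z ERROR disk full"))
    print(to_json("garbage"))
```

In the first option this logic would live in a grok pattern; in the second it would run inside the stream-processing step, with the structured result written to a new topic for the sinks to read.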

Without having run any tests, I would guess that the second option is cleaner and more reliable. Is that right?

EDIT: Moved to Stack Overflow: http://stackoverflow.com/questions/40379831/kafka-to-elasticsearch-hdfs-with-logstash-or-kafka-stream-connect

Try asking this question on Stack Overflow, where people with Kafka Streams experience could share it. I doubt many people here can offer an opinion on both sides of your comparison.