I use Kafka as a message queue and for stream processing. My question is about performance and best practice. I will run my own performance tests, but maybe someone already has results or experience to share.
The raw data sits in a Kafka (0.10) topic, and I want to deliver it in structured form to Elasticsearch (ES) and HDFS.
I see two options:
- Logstash (Kafka input plugin, grok filter for parsing, ES/webhdfs output plugins)
- Kafka Streams for parsing, plus Kafka Connect (ES sink, HDFS sink)
Without having run any tests, my guess is that the second option is cleaner and more reliable. Is that right?
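To make the "parsing" step concrete: in either option it boils down to turning a raw line into a structured record before it reaches ES or HDFS. Below is a minimal Python sketch of that step only, not of Logstash or Kafka Streams themselves. The raw format (`<timestamp> <LEVEL> <message>`), the field names, and the functions `parse_line` and `to_json` are all assumptions made for illustration; my actual data may look different.

```python
import json
import re

# ASSUMPTION: raw lines look like "<timestamp> <LEVEL> <message>".
# This pattern plays the role a grok filter would play in Logstash.
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$"
)


def parse_line(raw):
    """Return a dict of named fields, or None if the line does not match."""
    match = LINE_PATTERN.match(raw)
    if match is None:
        return None
    return match.groupdict()


def to_json(raw):
    """Serialize a raw line as a JSON document ready for a sink."""
    record = parse_line(raw)
    if record is None:
        # Keep unparseable input and flag it instead of silently dropping it.
        record = {"message": raw, "parse_error": True}
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    print(to_json("2016-11-02T10:00:00Z INFO started"))
    print(to_json("garbage"))
```

With the second option, this logic would run inside the stream processing job and write to a second, structured topic that the sinks read from; with Logstash it would live in the filter section of the pipeline config.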
EDIT: Moved to stackoverflow: http://stackoverflow.com/questions/40379831/kafka-to-elasticsearch-hdfs-with-logstash-or-kafka-stream-connect