Newbie here, so maybe a strange question. In my environment Kafka is used, and the main reason for this is its ability to temporarily queue data. This functionality is (now) available in Logstash.
So, simple question: can we remove Kafka?
I understand that you can also look at this the "microservice way" and still favor a split between a manipulation phase and a data-director phase, but a container/Logstash/volume setup looks to me like the optimal solution?
If you are running a large enterprise data ingestion/aggregation system, then an event bus like Kafka is highly recommended. Some reasons include:
- Downstream systems like Logstash/Elasticsearch or other third-party systems may need frequent restarts or updates. This means you may lose data from UDP/TCP/streaming sources while they are down.
- The event bus (Kafka) acts as a buffering layer and smooths the flow of data into Elasticsearch or other downstream systems. The "velocity" and "veracity" of the data are much easier to manage with an event bus.
- Other systems can get the data from Kafka without bothering your team.
- There are many other reasons too.
Overall, it depends on the design/architecture of your platform and how much data resiliency you require.
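To make the buffering point concrete, here is a toy Python sketch (not Kafka or Logstash code). It assumes a hypothetical stream of ten events and a consumer that is restarting during ticks 3 to 5; the function names and numbers are made up for illustration only.

```python
from collections import deque

def deliver_direct(events, downtime):
    """Send each event straight to the consumer.
    Events that arrive while the consumer is down are lost."""
    return [e for e in events if e not in downtime]

def deliver_buffered(events, downtime):
    """Append each event to a buffer first.
    The consumer drains the buffer whenever it is up."""
    buffer, received = deque(), []
    for e in events:
        buffer.append(e)
        if e not in downtime:
            while buffer:
                received.append(buffer.popleft())
    return received

events = list(range(10))   # hypothetical event stream, one event per tick
downtime = {3, 4, 5}       # ticks during which the consumer is restarting

direct = deliver_direct(events, downtime)
buffered = deliver_buffered(events, downtime)
print(len(direct), len(buffered))
```

With direct delivery the three events sent during the restart never arrive, while the buffered path delivers them once the consumer is back. A real deployment would of course also need the buffer itself to survive failures.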
The big difference is that the persistent queue functionality in Logstash is specific to each node and does not run in clustered mode. If you lose a node, you may therefore lose the data queued on it. Kafka, however, supports running in clustered mode, where losing a node generally does not lead to data loss, and it is therefore generally more resilient.
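A rough way to picture the difference, as a toy Python model only: assume three nodes and a made-up round-robin placement where each event is stored on `replicas` consecutive nodes. This is not how Kafka actually assigns partitions or replicas, and `surviving_events` is a hypothetical helper, but it shows why a node-local queue (one copy) and a clustered queue (more than one copy) behave differently when a node is lost.

```python
def surviving_events(events, replicas, failed_nodes, nodes=3):
    """Return the events still readable after `failed_nodes` are lost.
    Event i is stored on `replicas` consecutive nodes starting at i % nodes
    (an illustrative placement, not Kafka's real assignment logic)."""
    survivors = []
    for i, e in enumerate(events):
        holders = {(i + k) % nodes for k in range(replicas)}
        if holders - failed_nodes:   # at least one copy is on a healthy node
            survivors.append(e)
    return survivors

events = list(range(6))

# One copy per event: roughly the node-local persistent queue case.
local = surviving_events(events, replicas=1, failed_nodes={0})

# Two copies per event: roughly the clustered case.
replicated = surviving_events(events, replicas=2, failed_nodes={0})

print(len(local), len(replicated))
```

In this model, losing node 0 takes the events queued only on that node with it, while every event that had a second copy elsewhere is still readable.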