I tested the behaviour of Filebeat's Kafka output when the connection to Kafka is lost and then recovered. Events are correctly redelivered even if the connection is down for a few minutes, but I observe that a single event (probably the first one) is lost on every connection loss. I'm not sure whether this is a misconfiguration on my side or a bug that should be reported on the Beats GitHub repository.
My setup watches the same log file with two independent Filebeat instances, each writing to a separate Kafka topic. For one of them I interrupted the connection by severing the SSH tunnel it uses, while the other kept a working connection the whole time. Comparing their outputs afterwards, I could confirm that the single message missing from the interrupted instance was successfully delivered through Kafka to Elasticsearch by the other instance.
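For reference, my Kafka output configuration is essentially the defaults. A minimal sketch of the relevant part of `filebeat.yml` (topic name and broker address are placeholders, not my actual values):

```yaml
output.kafka:
  hosts: ["localhost:9092"]   # broker reached through the SSH tunnel
  topic: "test-topic-a"       # each Filebeat instance uses its own topic
  # Delivery-related settings I have been experimenting with:
  required_acks: 1            # 1 = leader ack only (default); -1 waits for all replicas
  max_retries: 3              # default retry count before an event is dropped
```

I wonder whether the lost event is related to `max_retries` giving up on the in-flight batch when the connection first drops, but I have not been able to confirm that from the logs.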