Currently we have the following setup:
Filebeat reads log files and sends the content to Kafka. One log-line results in one Kafka event.
On the other side, Logstash reads the events from Kafka, parses them, and sends the resulting documents to Elasticsearch.
Filebeat runs inside a Docker container and reads the log files from a Docker data container.
We are using Filebeat 5.1.2 with Docker 1.11.2.
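For context, here is a minimal sketch of the kind of configuration we run (the paths, broker addresses, and topic name below are illustrative placeholders, not our actual values):

```yaml
filebeat.prospectors:
  - input_type: log
    # Log files mounted from the Docker data container (illustrative path)
    paths:
      - /logs/*.log

output.kafka:
  # Illustrative brokers and topic; one log-line becomes one Kafka event
  hosts: ["kafka1:9092", "kafka2:9092"]
  topic: "app-logs"
```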
The problem we are encountering is as follows:
Although the offset in the Filebeat registry points to the start of a new log-line, after a restart Filebeat starts reading somewhere in the middle of the previous log-line. This sends a partial log-line to Kafka, which in turn causes parsing errors in Logstash.
According to the documentation ("How does Filebeat ensure at-least-once delivery?"), Filebeat should simply resume reading from the stored offset after a restart.
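For reference, a Filebeat 5.x registry entry is a JSON object along these lines (the path and numbers here are illustrative); the offset is a byte position into the file:

```json
[
  {
    "source": "/logs/app.log",
    "offset": 4096,
    "FileStateOS": {"inode": 123456, "device": 2049},
    "timestamp": "2017-01-20T10:15:00.000Z",
    "ttl": -1
  }
]
```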
What could cause the behavior we are experiencing?
The first thing that comes to mind is that this could be related to encoding or special characters. But then again, the offset is in bytes, so that should not matter.
Can you share an example of the log files and the registry where this happens? Can you share your config?
If it were an encoding problem, I think we would have seen scrambled content in Kafka. And as long as Filebeat was not stopped, the pipeline processed everything fine.
Eventually we found a solution by setting `close_eof: true`, which works for us because our log files don't roll: each file is written once and never updated afterwards.
With that option enabled, restarting the Filebeat Docker container no longer results in partial log lines being sent to Kafka.
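For anyone finding this thread later, the relevant prospector setting looks roughly like this (the path is an illustrative placeholder):

```yaml
filebeat.prospectors:
  - input_type: log
    paths:
      - /logs/*.log
    # Close the file handle as soon as EOF is reached. Safe here because
    # each file is written once and never appended to afterwards.
    close_eof: true
```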