I would like to move from Logstash to Filebeat as a log shipper.
I'm currently using the Logstash file input plugin to collect logs (Filebeat would do the same job) and send everything to a centralized logstash-shipper before writing to Elasticsearch.
The thing is, if I shut down Logstash and start a fresh Filebeat instance instead, Filebeat will start from the beginning of each file, leading to duplicate logs in Elasticsearch.
I could add a hash of the log content as the Elasticsearch document_id on the logstash-shipper side to avoid duplicates, but I have to admit I'd like an easier solution.
Would you have any idea how to "bootstrap" a Filebeat instance with the Logstash file cursor, maybe?
I never tried it, but I think it should be possible to write a small script in your preferred language that takes the sincedb from LS and converts it into a Filebeat registry file. An alternative is using tail_files in Filebeat, but if log lines were added between shutting down LS and booting up Filebeat, these are lost.
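To give an idea of what such a script could look like, here is a minimal sketch. It assumes the common four-column sincedb layout (inode, major device number, minor device number, byte offset) with the file path appended as a fifth column, and it emits the old single-JSON-object registry layout used by early Filebeat versions; the registry format has changed between versions, so check what your Filebeat actually writes before relying on this. The file path and device-number mapping are illustrative assumptions.

```python
# Hypothetical sketch: convert a Logstash file-input sincedb into a
# Filebeat registry file. Registry layout and sincedb columns are
# assumptions; verify against your Logstash/Filebeat versions.
import json


def sincedb_to_registry(sincedb_text):
    registry = {}
    for line in sincedb_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            # Older sincedb entries have no path column, so there is no
            # reliable way to map the inode back to a file from here.
            continue
        inode, major, _minor, offset = fields[:4]
        path = " ".join(fields[4:])
        registry[path] = {
            "source": path,
            "offset": int(offset),
            "FileStateOS": {
                "inode": int(inode),
                # Assumption: Filebeat's "device" corresponds to the
                # major device number in the sincedb.
                "device": int(major),
            },
        }
    return registry


if __name__ == "__main__":
    # Example sincedb line (values are made up for illustration).
    sample = "1835010 0 2049 4096 /var/log/app/app.log"
    print(json.dumps(sincedb_to_registry(sample), indent=2))
```

You would run this over the sincedb file, write the resulting JSON to Filebeat's registry path, and start Filebeat with the same log paths configured.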
Another solution could be to write the Filebeat logs to a different index, manually check (based on the timestamp?) what the time range of the duplicated events is, and then use delete_by_query to remove these events from one of the two indices. That would mean running both shippers for a certain time. This is also what I would recommend.
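For the cleanup step, the request could look roughly like this (index name and timestamp bounds are placeholders for your overlap window; note that on ES 2.x delete-by-query is a separate plugin, while from 5.0 it is the built-in `_delete_by_query` API):

```
POST /filebeat-temp/_delete_by_query
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2017-01-01T10:00:00Z",
        "lte": "2017-01-01T10:05:00Z"
      }
    }
  }
}
```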
Maybe losing some logs by using tail_files will be acceptable for my client.
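If you go that way, a minimal prospector config with tail_files might look like this (the log path is a placeholder, and the exact config layout depends on your Filebeat version; also note tail_files only affects files that have no existing registry state):

```yaml
filebeat:
  prospectors:
    - paths:
        - /var/log/app/*.log
      # Start reading new files at the end instead of the beginning.
      tail_files: true
```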
Converting the sincedb into a filebeat registry file seems ok to me, I'll look into it.