Detect Filebeat retries to remove duplicates on the server side

palmerabollo · April 19, 2016, 7:17am

I'm using Filebeat to send logs to a remote Logstash endpoint.

Is there a flag, or any other way, to detect whether a log line has already been sent? That is, to detect whether an event is a retry.

I think Filebeat adds a @timestamp field (I could use it in combination with a hash of the log line), but if I'm not mistaken it changes on every retry.

Thank you.

Christian_Dahlqvist · April 19, 2016, 7:38am

When Filebeat retries, it is generally not clear whether the initial attempt reached the destination or not. However, Filebeat can attach additional metadata to each event, e.g. the file name and the offset within that file, which you could use instead of the timestamp.
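A minimal sketch of such a server-side check, assuming each event arrives as a Python dict whose file name and offset live under the keys `source` and `offset` (these key names are an assumption; use whatever fields your Filebeat version actually emits). The first event seen for a given (file, offset) pair is kept and any later one is treated as a retry, regardless of its `@timestamp`.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple

# Hypothetical event shape: a dict with "source" (file path) and "offset"
# (byte offset of the line within that file). Adjust the keys to match
# the fields your Filebeat version actually emits.
Event = Dict[str, object]


def drop_retries(events: Iterable[Event]) -> Iterator[Event]:
    """Yield each event once, skipping any whose (source, offset) was already seen."""
    seen: Set[Tuple[str, int]] = set()  # grows without bound; illustration only
    for event in events:
        key = (str(event["source"]), int(event["offset"]))
        if key in seen:
            continue  # same line of the same file: treat it as a retry
        seen.add(key)
        yield event


if __name__ == "__main__":
    incoming = [
        {"source": "/var/log/app.log", "offset": 0, "@timestamp": "2016-04-19T07:17:00Z", "message": "start"},
        {"source": "/var/log/app.log", "offset": 6, "@timestamp": "2016-04-19T07:17:01Z", "message": "ready"},
        # Retry of the second event: @timestamp differs, but source and offset do not.
        {"source": "/var/log/app.log", "offset": 6, "@timestamp": "2016-04-19T07:17:05Z", "message": "ready"},
    ]
    for e in drop_retries(incoming):
        print(e["offset"], e["message"])
```

The in-memory set is only for illustration: a long-running receiver would need a bounded or persistent store, and a rotated or truncated file starts again at offset 0 under the same path, so the key needs more than file name and offset alone in that case.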

steffens · April 19, 2016, 12:35pm

The filename, offset and beat (shipper name) together are a good basis for deduplication. One trick (when sending to Logstash) is to build a document ID from these fields and simply re-index the document. Re-indexing a document with the same ID in Elasticsearch marks the old entry as deleted and creates a new one. This costs some disk space and CPU, but deleted entries are finally removed from disk when segments are merged (compaction). It's a very simple way to implement deduplication.
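The ID trick can be sketched in a few lines. This assumes hypothetical event keys `beat_name`, `source` and `offset`, hashes them with SHA-1 into a stable document ID, and uses a plain in-memory class (`FakeIndex`, also hypothetical) as a stand-in for an Elasticsearch index: indexing under an existing ID replaces the previous document and counts the old version as deleted, instead of adding a duplicate. In a real pipeline the ID would presumably be computed in Logstash (for example with the fingerprint filter) and passed to the elasticsearch output's `document_id` setting.

```python
import hashlib
import json
from typing import Dict


def document_id(beat_name: str, source: str, offset: int) -> str:
    """Deterministic ID: the same (shipper, file, offset) always yields the same ID."""
    # json.dumps of a list is an unambiguous encoding of the three fields,
    # so e.g. ("a", "b1", 2) and ("a", "b", 12) cannot collide by concatenation.
    payload = json.dumps([beat_name, source, offset])
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


class FakeIndex:
    """Hypothetical stand-in for an index: indexing an existing ID replaces the document."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}
        self.deleted = 0  # superseded versions a real index would purge on segment merge

    def index(self, doc_id: str, doc: dict) -> None:
        if doc_id in self.docs:
            self.deleted += 1  # previous version is replaced, not duplicated
        self.docs[doc_id] = doc


if __name__ == "__main__":
    idx = FakeIndex()
    event = {"beat_name": "web-01", "source": "/var/log/app.log", "offset": 6, "message": "ready"}
    for _attempt in range(2):  # original send plus one retry
        doc_id = document_id(event["beat_name"], event["source"], event["offset"])
        idx.index(doc_id, event)
    print(len(idx.docs), idx.deleted)  # → 1 1
```

Because the ID depends only on where the line came from, a retry lands on the same document, which is why no separate "is this a retry?" flag is needed.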

