Detect filebeat retries to remove duplicates on the server side

(Guido) #1

I'm using filebeat to send logs to a remote logstash endpoint.

Is there a flag or any way to detect if a log has already been sent? That is, detect if it is a retry.

I think filebeat adds a @timestamp field, which I could combine with a hash of the log line, but if I'm not wrong it changes on every retry.

(Christian Dahlqvist) #2

If it retries, it is generally not clear whether the initial record reached the destination or not. Filebeat can provide additional metadata around the event, e.g. filename and offset in file, that you could use rather than the timestamp.

(Steffen Siering) #3

The filename, offset, and beat (shipper name) fields are a good basis for deduplication. One trick (when sending to logstash) is to build a document id from these fields and simply index the document again under that id. I think elasticsearch will then mark the old entry as deleted and create a new one (this costs some disk space and CPU, but the deleted entries are eventually removed from disk when segments are merged). It's a very simple way to implement deduplication.
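
To make the idea concrete, here is a minimal Python sketch of the id-based trick. It is not filebeat or logstash code: the function name `dedup_id`, the event keys (`shipper`, `filename`, `offset`), and the in-memory `index` dict are all assumptions made for illustration, and the real field names depend on your filebeat version. The dict only stands in for a store where writing the same id twice overwrites the earlier document.

```python
import hashlib

def dedup_id(shipper: str, filename: str, offset: int) -> str:
    """Build a stable id from the fields that identify one log line (hypothetical helper)."""
    # A NUL separator keeps different field combinations from joining into the same string.
    key = "\x00".join([shipper, filename, str(offset)])
    return hashlib.sha1(key.encode("utf-8")).hexdigest()

# Stand-in for an index keyed by document id: same id means overwrite, not append.
index = {}

def store(event: dict) -> None:
    doc_id = dedup_id(event["shipper"], event["filename"], event["offset"])
    index[doc_id] = event

first = {"shipper": "web-01", "filename": "/var/log/app.log", "offset": 1024, "message": "started"}
retry = dict(first)  # the same line, sent again after a retry
other = {"shipper": "web-01", "filename": "/var/log/app.log", "offset": 1031, "message": "ready"}

for event in (first, retry, other):
    store(event)

print(len(index))  # prints 2: the retry replaced the first copy instead of adding a duplicate
```

The id deliberately leaves out the timestamp, so a resent line maps to the same id no matter when it arrives.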

