Regarding Deduplication

dawiro · May 8, 2019, 8:07am

Hi,
I'm looking at avoiding duplicate entries when indexing logs into Elasticsearch from Filebeat via Logstash. To do that I'll be using the Logstash fingerprint filter...

I'd like to use more than the message field to ensure uniqueness of the generated IDs, without using the @timestamp field sent by Filebeat. Given that, could I use offset alongside message to ensure unique ID generation?

Regards,
D

pmercado · May 8, 2019, 9:12am

Hi @dawiro

Offset + message + some datetime range should probably do it.

Without a datetime range there is a chance of dropping valid logs: if you are harvesting a rotating log, offsets repeat after each rotation, so the same message can legitimately show up at the same offset at a different time.
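To make the offset + message + datetime range idea concrete, here is a minimal Python sketch of a deterministic ID. It is not how the Logstash fingerprint filter is implemented; `event_id` is a hypothetical helper, and the `source` path and `day` bucket are assumptions (the day is assumed to be parsed from the log line itself rather than taken from @timestamp).

```python
import hashlib


def event_id(message: str, offset: int, source: str, day: str) -> str:
    """Hypothetical helper: derive a deterministic document ID from event fields."""
    digest = hashlib.sha256()
    for part in (message, str(offset), source, day):
        data = part.encode("utf-8")
        # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


# A resend of the same line at the same offset produces the same ID,
# so indexing it under that ID would overwrite rather than duplicate.
first = event_id("user login ok", 1024, "/var/log/app.log", "2019-05-08")
resend = event_id("user login ok", 1024, "/var/log/app.log", "2019-05-08")

# After rotation, the same line at the same offset on a later day gets a new ID,
# so a valid log entry is not dropped.
rotated = event_id("user login ok", 1024, "/var/log/app.log", "2019-05-09")
```

The coarser the datetime bucket, the more tolerant the ID is of timestamp jitter, but the higher the chance that a rotated file reuses an offset within the same bucket.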

That said, afaik you shouldn't be getting duplicates from Filebeat unless there are problems ACKing from the output back to Filebeat, or in some other edge cases. If you are receiving so many dupes that you need to filter them, there is probably an underlying issue that we should be looking into.

dawiro · May 8, 2019, 9:22am

Hi @pmercado,

The problem with keying on timestamp is that I can't be sure Filebeat isn't generating a timestamp for some portion of the logs at transmission time. In scenarios where Filebeat resends logs, that would be a problem, because the resent events could carry a different timestamp.

Regarding duplicate transmits from Filebeat: we have seen retransmits where the Logstash input flaps as the pipeline to Elasticsearch blocks and becomes available again in rapid succession. It's an edge case I'm trying to account for.

Regards,
D

