How to handle duplicate records in data streams using fingerprint

Hi Team,

I am ingesting data from a Kafka topic into Elasticsearch using Logstash.
The incoming data can contain duplicates, so I am using a fingerprint filter on a unique business field (seqId) and setting it as the document _id.

This works correctly within a single backing index — duplicates are not created as long as the data goes into the same index.

However, once the data stream rolls over to a new backing index, I start seeing duplicate documents again, even though the _id generated from the fingerprint remains the same.

Setup details:

  • Ingesting data using Logstash → Elasticsearch data stream (combined pipeline sketch at the end of this list)

  • Using fingerprint filter:

    fingerprint {
      # hash seqId into a metadata field that is used below as the document _id;
      # @metadata fields are not indexed with the event
      source => ["seqId"]
      target => "[@metadata][generated_id]"
    }
    
    
  • Using this in the output:

    document_id => "%{[@metadata][generated_id]}"
    
    
  • Data stream rollover is based on time

  • Same seqId can arrive again after rollover
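
Putting the details above together, a minimal sketch of the pipeline; the broker address, topic name (events), and hosts are placeholders for my actual values:

    input {
      kafka {
        bootstrap_servers => "localhost:9092"   # placeholder broker
        topics            => ["events"]         # placeholder topic
        codec             => "json"
      }
    }

    filter {
      fingerprint {
        source => ["seqId"]
        target => "[@metadata][generated_id]"
      }
    }

    output {
      elasticsearch {
        hosts       => ["https://localhost:9200"]   # placeholder cluster
        data_stream => "true"
        document_id => "%{[@metadata][generated_id]}"
      }
    }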

Thanks in advance.

Hello @venkatkumar229

Data streams are designed for append-only time-series data, so they are not a good fit when you need global de-duplication based on _id.
_id uniqueness is enforced only within a single backing index, so after a rollover the same _id can be indexed again into the new backing index, which is exactly the behavior you are seeing.
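
You can see this directly by searching the data stream for one of the generated ids; after a rollover the same _id shows up once per backing index. A quick sketch, with the data stream name and the fingerprint value as placeholders:

    GET logs-myapp-default/_search
    {
      "query": {
        "ids": { "values": ["<fingerprint-of-seqId>"] }
      }
    }

Each hit will report a different .ds-* backing index in its _index field.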

The same thing happens with regular indices behind a rollover write alias, because Elasticsearch does not check older indices for an existing _id.

If you require exactly one document per business key (for example seqId), the recommended approach is to use a Transform to maintain a de-duplicated destination index, or you would have to avoid index rollover entirely (which is usually not feasible for time-series data).
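
A minimal sketch of such a "latest" transform, assuming seqId is mapped as a keyword and using placeholder source/destination names; it keeps only the most recent document per seqId in the destination index:

    PUT _transform/dedup-seqid
    {
      "source": { "index": "logs-myapp-default" },
      "dest":   { "index": "seqid-latest" },
      "latest": {
        "unique_key": ["seqId"],
        "sort": "@timestamp"
      },
      "sync": {
        "time": { "field": "@timestamp", "delay": "60s" }
      },
      "frequency": "1m"
    }

    POST _transform/dedup-seqid/_start

Queries that need one document per seqId then go against the destination index, while the data stream keeps the raw append-only events.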

Similar post:

Thanks!!


As Tortoise said, data streams are append-only: roughly write-once, read-many-times indices.

Check a similar topic, the blog, and GitHub.
