Same _id ends up duplicated across rollover indices behind a write alias — can this be prevented via template/ILM?

juan_ma_tejada · March 4, 2026, 7:19pm

Hi all,

I’m indexing documents into Elasticsearch using a deterministic _id (SHA1 of email + normalized_context). I write to an alias that uses ILM rollover, so over time it creates backing indices like:

data-000073
data-000077

Problem: I’m seeing the same _id stored twice, but in different backing indices (e.g., one copy in data-000073 and another in data-000077). When I index again through the alias, the document is written to the current write index, so it doesn’t overwrite the older copy if it exists in a previous rolled-over index.

Questions:

1. Is there any way (index template / ILM / alias setting) to enforce uniqueness of _id across all indices behind an alias (or a data stream), so that indexing via the alias overwrites the existing document even if it lives in an older backing index?
2. If that is not possible, what’s the recommended approach to avoid disk growth from duplicates while still using rollover? (e.g., routing to fixed “bucket” indices based on hash prefix, periodic reindex+dedupe, or another pattern)
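
To make the "bucket" idea concrete, here is a minimal sketch of hash-prefix routing. The bucket count, the `data-bucket` prefix, and the `bucket_index` helper are hypothetical names chosen for illustration; they are not part of the setup described in this post.

```python
import hashlib

NUM_BUCKETS = 16  # hypothetical bucket count, chosen for illustration


def bucket_index(doc_id: str, prefix: str = "data-bucket") -> str:
    """Map a hex document ID to one of NUM_BUCKETS fixed index names.

    The first two hex characters of the ID are read as a number (0..255)
    and reduced modulo NUM_BUCKETS, so a given ID always lands in the
    same index.
    """
    bucket = int(doc_id[:2], 16) % NUM_BUCKETS
    return f"{prefix}-{bucket:02d}"


# Example: derive the ID as in this post, then pick the target index.
doc_id = hashlib.sha1("user@example.com|some-context".encode()).hexdigest()
target = bucket_index(doc_id)
```

Because the target index depends only on the ID, re-indexing the same document overwrites the earlier copy. The trade-off is that these indices are fixed and no longer roll over, so size and retention would have to be managed some other way.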

Any pointers or best practices would be appreciated.

Example of _id generation:

```python
import hashlib

hash_input = f"{email}{email_context_str}"
doc_id = hashlib.sha1(hash_input.encode()).hexdigest()

document = {
    "_index": INDEX_NAME,
    "_id": doc_id,
    "_source": {
        "email": email,
        # ... (snippet is cut off here in the original post)
    },
}
```
Thanks!

Christian_Dahlqvist · March 4, 2026, 7:34pm

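No, this is not possible. Elasticsearch only enforces _id uniqueness within a single index, not across the indices behind an alias.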

Why are you getting duplicates? Would it be possible to delete them at the source?

Are these exact complete duplicates? If they are, does each document have an event timestamp?

If you want to avoid duplicates and have both a unique ID and a consistent timestamp, you can use traditional time-based indices instead of rollover. With this approach each index covers a specific time period, e.g. a day or a month, and includes that period in its name. Events are directed to the index that matches the event timestamp, so all events with the same ID and timestamp go to the same index, and a duplicate results in an update rather than a second copy. You can use these kinds of indices with ILM.
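
A rough sketch of that routing, assuming monthly indices named `<prefix>-YYYY.MM` and timezone-aware event timestamps. The helper names `index_for_event` and `bulk_action` are made up for illustration, and the action dict follows the same `_index` / `_id` / `_source` shape as the snippet in the question.

```python
from datetime import datetime, timezone


def index_for_event(prefix: str, event_ts: datetime) -> str:
    """Return the monthly index name covering the event timestamp (in UTC)."""
    ts = event_ts.astimezone(timezone.utc)
    return f"{prefix}-{ts:%Y.%m}"


def bulk_action(prefix: str, doc_id: str, event_ts: datetime, source: dict) -> dict:
    """Build an action whose target index is derived from the event time,
    not from the time of ingestion."""
    return {
        "_index": index_for_event(prefix, event_ts),
        "_id": doc_id,
        "_source": source,
    }


# The same ID and event timestamp always resolve to the same index,
# so a re-sent event overwrites the earlier copy.
event_ts = datetime(2026, 3, 4, 19, 19, tzinfo=timezone.utc)
action = bulk_action("data", "abc123", event_ts, {"email": "user@example.com"})
```

This only works if the event timestamp is stable across re-sends: if a duplicate arrives with a different timestamp that falls into another period, it will still land in a different index.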
