I have read two other posts related to this but I still have not found a solution to my issue. I am ingesting large amounts of data at an unpredictable rate, so I have implemented ILM policies that roll an index over once it reaches a certain size. This avoids ending up with one very large index alongside a bunch of smaller ones, and it keeps the data load even across shards (no hot shards). The two issues I am having are:
First, deduplication across rollover. While ingesting a large dataset that contains duplicates, I ingest a document, extract entities, and write the result to ES. Ingestion continues, and after a certain amount of data the index rolls over. When the duplicate document appears later in the ingestion, ES does not recognize it as a duplicate, because the new write index knows nothing about the previous (now effectively read-only) index. Is there any way to solve this without first searching the alias and then writing? That is not a valid solution for me because it would incur too much overhead, and I believe it would rule out the bulk API.
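To make the problem concrete, here is a minimal sketch (my own illustration, not part of the actual pipeline) of the usual dedup approach with deterministic, content-derived `_id`s, and why rollover defeats it. The `doc_id` helper and the example documents are hypothetical:

```python
import hashlib
import json

def doc_id(doc: dict) -> str:
    """Deterministic _id from document content (a hypothetical dedup key)."""
    canonical = json.dumps(doc, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

# The same content always yields the same _id, so within a single index an
# index request with op_type=create is rejected as a version conflict:
a = doc_id({"entity": "ACME", "text": "some extracted text"})
b = doc_id({"text": "some extracted text", "entity": "ACME"})
assert a == b  # key order does not matter

# But after rollover, the alias points at a fresh write index that has no
# record of this _id, so op_type=create happily accepts the duplicate —
# exactly the situation described above.
```

The `create` uniqueness check is per index, not per alias, which is why it stops helping the moment the alias rolls over.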
Second, updating a document that no longer lives in the write index. Is there any solution besides searching the alias to find which backing index holds the document and then updating it there?
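For reference, the read-then-update path I am trying to avoid would look roughly like this (a sketch assuming the Python Elasticsearch client; `es`, the alias name, and the query are placeholders). The key point is that the search hit carries the concrete backing index, which is what the update must target:

```python
def update_target(hit: dict) -> tuple[str, str]:
    """Extract (backing_index, doc_id) from a search hit returned by
    querying the alias — the _index field names the concrete index."""
    return hit["_index"], hit["_id"]

# Example hit, shaped like an entry of response["hits"]["hits"]:
hit = {"_index": "myindex-000001", "_id": "abc123", "_source": {"entity": "ACME"}}
index, hit_id = update_target(hit)

# The update then goes to the concrete older index, not the alias
# (writes through the alias would land on the current write index):
# es.update(index=index, id=hit_id, doc={"entity": "ACME Corp"})
```

This works, but it turns every update into a search plus a write, which is the overhead I am trying to avoid.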
Thanks for any help/suggestions.