Index Lifecycle Management with document ids / routing

Currently, we have a logging pipeline based on ELK (plus Filebeat). For quite some time we've been sharding with a fixed number of shards, usually aiming for ~40GB per shard. We're now looking into automating this by using size-based rollover through Index Lifecycle Policies.
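For context, a minimal sketch of the kind of size-based policy we have in mind (the policy structure follows the ILM rollover action; the 40gb threshold mirrors our manual target, and any surrounding names would be our own):

```python
import json

# Sketch of a size-based ILM policy body. "max_primary_shard_size" asks ILM
# to roll the write index over once any primary shard reaches the given size,
# roughly matching our manual ~40GB-per-shard target.
policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {"max_primary_shard_size": "40gb"}
                }
            }
        }
    }
}

print(json.dumps(policy, indent=2))
```

This body would be PUT to the `_ilm/policy` endpoint and attached to the index template behind the write alias.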

In our current setup, we set custom document ids based on some field(s) that we extract from the events. The reasoning behind this was to be able to "replay" the data ingestion (from Kafka in our case) and re-ingest some data or fill any gaps if we had an outage.
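Concretely, the id derivation looks something like this (a sketch: the field names and hashing choice here are illustrative, not our exact implementation). The point is that replaying the same Kafka record produces the same `_id`, so a re-ingest overwrites rather than duplicates:

```python
import hashlib

def document_id(event, id_fields=("host", "timestamp", "offset")):
    """Derive a deterministic document id from selected event fields,
    so replaying the same record maps to the same _id."""
    key = "|".join(str(event.get(f, "")) for f in id_fields)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

event = {"host": "web-1", "timestamp": "2021-06-01T12:00:00Z", "offset": 42}
# The same event always yields the same id, regardless of when it is replayed.
assert document_id(event) == document_id(dict(event))
```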

If we move to size-based sharding we lose the ability to replay traffic: even if we continue to set the document id, the replayed event could (potentially) be ingested into a different index, which means the old version of the document will not be overwritten, resulting in duplicated results. We thought about using the routing parameter, but that would lead to the same situation, since both the document id and the routing parameter only apply within the underlying index behind the write alias (as far as I can tell).
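A toy model of the problem (an assumption-laden sketch, not real Elasticsearch behavior verbatim): `_id` is only unique within a single index, so if rollover has happened between the original ingestion and the replay, the same id lands in two backing indices and a search through the alias returns both:

```python
# Two backing indices behind a write alias, after one rollover.
indices = {"logs-000001": {}, "logs-000002": {}}

def index_doc(index, doc_id, doc):
    # An id-based overwrite only happens within the same index.
    indices[index][doc_id] = doc

index_doc("logs-000001", "evt-1", {"msg": "original"})   # first ingestion
index_doc("logs-000002", "evt-1", {"msg": "replayed"})   # replay after rollover

# Searching via the alias scans all backing indices:
hits = [doc for idx in indices.values() for did, doc in idx.items() if did == "evt-1"]
assert len(hits) == 2  # duplicated, not overwritten
```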

In general: I'm wondering if there is any recommended way of applying an Index Lifecycle Policy (size-based in particular) to indices that have custom document ids, such that we avoid duplicated events if we need to re-ingest some old data.

If anyone can share any tips on a similar setup it would also be greatly appreciated :wink:

I do not think there is any way to fully eliminate duplicates through IDs when using ILM/rollover, although the longer the duration an index covers relative to the delay in arrival of duplicate events, the lower the probability.

Thanks for the reply @Christian_Dahlqvist!

That was my finding as well; I wanted to check that I wasn't missing something. The issue is that we will not control the duration of an index (we're very much after the size-based approach), and we have a few indices that will be rolled over several times per day. On a normal day we will not have issues; it will only be a problem if we need to re-ingest some data.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.