I have a data stream containing page views for multiple customer websites. Customers can delete their data (this is important): when they do, I remove all documents matching a given hostname.
At the same time, I want searches over recent data to stay fast. What works best for me is 12 primary shards for recent data; older data is fine with 3 shards (both on a 3-node cluster).
The most recent data (up to 1 day old) is also updated with a few extra metrics we collect at the end of a page view (like time on page).
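For context, the customer deletions are currently a delete-by-query against the whole data stream, roughly like this (names like `my-data-stream` and the `hostname` field are just placeholders for my setup):

```
POST /my-data-stream/_delete_by_query
{
  "query": {
    "term": {
      "hostname": "customer-site.example.com"
    }
  }
}
```

This works across all backing indices of the stream, which is exactly why read-only shrunken indices are a problem for me.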
A few options (and their concerns):
- The shrink API, but it produces read-only indices (I need to be able to delete specific old data)
- Make the data stream writable once every month and run the deletions then (is that possible?)
- Reindexing the whole data stream every month to remove the older documents (how would I keep writes and updates working well during the reindex?)
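To illustrate the first option, this is roughly how I understand the shrink flow would look on an older backing index (index names are placeholders; the write block on the source is required before shrinking, and as far as I can tell it is copied to the target unless cleared):

```
# Block writes on the source index (required for shrink)
PUT /my-old-backing-index/_settings
{
  "settings": {
    "index.blocks.write": true
  }
}

# Shrink from 12 primaries down to 3, clearing the write block on the target
POST /my-old-backing-index/_shrink/my-old-backing-index-shrunk
{
  "settings": {
    "index.number_of_shards": 3,
    "index.blocks.write": null
  }
}
```

If clearing `index.blocks.write` on the shrink target really does leave it writable for delete-by-query, maybe option 1 is less of a dead end than I assumed?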
What would be a good way to keep recent data fast in a data stream that sometimes gets updates or deletes in very old data?
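For completeness, the "extra metrics" updates on recent data are an update-by-query along these lines (the `@timestamp` range and `time_on_page` field reflect my mapping; values here are illustrative):

```
POST /my-data-stream/_update_by_query
{
  "query": {
    "range": {
      "@timestamp": { "gte": "now-1d" }
    }
  },
  "script": {
    "source": "ctx._source.time_on_page = params.top",
    "params": { "top": 42 }
  }
}
```

So whatever layout I end up with needs to support this on the newest backing index as well.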