Reduce (shrink) primary shards on older data indices of a data stream

I have a data stream containing page views from multiple customer websites. I allow customers to delete their data (this is important). When they do, I remove all documents with that customer's hostname.
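
To make the deletion concrete, here is a minimal sketch of how such a request could be built. It assumes the hostname is stored in a keyword field called `hostname` and that the data stream is named `page-views`; both names are hypothetical. The helper only builds the method, path, and body of a delete-by-query request, it does not send anything.

```python
import json


def build_delete_by_hostname(data_stream, hostname, field="hostname"):
    """Return (method, path, body) for a delete-by-query request.

    `data_stream` and `field` are placeholder names, not taken from a real cluster.
    """
    # A term query matches the exact value, which assumes a keyword-mapped field.
    body = {"query": {"term": {field: hostname}}}
    return "POST", f"/{data_stream}/_delete_by_query", body


method, path, body = build_delete_by_hostname("page-views", "example.com")
print(method, path)
print(json.dumps(body))
```

The request targets the data stream as a whole, so it only succeeds if none of the backing indices that hold matching documents are write-blocked.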

At the same time, I want to keep searches on recent data fast. What works best is 12 primary shards for recent data. Older data is fine with 3 primary shards (both on a 3-node cluster).

The most recent data (no older than 1 day) is also updated with a few extra metrics that we collect at the end of a page view (such as time on page).

A few options I have considered (and my concerns with each):

  • Use the shrink API, but it creates read-only indices (I need to be able to delete specific documents from old data)
  • Make the data stream writable once a month and run the deletions then (is that possible?)
  • Reindex the whole data stream every month to remove the older documents (how do I keep writes and updates for the last month working well?)
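
For the first option, a shrink request going from 12 to 3 primary shards might look like the sketch below. It relies on two things from the shrink API documentation: the target shard count must be a factor of the source shard count, and the write block copied from the source can be cleared by setting `index.blocks.write` to null in the target settings. The index names are hypothetical, and the sketch does not cover swapping the shrunken index into the data stream in place of the original backing index.

```python
import json


def build_shrink_request(source_index, target_index, source_shards, target_shards):
    """Return (method, path, body) for a shrink request.

    Index names passed in are placeholders; nothing is sent to a cluster.
    """
    # The target shard count must evenly divide the source shard count.
    if target_shards <= 0 or source_shards % target_shards != 0:
        raise ValueError(
            f"{target_shards} is not a factor of {source_shards}"
        )
    body = {
        "settings": {
            "index.number_of_shards": target_shards,
            # None serializes to JSON null, which clears the settings
            # copied over from the source index.
            "index.blocks.write": None,
            "index.routing.allocation.require._name": None,
        }
    }
    return "POST", f"/{source_index}/_shrink/{target_index}", body


method, path, body = build_shrink_request(
    "page-views-old", "page-views-old-shrunk", 12, 3
)
print(method, path)
print(json.dumps(body, indent=2))
```

Before this request can run, the source index still has to be write-blocked and have a copy of every shard on a single node, so writes and deletions against that index are unavailable while the shrink is in progress.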

What would be a good way to keep recent data fast in a data stream that sometimes gets updates to very old data?
