Per-user data retention


I've been looking into how best to model data while accounting for customisable data retention periods at the user/document level, rather than for the whole index.

ILM (index lifecycle management) seems perfect when reasoning about the whole index, and old versions of Elasticsearch appear to have had per-document TTLs, but there doesn't appear to be any guidance or best practice on how to approach scenarios that need finer-grained policies.
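
ILM handles the whole-index case by attaching a lifecycle policy to indexes and deleting each index once it reaches a given age, which is exactly why it cannot express per-document retention on its own. A minimal sketch of such a policy body as a Python dict, assuming daily rollover; the helper name `retention_policy` is hypothetical and nothing here talks to Elasticsearch:

```python
import json


def retention_policy(retention_days: int, rollover_age: str = "1d") -> dict:
    """Build an ILM policy body that rolls over on a schedule and deletes
    whole indexes. The rollover age is an illustrative assumption."""
    return {
        "policy": {
            "phases": {
                # Start a new backing index every `rollover_age`.
                "hot": {"actions": {"rollover": {"max_age": rollover_age}}},
                # After a rollover, min_age counts from the rollover time, so
                # each index is dropped about `retention_days` after it stops
                # receiving writes.
                "delete": {
                    "min_age": f"{retention_days}d",
                    "actions": {"delete": {}},
                },
            }
        }
    }


if __name__ == "__main__":
    # Body for e.g. PUT _ilm/policy/<policy-name>
    print(json.dumps(retention_policy(30), indent=2))
```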

The options appear to be:

  • Create an index per user and time window. This gives the most control, but could result in lots of small indexes.
  • Group users with the same retention period into one index per retention period and time window. This may be less flexible when retention periods change, and could still result in very small or very large indexes over time.
  • Write all data normally into indexes per time window, and run a separate cleanup process that finds and removes the relevant data once it has expired.
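
The three options differ mainly in how a document's target index is chosen at write time. A minimal sketch, assuming daily time windows and a hypothetical `events-*` naming scheme (none of these names come from Elasticsearch itself):

```python
from datetime import datetime, timezone


def index_per_user(user_id: str, ts: datetime) -> str:
    # Option 1: one index per user per daily window.
    return f"events-user-{user_id}-{ts:%Y.%m.%d}"


def index_per_retention(retention_days: int, ts: datetime) -> str:
    # Option 2: users sharing a retention period share one index per window.
    return f"events-ret{retention_days}d-{ts:%Y.%m.%d}"


def index_shared(ts: datetime) -> str:
    # Option 3: everyone shares one index per window; per-user expiry is
    # left to a separate cleanup process.
    return f"events-{ts:%Y.%m.%d}"


if __name__ == "__main__":
    ts = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)
    print(index_per_user("u42", ts))    # events-user-u42-2024.03.01
    print(index_per_retention(90, ts))  # events-ret90d-2024.03.01
    print(index_shared(ts))             # events-2024.03.01
```

Under options 1 and 2, each index can be deleted whole once its window is older than the relevant retention period; under option 3, a shared index can only be deleted whole after the longest retention period has passed, so shorter-lived data inside it has to be removed document by document.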

Any insight into how others approach this, or into other options I may not have considered, would be appreciated.

As for the data, it can be presumed to be a time series of user-generated data in which each user can have a custom retention period, defined in days, of up to a year. The number of users is growing into the tens of thousands, some users will inevitably generate more data than others, and overall totals are a few million records a day.
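
A rough sizing of the options with these figures makes the small-index concern concrete. This assumes, for illustration only, 10,000 users, 3 million records a day spread evenly, and every user on the maximum 365-day retention, so the index counts are upper bounds:

```python
def index_count_upper_bound(groups: int, max_retention_days: int,
                            window_days: int = 1) -> int:
    """Approximate number of indexes alive at once if each group (a user in
    option 1, a retention tier in option 2, one shared group in option 3)
    keeps one index per time window for max_retention_days."""
    windows = -(-max_retention_days // window_days)  # ceiling division
    return groups * windows


def avg_docs_per_index(docs_per_day: int, groups: int,
                       window_days: int = 1) -> float:
    """Average documents per index, assuming writes are spread evenly."""
    return docs_per_day * window_days / groups


if __name__ == "__main__":
    # Option 1 with daily windows.
    print(index_count_upper_bound(10_000, 365))                  # 3650000
    print(avg_docs_per_index(3_000_000, 10_000))                 # 300.0
    # Option 1 with weekly windows.
    print(index_count_upper_bound(10_000, 365, window_days=7))   # 530000
    print(avg_docs_per_index(3_000_000, 10_000, window_days=7))  # 2100.0
    # Option 3: one shared index per day.
    print(index_count_upper_bound(1, 365))                       # 365
```

Every index has at least one shard, and Elasticsearch's default `cluster.max_shards_per_node` is 1,000, so even the weekly variant of option 1 is far beyond what a modest cluster is sized for; uneven per-user volumes would also make many of those indexes much smaller than the average.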


How many users are we talking about here?

At the moment, around 5k.

Can you tier the retention so that you can then group users into indices?
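
Tiering amounts to mapping each user's custom retention period onto a small, fixed set of buckets, each with its own indexes and lifecycle. A minimal sketch, assuming the hypothetical tier values below (illustrative, not a recommendation) and rounding up:

```python
import bisect

# Hypothetical retention tiers in days, shortest first.
TIERS = (7, 30, 90, 180, 365)


def tier_for(retention_days: int, tiers: tuple[int, ...] = TIERS) -> int:
    """Round a user's retention period up to the nearest tier, so data is
    never deleted earlier than the user asked for."""
    i = bisect.bisect_left(tiers, retention_days)
    if i == len(tiers):
        raise ValueError(f"{retention_days}d exceeds the longest tier")
    return tiers[i]


if __name__ == "__main__":
    print(tier_for(45))   # 90
    print(tier_for(30))   # 30
    print(tier_for(365))  # 365
```

Rounding up keeps some data longer than requested (a 45-day user keeps data for up to 90 days); if retention is a hard maximum, for example for privacy reasons, that excess would need separate cleanup. Extending the maximum to 2-3 years would, in this sketch, mean appending tiers such as 730 and 1095.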

That's one of the 3 options I noted; I presume there aren't other options I'm missing?
In the near term it'd likely keep things in check, though I can imagine complexities arising from changing a given user's retention period, or from the retention buckets growing from accommodating up to 1 year to, say, 2-3 years.
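
The complication with changing a user's retention period under tiered indexes is that only new writes follow the new tier; everything already written stays in indexes that expire on the old tier's schedule. A small sketch of the three cases, assuming each tier has its own time-windowed indexes deleted whole on that tier's schedule; the function and its return strings are hypothetical:

```python
def retention_change_impact(old_tier_days: int, new_tier_days: int) -> str:
    """Describe what moving a user between tiers means for data already
    written to tier-specific indexes (a sketch, not a migration tool)."""
    if new_tier_days == old_tier_days:
        # Existing indexes already expire at the right time.
        return "none"
    if new_tier_days > old_tier_days:
        # Past documents live in shorter-lived indexes and would be deleted
        # too early unless copied into the longer tier's indexes (and removed
        # from the old ones if searches span both tiers).
        return "reindex into longer tier"
    # Past documents live in longer-lived indexes and would be kept too long
    # unless deleted separately, e.g. by a per-user delete-by-query.
    return "delete early from longer tier"


if __name__ == "__main__":
    print(retention_change_impact(30, 30))   # none
    print(retention_change_impact(30, 90))   # reindex into longer tier
    print(retention_change_impact(365, 30))  # delete early from longer tier
```

The last case already needs per-document deletion, which overlaps with the separate cleanup process of option 3. Adding longer tiers later only affects users who are actually moved into them.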

The thought on my mind is that if, over time, the problem morphs into needing an external housekeeping process anyway, it may be better to go down that path from the start and forget about grouping users by retention period.
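
An external housekeeping process could group users by retention period and issue one delete-by-query per distinct period, so the number of requests grows with the number of distinct retention values (at most 365 at day granularity) rather than with the number of users. A minimal sketch that only builds the request bodies; the `user_id` and `@timestamp` field names are assumptions about the mapping, and each body would be sent to the Delete By Query API (`POST <index-pattern>/_delete_by_query`):

```python
from collections import defaultdict


def cleanup_requests(user_retention: dict[str, int]) -> list[dict]:
    """Build one delete-by-query body per distinct retention period.
    `user_retention` maps a (hypothetical) user_id to retention in days."""
    by_days: dict[int, list[str]] = defaultdict(list)
    for user, days in user_retention.items():
        by_days[days].append(user)

    requests = []
    for days in sorted(by_days):
        requests.append({
            "query": {
                "bool": {
                    "filter": [
                        {"terms": {"user_id": sorted(by_days[days])}},
                        # Elasticsearch date math: older than `days` days.
                        {"range": {"@timestamp": {"lt": f"now-{days}d"}}},
                    ]
                }
            }
        })
    return requests


if __name__ == "__main__":
    for body in cleanup_requests({"alice": 30, "bob": 90, "carol": 30}):
        print(body)
```

Two caveats worth checking against the Elasticsearch docs: a single `terms` query is capped by `index.max_terms_count` (65,536 by default), so very large groups would need splitting, and delete-by-query only marks documents as deleted, with disk space reclaimed as segments merge.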
