Per user data retention

10aRK · November 16, 2020, 2:10pm

Hi

I've been looking into how best to model data whilst accounting for customisable data retention periods at a user / document level rather than for the whole index.

ILM seems perfect when reasoning about a whole index, and old versions of Elasticsearch appear to have had TTLs, but there doesn't appear to be any guidance or best practice on how to approach scenarios that need finer-grained policies.

The options appear to be:

Create an index per user and time window. This gives the most control but could end up with lots of small indexes.
Group users with the same retention period into an index per retention period and time window. This may be less flexible when retention periods change, and could still end up with very small or very large indexes over time.
Record everything as normal in indexes per time window and have a separate cleanup process find and remove the relevant data once it has expired.
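To make the three options concrete, here is a minimal sketch of how the index names might differ under each one. The `events-` prefix, the monthly time window, and the function names are all hypothetical choices for illustration, not something prescribed by Elasticsearch.

```python
from datetime import date


def per_user_index(user_id: str, day: date) -> str:
    """Option 1: one index per user and (assumed monthly) time window."""
    # Elasticsearch index names must be lowercase, so normalise the user id.
    return f"events-user-{user_id.lower()}-{day:%Y.%m}"


def per_retention_index(retention_days: int, day: date) -> str:
    """Option 2: one index per retention period and time window."""
    return f"events-keep-{retention_days}d-{day:%Y.%m}"


def shared_index(day: date) -> str:
    """Option 3: one index per time window; expiry is handled by a cleanup job."""
    return f"events-{day:%Y.%m}"
```

With options 1 and 2 a whole index can be dropped once it has expired, whereas option 3 relies on deleting individual documents.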

Any insight into how others are approaching this, or other options I may not have considered, would be appreciated.

As for the data, you can presume it to be a time series of user-generated data where each user can have a custom retention period defined in days, up to a year. There is a growing number of users, around the tens of thousands range, with some users inevitably generating more data than others, and overall totals of a few million records a day.
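As a rough back-of-envelope check on the "lots of small indexes" concern, the sketch below estimates index counts for the per-user option. The concrete inputs (10,000 users, 3 million records a day, monthly windows) are assumptions picked from within the ranges described above, not measured figures.

```python
def estimate_per_user_indexes(users: int, records_per_day: int,
                              windows_per_year: int) -> dict:
    """Estimate index count and average size for one index per user + window."""
    indexes = users * windows_per_year          # indexes kept for a full year
    records_per_year = records_per_day * 365
    return {
        "indexes": indexes,
        "avg_records_per_user_per_day": records_per_day / users,
        "avg_records_per_index": records_per_year / indexes,
    }


# Assumed figures: 10,000 users, 3,000,000 records/day, monthly windows.
print(estimate_per_user_indexes(10_000, 3_000_000, 12))
```

Under these assumptions that is 120,000 indexes averaging roughly 9,000 records each, which illustrates why the per-user option produces many very small indexes.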

Thanks

warkolm · November 17, 2020, 12:18am

How many users are we talking about here?

10aRK · November 17, 2020, 1:17am

At the moment around 5k

warkolm · November 17, 2020, 2:00am

Can you tier the retention so that you can then group users in indices?
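A minimal sketch of what tiering could look like: each user's requested retention is rounded up to the nearest tier, and the tier decides which group of indices the user's data goes into. The tier values and the round-up rule are assumptions for illustration; rounding up means data may be kept longer than the user asked for, which may or may not be acceptable.

```python
from bisect import bisect_left

# Hypothetical retention tiers in days, sorted ascending.
TIERS_DAYS = [30, 90, 180, 365]


def tier_for(retention_days: int) -> int:
    """Return the smallest tier that is >= the requested retention."""
    if retention_days < 1:
        raise ValueError("retention must be at least 1 day")
    i = bisect_left(TIERS_DAYS, retention_days)
    if i == len(TIERS_DAYS):
        raise ValueError("retention exceeds the largest tier")
    return TIERS_DAYS[i]
```

For example, `tier_for(45)` places a user in the 90-day group, so each tier's indices can then be expired as a whole.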

10aRK · November 17, 2020, 11:25am

That's one of the 3 options I noted. I presume there aren't other options I'm missing?
In the near term it would likely keep things in check, though I can imagine complexities arising from changing a given user's retention period, or from the retention buckets growing from accommodating up to 1 year to, say, 2-3 years.

The thought on my mind: if over time the problem morphs into needing an external housekeeping process, it is perhaps better to go down that path from the start and forget about grouping users by retention period.
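For reference, a housekeeping process of this kind could be sketched as below. It only builds the request bodies that would be sent to the delete-by-query endpoint (`POST <index>/_delete_by_query`), grouping users by retention period so there is one request per period rather than one per user. The field names `user_id` and `@timestamp`, and the `retention_by_user` mapping, are assumptions about the data model.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def build_delete_queries(retention_by_user: dict[str, int],
                         now: datetime) -> list[dict]:
    """Build one delete-by-query body per distinct retention period."""
    users_by_days = defaultdict(list)
    for user_id, days in retention_by_user.items():
        users_by_days[days].append(user_id)

    bodies = []
    for days, user_ids in sorted(users_by_days.items()):
        cutoff = (now - timedelta(days=days)).isoformat()
        bodies.append({
            "query": {
                "bool": {
                    "filter": [
                        # Assumed field names; adjust to the real mapping.
                        {"terms": {"user_id": sorted(user_ids)}},
                        {"range": {"@timestamp": {"lt": cutoff}}},
                    ]
                }
            }
        })
    return bodies


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for body in build_delete_queries({"a": 30, "b": 30, "c": 90}, now):
        print(body)
```

Changing a user's retention period then only means updating the mapping, at the cost of deleting individual documents instead of dropping whole indices.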

system · December 15, 2020, 11:25am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
