We have a stream of data arriving every second, but we only want to keep data from the past T time (say, 1 hour). What is the best way to expire and remove the old data? We did some research and found the following two options:
Set the TTL of each document to T, and ES will automatically mark expired documents and remove them. One question we have: when, and how frequently, is the expired data physically removed? Is this controlled by indices.ttl.interval or by something else?
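For reference, this is roughly the setup we mean (a sketch using the legacy `_ttl` mapping field from the 1.x docs; the `logs`/`event` index and type names are made up):

```shell
# Create an index whose documents expire 1 hour after indexing
# unless an explicit _ttl is supplied per document.
curl -XPUT 'localhost:9200/logs' -d '{
  "mappings": {
    "event": {
      "_ttl": { "enabled": true, "default": "1h" }
    }
  }
}'

# How often the TTL purge process runs is, as far as we can tell,
# the indices.ttl.interval setting in elasticsearch.yml:
#
#   indices.ttl.interval: 60s
```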
Use time-based indices, creating a new index for each T time frame. However, this approach might produce skewed tf-idf scores for the newest index while it still contains very few documents. Is there a good way to handle this?
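To make the second option concrete, here is a sketch of the bookkeeping it would need on our side (the hourly `logs-YYYY.MM.DD.HH` naming scheme and both function names are our own invention, not an ES API; zero-padded names sort lexicographically in chronological order, which is what makes the comparison work):

```python
from datetime import datetime, timedelta

def index_for(ts):
    """Hourly index name for a timestamp, e.g. logs-2015.06.01.13."""
    return ts.strftime("logs-%Y.%m.%d.%H")

def expired_indices(existing, now, retention=timedelta(hours=1)):
    """Indices whose whole hour lies outside the retention window.

    Any index name strictly below the cutoff index can be dropped
    with a single DELETE of the entire index.
    """
    cutoff = index_for(now - retention)
    return sorted(name for name in existing if name < cutoff)
```

Searching would then go against a wildcard like `logs-*` so that results span all live indices, and expiry becomes a cheap whole-index DELETE instead of per-document deletes.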
Thanks for the reply. For time-based indices, are there any helper tools for managing this? We are concerned that a new index with very few documents in it might have very different tf-idf values and produce strange search results. Is there a good way to handle this?
Yes, directly issuing a delete request is also fine. However, what is the difference between TTL and deleting directly? Both only mark documents as deleted and physically remove them during segment merging, right? Is there a particular reason to favor deleting directly, or is it just that TTL is being deprecated, so delete is preferred?
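By "directly delete" we mean something like the following (a sketch; the exact API depends on the ES version — 1.x had delete-by-query built in as shown here, while later versions moved it out of core and 5.x reintroduced it as `_delete_by_query`; the `logs` index and `timestamp` field names are made up):

```shell
# Delete every document older than 1 hour, assuming each document
# carries a "timestamp" field set at indexing time.
curl -XDELETE 'localhost:9200/logs/_query' -d '{
  "query": {
    "range": {
      "timestamp": { "lt": "now-1h" }
    }
  }
}'
```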