TTL for documents

(Sending to new mailing list)

+1 to this feature. It will help in my scenario.

I have docs in CouchDB which have publication and expiry timestamps in them.
I expose Elasticsearch as a query layer to users for these docs.
I have a Celery (Python) job that keeps syncing (POST/DELETE) these docs to
Elasticsearch at the appropriate times.

Typically there is a 2-5 minute delay in these operations (publish_time +
x_minutes). It is OK for me to publish a doc to Elasticsearch with a
delay, but withdrawing it with a delay is a bit painful.
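A minimal sketch of why periodic syncing delays withdrawal. It assumes hypothetical doc dicts with `id`, `publish_at` and `expire_at` fields and a job that runs at fixed intervals; the function name `plan_sync` is illustrative, not part of Celery or Elasticsearch.

```python
from datetime import datetime, timedelta


def plan_sync(docs, indexed_ids, now):
    """Decide which docs a periodic sync job should POST or DELETE.

    docs: hypothetical dicts with 'id', 'publish_at' and 'expire_at'.
    indexed_ids: set of ids currently present in the search index.
    """
    to_post, to_delete = [], []
    for doc in docs:
        # A doc is "live" between its publication and expiry timestamps.
        live = doc["publish_at"] <= now < doc["expire_at"]
        if live and doc["id"] not in indexed_ids:
            to_post.append(doc["id"])
        elif not live and doc["id"] in indexed_ids:
            to_delete.append(doc["id"])
    return to_post, to_delete


t0 = datetime(2011, 7, 27, 12, 0)
docs = [{"id": "a", "publish_at": t0, "expire_at": t0 + timedelta(minutes=3)}]

# Run at t0: 'a' gets published.
print(plan_sync(docs, set(), t0))  # (['a'], [])
# Next run 5 minutes later: 'a' expired at t0 + 3 min but is only deleted now.
print(plan_sync(docs, {"a"}, t0 + timedelta(minutes=5)))  # ([], ['a'])
```

Between t0 + 3 min and t0 + 5 min the expired doc is still searchable; that window is the withdrawal delay a built-in TTL would remove.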

If a _ttl field is supported, it will make withdrawing docs easier and
almost realtime.

Regards,
Mahendra

On Wed, Jul 27, 2011 at 9:44 AM, Shay Banon shay.banon@elasticsearch.com wrote:

Heya,

Yes, it's possible to add this feature. I think there is already an issue
open for something similar... . Would love to hear what other people
think...

-shay.banon

On Wed, Jul 27, 2011 at 12:48 AM, Benjamin Devèze <benjamin.deveze@gmail.com> wrote:

A lot of documents naturally come with an expiration date. I think it
would be nice to have built-in support for a per-document TTL
(optionally with default TTLs configurable per type/index). I know disks
are not expensive these days, but using TTLs for documents is still
common practice, and it can be a very useful feature, especially for
people using ES as a key-value store. It is a pain to make users
regularly trigger delete-by-query jobs to purge the data, and I think
this use case is common enough to include in the core of ES.
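A minimal sketch of how per-document TTLs and configurable defaults might combine. The precedence order (document TTL, then type default, then index default) and the function name `effective_ttl` are assumptions for illustration, not settled behavior.

```python
from datetime import timedelta
from typing import Dict, Optional


def effective_ttl(
    doc_ttl: Optional[timedelta],
    doc_type: str,
    type_defaults: Dict[str, timedelta],
    index_default: Optional[timedelta],
) -> Optional[timedelta]:
    """Resolve a doc's TTL (hypothetical precedence: doc > type > index).

    Returns None when nothing is configured, meaning the doc never expires.
    """
    if doc_ttl is not None:
        return doc_ttl
    if doc_type in type_defaults:
        return type_defaults[doc_type]
    return index_default


defaults = {"session": timedelta(minutes=30)}
print(effective_ttl(None, "session", defaults, timedelta(days=7)))  # 0:30:00
print(effective_ttl(timedelta(hours=1), "session", defaults, None))  # 1:00:00
print(effective_ttl(None, "article", defaults, None))  # None
```

Resolving the most specific setting first lets a key-value style index set one default while individual documents still override it.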

Concerning the implementation of this feature, I propose introducing a
special _ttl field. When searching, ES could hide expired documents
(maybe by adding a not query for expired docs to each query, or
something smarter). The documents could then actually be deleted, and
their disk space freed, during segment merges.
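A minimal in-memory sketch of the two phases proposed above: expired docs are hidden at search time but stay on disk, and are physically dropped when segments merge. `Doc`, `search`, `merge` and the `expire_at` field are hypothetical names. `hide_expired` shows how the "not query" could look using the bool/must_not/range syntax of the current Elasticsearch Query DSL (the 2011-era syntax differed), assuming the relative _ttl has been turned into an absolute timestamp field.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Doc:
    id: str
    indexed_at: datetime
    ttl: Optional[timedelta]  # hypothetical per-doc _ttl; None = never expires

    def expired(self, now: datetime) -> bool:
        return self.ttl is not None and now >= self.indexed_at + self.ttl


def search(segments: List[List[Doc]], now: datetime) -> List[str]:
    """Query time: skip expired docs, although they are still stored."""
    return [d.id for seg in segments for d in seg if not d.expired(now)]


def merge(segments: List[List[Doc]], now: datetime) -> List[List[Doc]]:
    """Merge time: combine segments into one, dropping expired docs for good."""
    return [[d for seg in segments for d in seg if not d.expired(now)]]


def hide_expired(user_query: dict) -> dict:
    """Wrap a query so docs whose (hypothetical) expire_at has passed are excluded."""
    return {
        "bool": {
            "must": user_query,
            "must_not": {"range": {"expire_at": {"lte": "now"}}},
        }
    }


t0 = datetime(2011, 7, 27)
segments = [[Doc("a", t0, timedelta(minutes=5))], [Doc("b", t0, None)]]
print(search(segments, t0 + timedelta(minutes=1)))  # ['a', 'b']
print(search(segments, t0 + timedelta(minutes=5)))  # ['b']
print([d.id for d in merge(segments, t0 + timedelta(minutes=10))[0]])  # ['b']
```

Filtering at query time makes withdrawal effectively immediate, while deferring the physical delete to merges avoids an extra write on every expiry.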

What do you think?

--
Mahendra

http://twitter.com/mahendra