TTL gone in 5.0 - OMG

Just set up ES for a new client to discover TTL is gone. OMG why? I had this working fine for 200m docs per day on a 3 node cluster. It is/was one of the best features of ES, especially if you don't use log-based indexes, but single indexes per 'job'. It could have been made configurable (hourly/daily/monthly) to cut down the scanning; instead, we now have to write a piece of code per index to scan down and delete by query.

Shame.....any chance of bringing a version of it back?

TTL is not coming back in the foreseeable future. Probably not ever.

While your user experience may have been acceptable, it was not so for every user. The effect on the underlying Lucene structures is not ideal, as it drastically affects tiered merging of segments (under the hood — you never see it as a user unless you know how to look for it), to say nothing of the increased I/O load required to perform it. Additionally, some users were using it in lieu of better practices (like using time-series indices) which their use cases should have been using.

Another reason TTL was removed is that its default behavior was to scan all of the documents in the specified index every 60 seconds to see if the TTL expired. For larger indices, the overhead of this became astronomical. It really was a suboptimal solution for the desired outcome, even if you feel it worked well for your use case.

There are still other reasons, which you can read about in GitHub and perhaps in some of the release notes/blogs.

As you mentioned, the workaround to not having TTL (but wanting it) is to use the delete_by_query plugin instead. Yes, it requires scheduling on your part.
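A rough sketch of what that scheduled job could look like, in Python with only the standard library. It assumes the cluster exposes a `POST /<index>/_delete_by_query` endpoint (a plugin-based install may use a different route, so check that first) and that documents carry a date field. The host, the index name `my-job-index`, the field `@timestamp` and the 30 day retention are all placeholders, not anything from your setup.

```python
import json
from urllib import request


def build_expiry_query(timestamp_field="@timestamp", max_age="30d"):
    """Build a query body matching documents older than max_age.

    Uses a range query with date math ("now-30d"). The field name and
    the retention period are assumptions for illustration only.
    """
    return {"query": {"range": {timestamp_field: {"lt": f"now-{max_age}"}}}}


def build_delete_request(base_url, index, body):
    """Prepare (but do not send) the HTTP request for one index."""
    url = f"{base_url.rstrip('/')}/{index}/_delete_by_query"
    data = json.dumps(body).encode("utf-8")
    return request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical host and index name.
    req = build_delete_request(
        "http://localhost:9200", "my-job-index", build_expiry_query()
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually run the delete, call request.urlopen(req) from cron
    # or whatever scheduler you already use.
```

Keeping the request building separate from the sending means you can print and check the query before pointing it at real data.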

You'd probably do just as well—if not better—by using aliases to associate a series of time-series indices (making them behave as one index for querying purposes), and drop indices after they exceed your desired retention period. As far as keeping things tied down to single indices per 'job', have you investigated using the rollover API? It could feature in to a scheme like this very nicely.
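To make the "drop indices past retention" part concrete, here is a minimal sketch of the selection logic only. It assumes a daily naming scheme of `<prefix>YYYY.MM.DD`; the prefix `job-a-` and the 14 day retention are made up for the example, and the list of index names would come from your cluster rather than being hard-coded.

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names, prefix, retention_days, today):
    """Return the indices under `prefix` whose date suffix is older
    than the retention window, oldest first.

    Names that do not match the assumed <prefix>YYYY.MM.DD pattern
    are skipped rather than treated as errors.
    """
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue
        if day < cutoff:
            expired.append(name)
    # Zero-padded dates sort chronologically as plain strings.
    return sorted(expired)


if __name__ == "__main__":
    names = [
        "job-a-2017.03.01",
        "job-a-2017.03.20",
        "job-a-2017.04.01",
        "other-2017.01.01",
    ]
    print(expired_indices(names, "job-a-", 14, date(2017, 4, 2)))
    # prints ['job-a-2017.03.01']
```

Each name it returns would then be dropped with a delete-index call. Dropping a whole index avoids the per-document deletes and the segment merging cost described above.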

Hi Aaron,

Thanks for the reply. I kinda guessed that might be the reason, but still feel it's a huge loss to a well-designed system. Did Elastic consider any of the below?

  1. Blocking TTL unless there was a @timestamp or @date field? This would ensure a time series index.

  2. Putting a default, cron-like, but changeable interval on a TTL... the default could even be something dumb like monthly or yearly.

I can see that a badly created index, with a ttl, especially in a cloud environment, would be a recipe for all kinds of hell breaking out, and therefore suboptimal. But as most indexes, initially, will be log based, they would typically be time series in nature, so should conform to option 2.
This way, especially in a cloud environment, where hardware could be sharing indices from different clients, I suspect there are a multitude of well and badly written cross running over the same ES nodes, which could also unleash the Sysadmin nightmare.

I note your comment on the rollover API, which will not really help any of my use cases...as I tend to use 1 index per use case, as opposed to what I walked into with a current client where they have 28k indexes and one alias....never dropping the old ones (please don't ask...I don't know either).
The delete-by-query I hadn't realised had gone as well (3 things gone then if you include Kibana 3, which was brilliant to spin things up quickly). Luckily it looks like AWS have included that in their service, as I have been gaily doing that (and it worked), without realising it had gone ....lucky eh?

Finally, I notice that Solr has TTL, working like my suggestion 2 above. https://lucene.apache.org/solr/4_8_0/solr-core/org/apache/solr/update/processor/DocExpirationUpdateProcessorFactory.html
So, the question is open again...if Apache are happy....why can't ES do likewise...but with heavy constraints obviously.

Thanks again.....I still love ES for other reasons...just every time I dip back in I seem to lose some functionality that I use :slight_smile:

Peter Colclough
T:01963 220217
e: biton@compuserve.com
Skype: peter.colclough

Delete-by-query isn't gone. It's a plugin now. AWS probably provides it automatically.

Yes, I saw that....sorry didn't make myself clear. Still seems a shame about ttl...but then seems ES is veering towards multiple indices and aliases, as opposed to one index per job. Both have their use cases, and I use both...but it's a real downer.
Hey ho...onward and upward :slight_smile:

Appears that the AWS service has the '_delete_by_query' API installed in 5.3. Strangely (and I have only tried this for the last hour), it works through their K4 interface, but not through the curl commands from a different node... will track that down. They don't have the delete-by-query plugin installed yet.

So at least I am relatively happier :wink: Just thought I would put an update here.
