In your use case, could the retention policy change for the 89% of documents that are kept for one month?
If not, I would create one index for the documents whose retention policy can change and use _ttl there. For the monthly docs, I would use one index per month.
If the policy can change for those documents as well, I think you will have to rely on _ttl, at the cost of more merging.
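To make the split concrete, here is a minimal sketch of how a document could be routed at index time. The policy label "month" and the index names "docs-month-YYYY.MM" and "docs-variable" are assumptions made up for this example; they are not Elasticsearch conventions and nothing here calls the cluster.

```python
from datetime import datetime, timezone

# Hypothetical policy label and index names, purely for illustration.
MONTHLY = "month"


def target_index(policy: str, indexed_at: datetime) -> str:
    """Pick the index a new document should be written to."""
    if policy == MONTHLY:
        # One index per calendar month, e.g. "docs-month-2014.04".
        return "docs-month-" + indexed_at.strftime("%Y.%m")
    # Documents whose retention may move share a single index,
    # where per-document expiry (such as _ttl) would apply.
    return "docs-variable"


if __name__ == "__main__":
    when = datetime(2014, 4, 1, tzinfo=timezone.utc)
    print(target_index("month", when))
    print(target_index("year", when))
```

The point of the split is that the bulk of the data never needs per-document deletes, so only the smaller shared index would pay the extra merge cost.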
My 2 cents.
--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
On 1 Apr 2014, at 08:03, slushi kireetreddy@gmail.com wrote:
Yes, unfortunately it's not completely known at index time. I would need to keep the separate indices in sync whenever a retention policy change occurs. Attempting this seems like it could open up a whole can of worms.
On Tuesday, April 1, 2014 1:58:04 AM UTC-4, David Pilato wrote:
If you know in advance (I mean at index time) which docs should be removed, you should send each document to an index that will be removed entirely after a given period.
Makes sense?
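As a rough sketch of the cleanup side, the function below picks out the monthly indices that are old enough to be dropped whole. It assumes a hypothetical naming scheme "docs-month-YYYY.MM" and only returns names; the actual deletion call is left out on purpose.

```python
from datetime import date

PREFIX = "docs-month-"  # hypothetical naming scheme: docs-month-YYYY.MM


def expired_indices(index_names, today, keep_months=1):
    """Return monthly indices whose documents are all past retention."""
    current = today.year * 12 + (today.month - 1)
    expired = []
    for name in index_names:
        if not name.startswith(PREFIX):
            continue  # not a monthly index, leave it alone
        year, month = name[len(PREFIX):].split(".")
        age = current - (int(year) * 12 + (int(month) - 1))
        # An index for month M can hold documents written on the last day
        # of M, so it is only safe to drop once age exceeds keep_months.
        if age > keep_months:
            expired.append(name)
    return expired


if __name__ == "__main__":
    names = ["docs-month-2014.02", "docs-month-2014.03",
             "docs-month-2014.04", "docs-variable"]
    print(expired_indices(names, date(2014, 4, 1)))
```

Dropping an index this way removes its data in one step, with no per-document deletes to merge away afterwards.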
--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
On 1 Apr 2014, at 00:00, slushi kiree...@gmail.com wrote:
I attended an Elasticsearch meetup and at some point it was mentioned that TTL use is discouraged, but yes, it would make a lot of sense here. Also, the 1 year figure is really a guesstimate; we want to keep as much of that data as possible. I guess with TTL you may not have as much control over when document deletion and any resulting segment merging happen? I am not that familiar with Elasticsearch performance yet (we just started looking into using ES).
On Monday, March 31, 2014 5:52:28 PM UTC-4, Kevin Wang wrote:
Why not use TTL for the documents?
On Tuesday, April 1, 2014 8:50:14 AM UTC+11, slushi wrote:
I have varying data retention requirements I am trying to balance (I am continuously indexing new documents):
1% of my documents need to be kept forever
10% need to be kept 1 year
the remainder needs to be kept for 1 month
I can easily set properties indicating the retention policy for each document and then periodically do a "delete by query". However, since the delete would remove 89% of the indexed documents, would there be any potential performance problems with this straightforward approach? I guess this is a YMMV type thing, but I was just wondering what the typical approach is here. Would it be necessary to perhaps filter the query to not affect so many documents at once? Would query performance be greatly impacted?
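For illustration, this is roughly what I mean by filtering the delete so it does not hit everything at once. It is only a sketch: "expires_at" is a hypothetical date field I would set on each document, and the function just builds range-query bodies for successive time slices without sending anything to the cluster. Whether slicing actually helps performance is exactly what I am unsure about.

```python
from datetime import datetime, timedelta


def expiry_delete_queries(start, end, step=timedelta(days=1)):
    """Yield range queries covering [start, end) in slices of `step`."""
    lower = start
    while lower < end:
        upper = min(lower + step, end)
        yield {
            "query": {
                "range": {
                    # "expires_at" is a hypothetical per-document date field.
                    "expires_at": {
                        "gte": lower.isoformat(),
                        "lt": upper.isoformat(),
                    }
                }
            }
        }
        lower = upper


if __name__ == "__main__":
    for body in expiry_delete_queries(datetime(2014, 3, 1),
                                      datetime(2014, 3, 3, 12)):
        print(body)
```

Each body would be used for one "delete by query" request, so a single request only touches the documents expiring in that slice.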
The alternative approach I was thinking of would be to create separate indices for each retention type. Cleanup would be easier, but unfortunately a document's retention policy can be upgraded or downgraded, so keeping the indices consistent could be a little messy.
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9b685cff-e956-473a-935e-9546b2ea59b3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.