We have a requirement that the documents stored in a single index be retained according to the value of a particular field. Documents whose field value is XYZ need to be kept for only 7 days, while the other documents need to be kept for a year or so. As we understand it, Elasticsearch can set a TTL on each document through the Java API. However, we have also read in various expert blogs that TTL is not a performance-friendly solution, since ES continually checks documents for expiration in the background, which causes a performance bottleneck to some extent. Setting TTL per index/type would have the same problem, assuming the data is split into separate types according to retention requirement.
In this situation, what is the best solution according to best practices/standards?
Don't use _ttl at all. It is deprecated and will eventually be removed because it doesn't perform well, mostly because deleting all the documents in an index is much less efficient than deleting the index itself. Instead, I'd have one index per day for each of the last seven days. Every day you delete the oldest one and create a new one (or turn on dynamic index creation and use index templates).
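A minimal sketch of the daily bookkeeping, assuming a hypothetical naming scheme of "short-lived-" plus the date as YYYY.MM.DD. It only works out which index names fall outside a 7-day window; it does not talk to Elasticsearch.

```python
from datetime import date, datetime, timedelta
from typing import List

# Hypothetical naming scheme: one index per day, e.g. "short-lived-2016.11.08".
PREFIX = "short-lived-"
RETENTION_DAYS = 7


def daily_index_name(day: date, prefix: str = PREFIX) -> str:
    """Name of the index that holds documents written on `day`."""
    return f"{prefix}{day:%Y.%m.%d}"


def indices_to_delete(existing: List[str], today: date,
                      retention_days: int = RETENTION_DAYS,
                      prefix: str = PREFIX) -> List[str]:
    """Return the daily indices that fall outside the retention window.

    Today's index plus the previous (retention_days - 1) daily indices are kept;
    everything older is returned so it can be dropped as a whole index.
    """
    oldest_kept = today - timedelta(days=retention_days - 1)
    doomed = []
    for name in existing:
        if not name.startswith(prefix):
            continue  # not one of our daily indices
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, leave the index alone
        if day < oldest_kept:
            doomed.append(name)
    return sorted(doomed)
```

For example, with today = 2016-11-08 and nine daily indices present, the two oldest (short-lived-2016.10.31 and short-lived-2016.11.01) are returned. A scheduled job would then call the delete index API (DELETE /<index>) on each of them; Elasticsearch Curator is an existing tool that does this kind of cleanup.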
For your one-year documents you could also use an index per day, but searching that many indices (around 365) can be expensive. So it might make more sense to use an index per month or per week.
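Applied to the original requirement, a sketch of the write-side routing could look like this. The field name "category" and the prefixes "short-lived-" (daily, 7 days) and "long-lived-" (monthly, about a year) are hypothetical; the only rule taken from the question is that documents with value XYZ are the short-lived ones.

```python
from datetime import date
from typing import List

# Hypothetical names: the field that decides retention, and one index family per retention class.
RETENTION_FIELD = "category"
SHORT_PREFIX = "short-lived-"   # 7-day data, one index per day
LONG_PREFIX = "long-lived-"     # ~1-year data, one index per month


def target_index(doc: dict, day: date) -> str:
    """Pick the index a document written on `day` should go to."""
    if doc.get(RETENTION_FIELD) == "XYZ":
        return f"{SHORT_PREFIX}{day:%Y.%m.%d}"  # daily index, dropped after 7 days
    return f"{LONG_PREFIX}{day:%Y.%m}"          # monthly index, dropped after ~12 months


def monthly_indices_to_delete(existing: List[str], today: date,
                              keep_months: int = 12) -> List[str]:
    """Monthly indices older than the current month plus (keep_months - 1) previous months."""
    current = today.year * 12 + (today.month - 1)
    doomed = []
    for name in existing:
        if not name.startswith(LONG_PREFIX):
            continue  # daily or unrelated index
        try:
            year, month = (int(p) for p in name[len(LONG_PREFIX):].split("."))
        except ValueError:
            continue  # suffix is not YYYY.MM, leave it alone
        if current - (year * 12 + month - 1) >= keep_months:
            doomed.append(name)
    return sorted(doomed)
```

Because each retention class lives in its own index family, expiry is always a whole-index delete, and searches over the yearly data touch about 12 indices instead of about 365.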
Also have a look at the shrink and rollover APIs. They are new in 5.0, so we don't have any established best practices around them yet, but they were designed for time-based indices like the ones you are describing. They may not help you, but they are worth looking at.
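As a rough illustration of rollover: you write through an alias that points at the current index, and a call to POST /<alias>/_rollover with conditions such as max_age or max_docs creates a new index and moves the alias once any condition is met. The sketch below only builds that request; the alias name and thresholds are hypothetical, and the initial index would need a name ending in a dash and a number (e.g. short-lived-000001) for automatic naming.

```python
import json
from typing import Optional, Tuple


def rollover_request(alias: str, max_age: str = "1d",
                     max_docs: Optional[int] = None) -> Tuple[str, str, str]:
    """Build (method, path, body) for a rollover call on a write alias.

    Nothing is sent here; pass the result to whatever HTTP client you use.
    """
    conditions = {"max_age": max_age}  # roll over once the index is this old
    if max_docs is not None:
        conditions["max_docs"] = max_docs  # ...or once it holds this many documents
    body = json.dumps({"conditions": conditions})
    return "POST", f"/{alias}/_rollover", body
```

Note that rollover only creates new indices; the old ones still have to be removed with whole-index deletes, just as with plain daily indices.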
Thank you very much for the detailed explanation. However, our data needs to be represented through a single index pattern in Kibana visualizations, so at most we can split the data into multiple types within the same index. In that case, if not TTL, is there a better way to manage types that each have different retention requirements within a single index?