We have a use case where we're planning to leverage Elasticsearch to store application session data. There are a few user events which the UI is not able to capture programmatically, so we can't fully rely on them to issue a DELETE request to delete the session.
So we were thinking of deleting all the documents of a specified "type" in an index nightly, based on their creation datetime. How can we do this efficiently? Any response in this context will be highly appreciated.
Doing a delete_by_query is possible, but it is suboptimal. I would recommend doing time-series indices per type and then deleting the entire index when you don't need it anymore. You could use aliases to make sure that current indices are always referenced in the same query.
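As a sketch of the alias part of that pattern (the index and alias names here, sessions-2018.03.21 and sessions-current, are illustrative, not anything your application already has): each night you can atomically swap the alias from yesterday's index to today's, so the application keeps querying one stable name:

```
POST /_aliases
{
  "actions": [
    { "remove": { "index": "sessions-2018.03.21", "alias": "sessions-current" } },
    { "add":    { "index": "sessions-2018.03.22", "alias": "sessions-current" } }
  ]
}
```

Because both actions run in a single request, there is no window where the alias points at nothing.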
Why?
Having multiple mapping types per index is no longer allowed in 6.0.
Deleting a large number of documents requires a large number of atomic operations, which has a huge performance and disk I/O overhead, whereas deleting an index is basically a single operation.
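To make the contrast concrete, here is roughly what the two approaches look like (sessions, session, and creationdatetime are assumed names; note that on 2.4.x, _delete_by_query is only available via the delete-by-query plugin, while in 5.x+ it is part of core):

```
# Deletes each matching document individually (heavy disk I/O):
POST /sessions/session/_delete_by_query
{
  "query": {
    "range": { "creationdatetime": { "lt": "now-1d" } }
  }
}

# Drops a whole day's worth of documents in a single operation:
DELETE /sessions-2018.03.21
```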
Thanks for your response. We're still running v2.4.1 in our environment. Could you please provide more details on time-series indices?
Moreover, even if we don't leverage an alias and just delete the index every night using a script, won't the index get created automatically the next day, when the application posts a session document to the index named in the URL?
That is completely dependent on how you set things up. For example, Logstash and Beats direct timestamped documents to a corresponding index name, e.g. logstash-2018.03.22. They don't create the index explicitly, but rather say, "put this document into the index named x," and Elasticsearch creates that index on the fly if it doesn't already exist.
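In 2.4 terms, that just means the client includes the date in the index name of each request; for example (the sessions-* naming and session type are assumptions for illustration):

```
POST /sessions-2018.03.22/session
{
  "user": "someone",
  "creationdatetime": "2018-03-22T09:15:00Z"
}
```

If sessions-2018.03.22 doesn't exist yet, it is auto-created, assuming automatic index creation hasn't been disabled via action.auto_create_index.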
If you were to upgrade to 5.x or higher, you could use the rollover API, then you would create an index with a numeric suffix, e.g. index-000001, and associate it with an alias, e.g. my_alias:
PUT /index-000001
{
  "aliases": {
    "my_alias": {}
  }
}
This will enable you to point your application (or Logstash or Beats) at my_alias, instead of the index name. When the currently pointed-at index meets one of three user-defined criteria (max_docs, max_age, or max_size; max_size is only available in Elasticsearch 6.1+), it will "roll over": a new index with an incremented suffix, e.g. index-000002, is created, and my_alias is pointed at it.
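A rollover call on 5.x+ looks roughly like this (the thresholds here are illustrative, not recommendations):

```
POST /my_alias/_rollover
{
  "conditions": {
    "max_age": "1d",
    "max_docs": 1000000
  }
}
```

If any condition is met, the new index is created and the alias switches to it; otherwise nothing changes, so it is safe to call this periodically, e.g. from a scheduled job.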
Of course, since you're still on 2.4.1, the rollover API is not available to you yet. At this point, you should probably aim for daily or weekly indices.
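With daily indices on 2.4.1, the nightly job then reduces to deleting the oldest index, and searches can still span all remaining days with a wildcard (the sessions-* names are again illustrative):

```
# Nightly cleanup: drop the expired day's index in one operation
DELETE /sessions-2018.03.15

# Queries can cover every remaining daily index at once
GET /sessions-*/_search
{
  "query": { "term": { "user": "someone" } }
}
```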