Delete all documents from specified type in an index based on CreationDateTime

Sunny_Goel · February 26, 2018, 5:09pm

Hi,

We have a use case where we plan to use Elasticsearch to store application session data. There are a few user events that the UI cannot capture programmatically, so we can't rely on the UI to issue a DELETE request for every session.

So we were thinking of deleting all the documents of a specified "type" in an index nightly, based on CreationDateTime. How can we do this efficiently? Any response will be highly appreciated.

theuntergeek · February 26, 2018, 9:36pm

Doing a delete_by_query is possible, but it is suboptimal. I would recommend using time-series indices, one set per type, and then deleting an entire index when you no longer need it. You could use aliases to make sure that the current indices are always referenced by the same name in your queries.

Why?

Having multiple mapping types per index is no longer allowed in 6.0.
Deleting a large number of documents requires a large number of atomic operations, which has a huge performance and disk I/O overhead, whereas deleting an index is basically a single operation.
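For comparison, here is a rough sketch of the request body a delete_by_query on the creation timestamp would need. The field name CreationDateTime is taken from the topic title and may not match the actual mapping, and build_delete_query is just a hypothetical helper. In 5.x and later the body would go to POST /<index>/_delete_by_query; in 2.x delete-by-query is provided by a plugin rather than by core Elasticsearch.

```python
import json


def build_delete_query(field, max_age):
    """Return a query body matching documents whose `field` is older than max_age.

    `field` is an assumed date field name; `max_age` is a date-math duration
    such as "1d". Elasticsearch resolves "now-1d" when the query runs.
    """
    return {"query": {"range": {field: {"lt": "now-" + max_age}}}}


# Assumed field name, taken from the topic title.
body = build_delete_query("CreationDateTime", "1d")
print(json.dumps(body, indent=2))
```

Every matching document still has to be deleted individually, which is the overhead described above.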

Sunny_Goel · March 7, 2018, 5:18pm

Thanks for your response. We're still running v2.4.1 in our environment. Could you please provide more details on time-series indices?

Moreover, even if we don't use an alias and simply delete the index every night with a script, won't the index be created automatically the next day, when the application posts a session document to the index named in the URL?

theuntergeek · March 23, 2018, 4:47pm

That is completely dependent on how you set things up. For example, Logstash and beats direct timestamped documents to a corresponding index name, e.g. logstash-2018.03.22. They don't create the index, but rather say, "put this document into the index named x."
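If your application writes the documents itself, it can pick the index name the same way. A minimal sketch, assuming a daily pattern like the Logstash one above and a UTC date; daily_index_name is a hypothetical helper, not part of any client library:

```python
from datetime import datetime, timezone


def daily_index_name(prefix, timestamp):
    """Build a daily index name such as logstash-2018.03.22.

    `timestamp` must be timezone-aware; it is converted to UTC so that every
    client agrees on which day a document belongs to.
    """
    day = timestamp.astimezone(timezone.utc).strftime("%Y.%m.%d")
    return "{}-{}".format(prefix, day)


example = daily_index_name(
    "logstash", datetime(2018, 3, 22, 16, 47, tzinfo=timezone.utc)
)
print(example)  # logstash-2018.03.22
```

The application would then index each document into the name returned for that document's timestamp.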

If you were to upgrade to 5.x or higher, you could use the rollover API. You would create an index with a numeric suffix, e.g. index-000001, and associate it with an alias, e.g. my_alias:

PUT /index-000001 
{
  "aliases": {
    "my_alias": {}
  }
}

This lets you point your application (or Logstash or Beats) at my_alias instead of at the index name. When the index the alias currently points to meets one of three user-defined criteria (max_docs, max_age, or max_size, the last of which is only available in Elasticsearch 6.1+), a rollover creates a new index with the suffix incremented and points the alias at it, e.g. index-000002 would then be referenced by my_alias.
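As a rough illustration, the body of a rollover request (POST /my_alias/_rollover) could be assembled like this. The threshold values are made up for the example, and rollover_body is a hypothetical helper. The conditions are checked at the time the request is made.

```python
import json


def rollover_body(max_age=None, max_docs=None, max_size=None):
    """Assemble the "conditions" body for a rollover request.

    Only the conditions that are actually passed in are included, so max_size
    can simply be left out on versions older than 6.1.
    """
    conditions = {}
    if max_age is not None:
        conditions["max_age"] = max_age
    if max_docs is not None:
        conditions["max_docs"] = max_docs
    if max_size is not None:
        conditions["max_size"] = max_size
    return {"conditions": conditions}


# Example thresholds only; choose values that fit your retention needs.
rollover = rollover_body(max_age="1d", max_docs=1000000)
print(json.dumps(rollover, indent=2))
```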

Of course, since you are still on 2.4.1, the rollover API is not available to you yet. At this point, you should probably aim for daily or weekly indices.
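With daily indices, the nightly cleanup reduces to working out which index names are past retention. A minimal sketch, assuming names of the form sessions-YYYY.MM.dd (the sessions prefix and the expired_indices helper are hypothetical); each returned name would then be removed with a single DELETE /<index name> request:

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names, prefix, today, keep_days):
    """Return the daily indices under `prefix` dated more than keep_days before today."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix + "-"):
            continue  # belongs to some other index family
        suffix = name[len(prefix) + 1:]
        try:
            day = datetime.strptime(suffix, "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, leave the index alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


names = [
    "sessions-2018.03.20",
    "sessions-2018.03.21",
    "sessions-2018.03.22",
    "other-2018.03.01",
]
to_delete = expired_indices(names, "sessions", date(2018, 3, 22), keep_days=1)
print(to_delete)  # ['sessions-2018.03.20']
```

Indices that do not match the prefix or the date pattern are skipped rather than deleted, which keeps the script from touching unrelated data.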

