Deletion of data within frozen tier

Hello,
I realize this question runs a bit counter to the intended use of a frozen tier, but I'll outline our use case.

We have an ILM policy of 10 days hot, 80 days frozen. Some of our customers need us to remove data we store (think data removal requests). In the default case our lifecycle policy works fine: data older than 90 days in total is simply deleted. However, if a customer wants us to remove specific data, such as documents where some_field = $some_id, it's unclear to me whether it's possible to delete a subset of data within the frozen tier.

One idea I'm considering / trying is the following:

  1. Mount, one at a time, each snapshot containing the data stream I need to delete data from. (It seems I cannot determine in advance which specific snapshots contain the documents I care about.)
  2. Create a temporary, writable index.
  3. Reindex the read-only index into the writable index, using a query to exclude the documents I wish to remove.
  4. Snapshot the new index.
  5. Unmount the old snapshot and delete it.
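
To make step 3 concrete, here is a minimal sketch of the request body I have in mind for the reindex API (POST _reindex). The index names and the ID value are made up for illustration, and the term query assumes some_field is mapped as a keyword (or another exact-match type); this is an outline of the idea, not something I have validated against a frozen-tier mount.

```python
import json


def build_reindex_body(source_index, dest_index, field, value):
    """Build a _reindex body that copies every document EXCEPT those
    where `field` equals `value`."""
    return {
        "source": {
            "index": source_index,
            # must_not keeps all documents that do not match the term
            "query": {"bool": {"must_not": [{"term": {field: value}}]}},
        },
        "dest": {"index": dest_index},
    }


# Hypothetical names: a mounted read-only index and a temporary writable one.
body = build_reindex_body(
    "mounted-logs-000001", "logs-000001-scrubbed", "some_field", "customer-123"
)
print(json.dumps(body, indent=2))
```

The printed JSON would be sent as the body of POST _reindex; steps 4 and 5 would then operate on the scrubbed index.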

This seems quite intensive, and I'm hoping there is a better solution. Worst case, we would run this as a daily bulk operation.

Thank you!

Hi,

An alternative approach could be to use the delete by query API (_delete_by_query) while the data is still in the hot or warm tier, before ILM moves it to the frozen tier. This lets you delete specific documents that match a query. The catch is that you have to identify and delete the data within that window, which for the policy you describe is the 10-day hot phase.
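
As a rough sketch, the request body for such a call might be built like this. The field name and ID come from your example; the optional filter on the _tier metadata field is one way a query can be limited to indices in a given data tier, but I have not verified how it interacts with read-only frozen backing indices in a data stream, so treat that part as an assumption to test first.

```python
import json


def build_delete_by_query_body(field, value, tiers=None):
    """Build a _delete_by_query body matching documents where `field`
    equals `value`, optionally restricted to the given data tiers."""
    filters = [{"term": {field: value}}]
    if tiers:
        # _tier matches on the tier preference of the index holding the document
        filters.append({"terms": {"_tier": list(tiers)}})
    return {"query": {"bool": {"filter": filters}}}


# Hypothetical ID value; restrict the deletion to documents still in the hot tier.
body = build_delete_by_query_body("some_field", "customer-123", tiers=["data_hot"])
print(json.dumps(body, indent=2))
```

The resulting JSON would be the body of POST <target>/_delete_by_query, where <target> is the data stream or the specific backing indices that are still in the hot phase.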

Regards

Hi there,
This idea may actually work for us, given some new requirements that change the situation. Do I need to do anything special in my _delete_by_query call to explicitly target only the hot tier? That is, would the query fail if targeted at the data stream, since the stream's ILM policy includes both hot and frozen tiers?

Thanks!
