How do Searchable Snapshot snapshots get cleaned up?

Hello All,

I was curious if anyone knew the answer to the question:

How do searchable snapshot snapshots get cleaned up?

To explain the question a bit more, I'll use the example below (with a rough sketch of the setup right after the list):

I have:

  • A snapshot repository: repo_A
  • An SLM policy which takes a global snapshot daily: slm_A
    • SLM policy points to repo_A
  • An ILM policy which uses content, hot, warm, cold (searchable snapshot), and delete: ilm_A
    • The cold phase points to repo_A
  • An index which is assigned to ilm_A: index_A
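
To make this concrete, here's a rough sketch of that setup. (The schedule, timings, and the Python/requests calls are illustrative placeholders, not my actual config.)

    import requests

    ES = "http://localhost:9200"  # hypothetical cluster endpoint

    # slm_A: a daily snapshot of everything, stored in repo_A.
    requests.put(f"{ES}/_slm/policy/slm_A", json={
        "schedule": "0 30 1 * * ?",          # once a day
        "name": "<daily-snap-{now/d}>",
        "repository": "repo_A",
        "config": {"indices": "*", "include_global_state": True},
        "retention": {"expire_after": "30d"},
    })

    # ilm_A: hot -> warm -> cold -> delete, with the cold phase mounting the
    # index as a searchable snapshot kept in the same repo_A.
    requests.put(f"{ES}/_ilm/policy/ilm_A", json={
        "policy": {
            "phases": {
                "hot": {"actions": {"rollover": {"max_age": "1d"}}},
                "warm": {"min_age": "3d", "actions": {}},
                "cold": {"min_age": "7d", "actions": {
                    "searchable_snapshot": {"snapshot_repository": "repo_A"},
                }},
                "delete": {"min_age": "30d", "actions": {"delete": {}}},
            }
        }
    })

    # index_A then gets "index.lifecycle.name": "ilm_A" in its settings.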

As index_A progresses through ilm_A, it will eventually make its way to the cold phase, at which point ilm_A will make a clone (I think?) of the snapshot of index_A from slm_A's global snapshot.

This leads to my question. At some point, index_A will reach the delete phase. Once there, ilm_A will delete the local copy of index_A from the cluster. But what happens to the cloned snapshot of index_A that was made? Based on the docs, slm_A won't clean it up because:

Since a policy’s retention rules only apply to its snapshots, a policy won’t delete a snapshot created by another policy.

So, how do Searchable Snapshot snapshots get cleaned up? Does ilm_A do something to clean them up; will slm_A do something that isn't clear; or does the snapshot just never get cleaned up unless manually cleaned up?

I wasn't able to find anything in the docs that went over this behavior, but if I missed something please let me know.

ILM and SLM take separate snapshots, and each cleans up its own when they expire. The underlying data is shared through the regular snapshot-deduplication mechanism, but the snapshots themselves are logically separate and have independent lifespans.

Thanks @DavidTurner for the info. So, in the above case, ilm_A actually takes another snapshot rather than cloning slm_A's snapshot? And then when index_A hits the delete phase, ilm_A will delete both the local data and the snapshot it took?

Correct. Although even if it did work by cloning a snapshot, the clone would still be logically separate with an independent lifespan.

Thanks for the clarity on the topic. A follow-up question based on this info: is there a way to have the lifespan of a searchable snapshot snapshot differ from the lifespan of an index on the cluster via ILM?

If I wanted ilm_A to delete index_A from the local storage after 30 days, but wanted to keep the searchable snapshot snapshot for 90 days, is this possible?

The searchable_snapshot ILM action takes a snapshot, mounts it as a searchable snapshot, then deletes the original data from local storage. This happens when moving to the cold or frozen phases. So it sounds like you want to move your data to cold or frozen after 30 days and then move it to the delete phase at 90.
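
Roughly like this, for example (just a sketch of the policy shape; the endpoint, names, and ages are placeholders):

    import requests

    ES = "http://localhost:9200"  # hypothetical endpoint

    # Cold (searchable snapshot) after 30 days, delete after 90 days.
    # A "frozen" phase would take the same searchable_snapshot action for a
    # partially-mounted index.
    requests.put(f"{ES}/_ilm/policy/ilm_A", json={
        "policy": {
            "phases": {
                "hot": {"actions": {"rollover": {"max_age": "1d"}}},
                "cold": {"min_age": "30d", "actions": {
                    "searchable_snapshot": {"snapshot_repository": "repo_A"},
                }},
                "delete": {"min_age": "90d", "actions": {"delete": {}}},
            }
        }
    })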

Somewhat. I want index_A to be moved to cold/mounted as a searchable snapshot, but I only want to keep index_A as a searchable snapshot (cold) for 30 days. After 30 days, I no longer want the index to be searchable on the cluster, but I want the snapshot that was used to still be around for another 90 days.

The use case for something like this is a bit weird, but essentially the cluster can only realistically leverage cold, not frozen (because of some infrastructure limitations). So I want to be able to leverage cold/searchable snapshots for a period of time, but still have a copy of the data to re-mount for an additional period of time if required/requested.

I'm curious what those limitations are. IMO the simplest solution here is to use frozen (i.e. partially-mounted) searchable snapshots, possibly instead of cold (i.e. fully-mounted) ones - it's pretty common to have a hot-and-frozen-only cluster.

If you really want to keep the data out of the cluster, just use normal snapshots, maybe using SLM to manage them. If ever you want to mount some old data, clone your chosen snapshot by hand and then mount the clone.
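
By hand, that's roughly these two calls (sketched with Python/requests; the endpoint, snapshot, and index names are placeholders):

    import requests

    ES = "http://localhost:9200"  # hypothetical endpoint

    # 1. Clone just index_A out of an existing SLM snapshot into its own
    #    snapshot, so its retention is independent of slm_A's retention.
    requests.put(
        f"{ES}/_snapshot/repo_A/daily-snap-2024.01.01/_clone/restore-index_a",
        json={"indices": "index_A"},
    )

    # 2. Mount the clone as a searchable snapshot. Add ?storage=shared_cache
    #    for a partially-mounted (frozen-style) mount instead of a full copy.
    requests.post(
        f"{ES}/_snapshot/repo_A/restore-index_a/_mount?wait_for_completion=true",
        json={"index": "index_A", "renamed_index": "restored-index_a"},
    )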

I'm curious what those limitations are.

There are 2 "limitations" on the infrastructure end.

  1. The connection between the cluster and the backup repository isn't that great, only 300-400Mbps, for a cluster that generates a few TB of data a day.
  2. The backup repository provider offers free egress which is why we use it, but they have a clause regarding egress.
    • You can't egress more data than you have stored in backups. While I don't think we'd hit this clause, I also don't want to get into a spot down the road where we do.

Overall the main issue with using frozen is:

The cache is cleared when a node is restarted.

If there were a way to keep the above from happening, similar to cold, I think frozen would work.

(Overall I do agree that if possible, frozen is a more ideal solution.)

A bit of an update here:

I have found that I can leverage delete_searchable_snapshot to keep a snapshot after its index has been deleted by ILM. The issue I'm running into now is that I'm unable to find a way to detect this now-"orphaned" snapshot once this happens.
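
(To be concrete, by "leverage delete_searchable_snapshot" I mean setting it to false on the delete action, roughly like the sketch below; the endpoint and phase timings are placeholders.)

    import requests

    ES = "http://localhost:9200"  # hypothetical endpoint

    # Delete index_A from the cluster at the delete phase, but keep the
    # snapshot backing the searchable snapshot in repo_A
    # (delete_searchable_snapshot defaults to true).
    requests.put(f"{ES}/_ilm/policy/ilm_A", json={
        "policy": {
            "phases": {
                "cold": {"min_age": "30d", "actions": {
                    "searchable_snapshot": {"snapshot_repository": "repo_A"},
                }},
                "delete": {"min_age": "60d", "actions": {
                    "delete": {"delete_searchable_snapshot": False},
                }},
            }
        }
    })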

It seems like the only place this information is ever stored is on the ILM side (Prevent deletion of snapshots that are mounted and ILM delete action · Issue #73947 · elastic/elasticsearch · GitHub), and looking over the Get Snapshot API it doesn't seem like any field tracks whether a snapshot is owned by ILM or SLM, which makes it a bit tricky to have something like a script come by and check for these "orphaned" snapshots.

Anyone have any ideas/thoughts here?

Note: I could use the snapshot name and parsing to find the date, but then I don't think I have a way of tracking whether it's managed by ILM/SLM, leading to potential issues mentioned in: Prevent deletion of snapshots that are mounted and ILM delete action · Issue #73947 · elastic/elasticsearch · GitHub

Yes, I don't think this is a good approach.

Yes, see my previous message.

An update here: I've started to experiment with frozen/partially mounted snapshots at a small-ish scale, and this does currently seem to be a feasible solution. I do have one question before I expand this to scale*.

Question: Are there any more detailed documents on how partially mounted searchable snapshots work fundamentally?

I was looking through the docs, and there isn't much about the technical details of how the feature interacts with the backend snapshot data store (the docs seem to just cover, at a high level, how the local cache works).

I ask this question because the snapshot storage being used has the following limitations:

  • 1Gbps bandwidth shared between all nodes
  • 250 simultaneous TCP socket connections for each node

In my small-ish scale testing I haven't noticed any issues with the above limitations, but I'm curious as I expand scope if I'll start to see issues.

* "At scale" is: 5k indices ~60TB of data that would be in the frozen tier. Majority of this data would be infrequently searched. (There would be 6 frozen nodes)

Hard to say, really; searchable snapshots are designed to use effectively infinitely scalable storage like S3. Whatever you're using isn't really compatible with S3 in the sense that Elasticsearch requires:

Note that some storage systems claim to be S3-compatible but do not faithfully emulate S3’s behaviour in full. The repository-s3 type requires full compatibility with S3. In particular it must support the same set of API endpoints, return the same errors in case of failures, and offer consistency and performance at least as good as S3 even when accessed concurrently by multiple nodes.

It works on the same underlying files as regular searches, reading them in 16MiB blocks. So a single search that requires 250 random reads against a cold cache would involve 250 API calls, exceeding your request limit basically straight away. It'd also download ~4000MiB of data.
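
To put rough numbers on that (back-of-the-envelope only, assuming every read misses the local cache):

    # Rough cost of one cold-cache search against a partially-mounted index,
    # assuming every random read fetches one full block from the repository.
    BLOCK_MIB = 16        # partially-mounted shards are read in 16 MiB blocks
    random_reads = 250    # hypothetical number of random reads for one search

    api_calls = random_reads                    # one repository request per block
    downloaded_mib = random_reads * BLOCK_MIB   # 250 * 16 MiB = 4000 MiB

    print(f"{api_calls} requests, ~{downloaded_mib} MiB downloaded")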


Thanks for the info regarding how it works. I'll keep this in mind as I test a bit further with usage patterns that might run into this limitation.

Just adding a quick note:

I had originally in my post:

  • 250 S3 API calls per minute for each node

This was actually incorrect, the real limitation is:

  • 250 simultaneous TCP socket connections for each node

Oh ok, I believe the Amazon SDK imposes a limit of 50 simultaneous connections.

