Using the frozen tier with Elastic Cloud and ILM policies

Hi all,

I’m evaluating the frozen tier functionality in an Elastic Cloud deployment, aiming to keep around ten years of historical data available for occasional analysis and compliance purposes.

Given that we plan to retain data for such a long period, what should we take into account regarding the use of the frozen tier and the compatibility between Elasticsearch versions and the underlying snapshots? Are there any known limitations or recommended best practices to ensure long-term accessibility?

Also, is there any way to limit access to indices once they move to the frozen tier? Ideally, only a few users should be able to query those historical indices, but since permissions are applied at the data stream level, it’s not clear to me how to handle that separation.

Any guidance or examples would be greatly appreciated.
Thanks!

Hi @javierE

This is a big question. There are lots of aspects to consider from my perspective.

First, the simple stuff.

1st:

First, here is the snapshot compatibility guide. In general you can see the backwards compatibility guarantees... but will that hold for 10 years into the future? That is hard to say.

2nd:

This is easy. The answer is yes, you can create roles that limit access based on a time range or even just the tier.

An example role can include this query, which will exclude the frozen tier from any search:

"must_not": {
  "terms": {
    "_tier": [
      "data_frozen"
    ]
  }
}
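
For context, here is roughly how that query fits into a full role definition (just a sketch; the role name and index pattern are examples, adjust them to your own data streams):

POST /_security/role/history_reader_no_frozen
{
  "indices": [
    {
      "names": [ "logs-*" ],
      "privileges": [ "read" ],
      "query": {
        "bool": {
          "must_not": {
            "terms": {
              "_tier": [ "data_frozen" ]
            }
          }
        }
      }
    }
  ]
}

Users who only have this role will not get results back from the frozen tier. For the few users who should query the historical data, you can grant a separate role on the same index patterns without that query, since document level security is combined permissively across a user's roles.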

Now for the more nuanced stuff and my opinion:

10 years is a LOOOONG time in IT years... for any solution, vendor, etc.

You do not need to answer all of these here on a public forum, but these are the types of questions I would be asking:

  • How much data are we talking about over 10 years: 100s of GBs, 100s of TBs, multi-petabyte?

  • What are the requirements for recall / search? Minutes... days?

  • What is the risk of losing data?

  • What is the vendor stability at all levels?

  • What is the budget vs. risk, recall, and rework / maintenance?

  • Depending on the answers, I would then work out an architecture.

We have customers that keep everything in Elastic + searchable snapshots (frozen tier).

We have others that keep a raw copy in S3 and only some of the data in Elastic.

I would suggest reaching out to Elastic to discuss this in detail.

You can send me a DM with some of the basics above and your company name and contact and I can see if I can connect you with the right resources.

That is what I would do.

Hi Stephen,

Really appreciate the detailed response — it’s very helpful.
This is indeed a big topic, and your points help frame it much better.

This is exactly what I was looking for! Thanks!

I’ve checked the snapshot compatibility guide you mentioned, and it makes sense that predicting compatibility that far into the future is uncertain. My main concern is simply making sure that using the frozen tier for long-term retention doesn’t create any hard limitations (for example, managing a very large number of mounted snapshots or an overly large repository) or migration challenges later on.

We’re in a very early evaluation stage and don’t have concrete answers yet to the questions you raised. The goal for now is to understand what technical options and constraints exist before moving forward.

At this stage, we’re considering these approaches:

  • Long-term snapshots stored in S3 or Azure (restored on demand or partially mounted as needed)

  • A pure frozen tier approach (now that we know how to filter privileges by tier)

  • A mixed strategy — theoretically (please correct me if I’m wrong), if a snapshot policy shares the same repository used by the frozen phase, it would be possible to retain data in the frozen tier for a shorter period while keeping the full backup for the total required retention time.

    The idea is that, due to the nature of snapshots, repository storage usage would remain efficient, and when the frozen phase ends, the delete action would only unlink the searchable snapshot while the underlying data would still be referenced by the main backup policy. Would such a scenario be feasible? (I’ve sketched the kind of policy I mean right after this list.)

    The motivation behind this is to avoid maintaining an increasingly large frozen tier over many years (with numerous indices, active ILM policies, and snapshots mounted in the cluster), which could eventually introduce significant load or operational overhead.
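
For reference, this is roughly the ILM policy shape I have in mind for the mixed strategy (just a sketch to illustrate the idea; the policy name, repository name, and ages are made up, and I’m assuming the delete action’s delete_searchable_snapshot option is the right way to keep the mounted snapshot’s data in the shared repository):

PUT _ilm/policy/logs-long-term
{
  "policy": {
    "phases": {
      "frozen": {
        "min_age": "90d",
        "actions": {
          "searchable_snapshot": {
            "snapshot_repository": "shared-backup-repo"
          }
        }
      },
      "delete": {
        "min_age": "730d",
        "actions": {
          "delete": {
            "delete_searchable_snapshot": false
          }
        }
      }
    }
  }
}

As far as I understand, delete_searchable_snapshot: false would stop ILM from removing the mounted snapshot when the index is deleted, but I’d like to confirm that this plays well with an SLM policy writing to the same repository.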

Regarding the raw copy in S3: if I understood correctly, that refers to storing the original logs as raw files in S3 for long-term retention, while keeping only recent or relevant data indexed in Elastic. Would this approach also make sense for metrics or APM data, where the raw format isn’t as straightforward as text logs?

I also appreciate the offer to connect — I’ll likely reach out directly once we have a clearer picture of the use case and some concrete numbers.

Best regards,

I say "raw" generically, but we have users that process the data in Logstash, with pipelines or integrations, and then write the processed logs to S3. So that's another option.

Traces and metrics....

That opens another line of thinking.
Typically people would do roll-ups / downsampling or something similar for longer-term retention, especially for metrics.
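
For example, with a TSDS metrics data stream you could add a downsample step in ILM, something roughly like this (just a sketch; the policy name, age, and interval are illustrative):

PUT _ilm/policy/metrics-long-term
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "30d",
        "actions": {
          "downsample": {
            "fixed_interval": "1h"
          }
        }
      }
    }
  }
}

That keeps a much smaller, aggregated copy of the metrics around for the long term instead of every raw sample.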

Really old metrics, like 10-year-old metrics, are pretty low value.

I'd have to think about traces a bit more, but there are ways to bifurcate the flow.

I think the value of traces also drops drastically unless there's some very valuable metadata in them.

Stephen,

Thanks a lot for the follow-up and for sharing these additional considerations.

Really appreciate your time and guidance.

Best regards
