Datastream snapshot strategy

Hello!

We're planning to set up a cluster and use a data stream to ingest our custom logs into Elasticsearch.
Current setup (spread out over 3 ftServers):

  • 3x master nodes (4vCPU, 12GB RAM)
  • 3x data nodes (4vCPU, 64GB RAM, 6TB HDD)
  • 2x Kibana nodes + 2x reverse proxy
  • 2x Logstash nodes
  • 28TB RAID50 storage

Requirements:

  • 60GB/day ingested into a data stream (custom_logs). This creates daily backing indices, e.g. .ds-custom_logs-2023-03-14
  • 90 days 'queryable' retention period
  • Afterwards, another 275 (365-90) days of 'archived' logs
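
As a rough back-of-envelope for these numbers (the replica count of 1 is my assumption, not something fixed yet, and this ignores compression and index overhead):

```python
# Rough storage estimate for the requirements above.
# Assumption: 1 replica per shard; no compression or overhead factored in.
daily_gb = 60
queryable_days = 90
archived_days = 275

hot_primary_gb = daily_gb * queryable_days                  # primary data kept queryable
hot_total_gb = hot_primary_gb * 2                           # doubled for 1 replica
# Each daily backing index is snapshotted once and retained 365 days,
# so the repository holds roughly a full year of daily data.
snapshot_gb = daily_gb * (queryable_days + archived_days)

print(hot_primary_gb, hot_total_gb, snapshot_gb)            # 5400 10800 21900
```

So the queryable tier needs on the order of 5.4TB of primary data (about 10.8TB with a replica), and the snapshot repository tops out around 21.9TB, which fits in the 28TB RAID50 volume.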

Strategy I'm planning:

  • Create an ILM policy (attached via the data stream's index template) that rolls the write index over daily, defines a 90-day retention period, and deletes the backing indices after that period.
  • Create an SLM policy that takes a daily snapshot of the data stream's backing indices to the 28TB repository, with 365-day retention: after 365 days the first snapshot gets deleted, on day 366 the second, and so on.
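
A minimal sketch of what I have in mind (policy names, the repository name `my_fs_repo`, and the schedule are placeholders, not decided yet):

```
PUT _ilm/policy/custom_logs_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "1d" }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": { "delete": {} }
      }
    }
  }
}

PUT _slm/policy/daily-custom-logs
{
  "schedule": "0 30 1 * * ?",
  "name": "<snapshot-custom_logs-{now/d}>",
  "repository": "my_fs_repo",
  "config": {
    "indices": ["custom_logs"]
  },
  "retention": {
    "expire_after": "365d"
  }
}
```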

Now my questions:

  1. Are there any major flaws in my thinking/this design?
  2. Suppose that 9 months after implementation (2023-12-14) I want to review the logs from 3 months ago. Can I do this by restoring snapshot-custom_logs-2023-09-14? Will this show me the logs from 3 months ago plus the 90 days before that, or all the logs from the beginning of the snapshots until 2023-09-14?
  3. Restoring a snapshot means the data nodes have to be able to store the current 90 days of logs plus the amount of data restored, right?
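
For question 2, the restore I'm picturing would look something like this (the snapshot name matches my example above; the repository name and rename pattern are illustrative, and my understanding is that a renamed restored index simply sits outside the data stream, which is fine for a one-off review):

```
POST _snapshot/my_fs_repo/snapshot-custom_logs-2023-09-14/_restore
{
  "indices": ".ds-custom_logs-2023-09-14",
  "rename_pattern": "(.+)",
  "rename_replacement": "restored_$1",
  "include_global_state": false
}
```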

Thank you.
