We're planning to set up a cluster and use a data stream to ingest our custom logs into Elasticsearch.
Current setup (spread out over 3 ftServers):
- 3x master nodes (4vCPU, 12GB RAM)
- 3x data nodes (4vCPU, 64GB RAM, 6TB HDD)
- 2x Kibana nodes + 2x reverse proxies
- 2x Logstash nodes
- 28TB RAID50 storage
- 60GB/day ingested into a data stream (custom_logs), which creates daily backing indices
- 90 days 'queryable' retention period
- Afterwards, another 275 (365-90) days of 'archived' logs
Strategy I'm planning:
- Create an ILM policy with a daily rollover, attached to the data stream via its index template. It will also define a retention period of 90 days and delete the backing indices after this period.
- Create an SLM policy that takes a daily snapshot of the data stream's backing indices to the 28TB repository. After 365 days the first snapshot gets deleted, on day 366 the second, and so on.
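To make the plan concrete, here is a rough sketch of the two policies as I'd enter them in Kibana Dev Tools. Policy names, the repository name (archive_repo), and the schedule are placeholders, not final values:

```console
PUT _ilm/policy/custom_logs_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "1d" }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": { "delete": {} }
      }
    }
  }
}

PUT _slm/policy/daily-custom-logs
{
  "schedule": "0 30 1 * * ?",
  "name": "<snapshot-custom_logs-{now/d}>",
  "repository": "archive_repo",
  "config": {
    "indices": ["custom_logs"],
    "include_global_state": false
  },
  "retention": {
    "expire_after": "365d"
  }
}
```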
Now my questions:
- Are there any major flaws in my thinking/this design?
- Suppose nine months after implementation (2023-12-14) I want to review the logs from three months ago. Can I do this by restoring snapshot-custom_logs-2023-09-14? Will this show me the logs from three months ago plus the 90 days before that, or all the logs from the beginning of the snapshots until 2023-09-14?
- Restoring a snapshot means the data nodes have to be able to store the current 90 days of logs plus the amount of logs restored, right?
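As a back-of-envelope check on that last question (assuming the 60GB/day rate holds, and treating "90 days restored" as an assumption about what one snapshot contains, ignoring replicas and compression):

```python
DAILY_INGEST_GB = 60   # ingest rate from the figures above
RETENTION_DAYS = 90    # 'queryable' retention on the data nodes
RESTORE_DAYS = 90      # assumed amount of history one restored snapshot brings back

live_gb = DAILY_INGEST_GB * RETENTION_DAYS      # logs already on the data nodes
restored_gb = DAILY_INGEST_GB * RESTORE_DAYS    # extra space needed during a restore
total_gb = live_gb + restored_gb

print(f"live: {live_gb} GB, restored: {restored_gb} GB, total: {total_gb} GB")
# → live: 5400 GB, restored: 5400 GB, total: 10800 GB
```

So under those assumptions the data nodes would need roughly 10.8TB free across the tier, which still fits in the 3x 6TB described above.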