My cluster's total usage, as reported by Kibana (kibana/app/monitoring#/overview), is 1TB (102 indices with one replica each). Snapshots go to Backblaze (as the s3 default client), which is reporting 8TB stored.
I was snapping all indices hourly, and just yesterday I modified the daily job to snap only the few indices I care about, only a few times a day, with a week-long retention.
The data doesn't change too often, so I think a once-a-week snap will mostly be fine, but when it does change it basically overwrites the entire dataset (index, not update).
Should I:
- Use a single policy to snap hourly and keep it for 45d or so?
- Use multiple policies in the same repo with different schedules to snap a few times a day and also once a week?
- Use multiple policies (one per index) allowing a restore of any single index instead of needing to restore all indices?
Based on the data changes I think we would basically be restoring back to a particular week/month set of data and not an hourly moment in time. Think inventory gets counted at the end of a month but is infrequently updated mid-month so weekly already feels like belt & suspenders.
I now have two policies:
- daily: 0 30 1 * * ? with retention of 7d, max count of 100
- weekly: 0 30 2 ? * 4 with retention of 45d, no max
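For reference, here is a rough sketch of how those two policies might look as request bodies for the SLM `PUT _slm/policy/<policy_id>` endpoint. The schedules and retention values are the ones listed above; the repository name, index pattern, and snapshot name pattern are placeholders I made up for illustration, not my real settings.

```python
import json

# Hypothetical values, not taken from the real cluster.
REPOSITORY = "backblaze_repo"
INDICES = ["inventory-*"]


def slm_policy(name, schedule, expire_after, max_count=None):
    """Build the JSON body for PUT _slm/policy/<policy_id>."""
    retention = {"expire_after": expire_after}
    if max_count is not None:
        retention["max_count"] = max_count
    return {
        # Date-math snapshot name, e.g. <daily-snap-{now/d}> (assumed pattern).
        "name": f"<{name}-snap-{{now/d}}>",
        "schedule": schedule,
        "repository": REPOSITORY,
        "config": {"indices": INDICES},
        "retention": retention,
    }


policies = {
    "daily": slm_policy("daily", "0 30 1 * * ?", "7d", max_count=100),
    "weekly": slm_policy("weekly", "0 30 2 ? * 4", "45d"),
}

# Print each request so it can be pasted into Kibana Dev Tools.
for policy_id, body in policies.items():
    print(f"PUT _slm/policy/{policy_id}")
    print(json.dumps(body, indent=2))
```

Both policies point at the same repository here, which matches the second option in my list of questions.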
With the above changes I hoped the extra data would age out once the snap retention surpassed the last big indexing job... but I'm still seeing 8TB in the repo with the oldest snap being after the last ingest.
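To make the age-out expectation concrete, this is a minimal sketch of the rule I had in mind: a snapshot becomes eligible for deletion once it is older than the policy's retention. The snapshot names and dates are made up, and the sketch ignores min/max count and says nothing about how much repository space a deletion actually frees.

```python
from datetime import datetime, timedelta, timezone


def expired(snapshot_starts, expire_after, now):
    """Return names of snapshots that started before (now - expire_after)."""
    cutoff = now - expire_after
    return [name for name, start in snapshot_starts.items() if start < cutoff]


# Made-up reference time and snapshot start times, for illustration only.
now = datetime(2024, 1, 31, tzinfo=timezone.utc)
snaps = {
    "daily-old": now - timedelta(days=9),
    "daily-new": now - timedelta(days=2),
}

print(expired(snaps, timedelta(days=7), now))
```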
Can you help me pick the right path forward?