Cluster Maintenance / Housekeeping

Hello,

I am looking for advice on index/shard housekeeping, data retention, and the best way to handle older indices that may still be needed in the future.

Details:

We have an ES cluster on AWS; about 90% of the data is time-series metrics. In my eyes, data older than a month is largely useless, except when a report needs to compare now vs. 6 months ago.

I understand that you can snapshot your data to S3 and then delete data older than 3 months. But the snapshot of the oldest data for a given day can be massive, causing big read spikes and extra load on the cluster. (Maybe there are better ways of snapshotting.)
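A minimal sketch of the snapshot-then-delete idea, assuming daily indices named like `metrics-2019.01.15` (a hypothetical naming scheme) and a snapshot repository that is already registered. The helper names, the repository name, and the snapshot name are all made up for illustration. The functions only build the request; sending it to the cluster is left out.

```python
from datetime import date, datetime, timedelta


def indices_older_than(index_names, today, max_age_days, date_format="%Y.%m.%d"):
    """Return index names whose trailing date suffix is older than the cutoff.

    Assumes names end in "-<date>", e.g. "metrics-2019.01.15" (hypothetical).
    Names without a parseable date suffix are skipped.
    """
    cutoff = today - timedelta(days=max_age_days)
    old = []
    for name in index_names:
        suffix = name.rsplit("-", 1)[-1]
        try:
            day = datetime.strptime(suffix, date_format).date()
        except ValueError:
            continue  # not a dated index, leave it alone
        if day < cutoff:
            old.append(name)
    return sorted(old)


def snapshot_request(repository, snapshot_name, indices):
    """Build (method, path, body) for the create-snapshot REST call."""
    path = f"/_snapshot/{repository}/{snapshot_name}?wait_for_completion=false"
    body = {"indices": ",".join(indices), "include_global_state": False}
    return "PUT", path, body


if __name__ == "__main__":
    names = ["metrics-2019.01.01", "metrics-2019.05.30", ".kibana"]
    old = indices_older_than(names, date(2019, 6, 1), max_age_days=90)
    print(snapshot_request("s3-repo", "snap-2019.01.01", old))
```

Since snapshots are incremental at the segment level, snapshotting each day's indices once they stop receiving writes, rather than waiting until they hit the retention cutoff, may spread the read load out. An index would only be deleted (`DELETE /<index>`) after its snapshot has completed successfully.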

But when that data is needed, it has to be restored. Because multiple indices are created per day, a day's worth of data can be massive, so restoring can take some time. (AWS doesn't allow closing indices, unfortunately; the restore time itself is not that big of an issue.)

My question

I need some advice on how to do housekeeping on old indices:

  • Do I snapshot and delete data older than x months?
  • Do I clear caches, flush, force merge, or reindex older indices into indices with fewer shards, or is there another way to reduce their overall resource utilization?
  • Any tips on how to tune indices that are no longer being written to?
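To make the second and third options concrete, here is a rough sketch of the REST calls involved for an index that no longer receives writes. The function name and the example index name are hypothetical; the endpoints (`_flush`, `_forcemerge`, `_cache/clear`) are the standard index APIs. The order shown is one reasonable choice, not a confirmed best practice.

```python
def readonly_index_requests(index, max_num_segments=1):
    """Build the (method, path) pairs for tidying up an index that is
    no longer written to: flush, force merge, then clear caches."""
    return [
        # Persist anything still held in the translog to disk.
        ("POST", f"/{index}/_flush"),
        # Merge down to a few segments; only sensible once writes have stopped.
        ("POST", f"/{index}/_forcemerge?max_num_segments={max_num_segments}"),
        # Drop cached query/request/fielddata entries for this index.
        ("POST", f"/{index}/_cache/clear"),
    ]


if __name__ == "__main__":
    for method, path in readonly_index_requests("metrics-2019.01.15"):
        print(method, path)
```

Force merging is I/O-heavy, so it would presumably be run off-peak and on one index at a time.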

I'm trying to reduce overall resource utilization so the cluster can absorb spikes in ingestion or search, and I don't want to let it grow to thousands of shards.

What I have been doing so far: flush, clear cache, force merge, and, where the indices are small, reindex them into monthly indices. But I'm not entirely sure whether that's best practice or whether I should just snapshot and delete instead.
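For illustration, the daily-to-monthly consolidation could be sketched as below, again assuming daily indices named `<prefix>-YYYY.MM.DD` (a hypothetical scheme). The helper name is made up; it only builds one `POST /_reindex` body per month, using the standard `source.index` / `dest.index` fields.

```python
from collections import defaultdict


def monthly_reindex_bodies(daily_indices):
    """Group daily indices by month and build one _reindex body per month.

    Assumes names look like "<prefix>-YYYY.MM.DD" (hypothetical); anything
    else, including already-monthly indices, is ignored.
    """
    by_month = defaultdict(list)
    for name in daily_indices:
        prefix, _, suffix = name.rpartition("-")
        parts = suffix.split(".")
        if not prefix or len(parts) != 3:
            continue
        by_month[f"{prefix}-{parts[0]}.{parts[1]}"].append(name)
    return {
        dest: {"source": {"index": sorted(sources)}, "dest": {"index": dest}}
        for dest, sources in sorted(by_month.items())
    }


if __name__ == "__main__":
    daily = ["metrics-2019.01.02", "metrics-2019.01.01", "metrics-2019.02.01"]
    for dest, body in monthly_reindex_bodies(daily).items():
        print("POST /_reindex", body)
```

The daily indices would only be deleted after checking that the document count in the monthly index matches the sum of its sources.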

Looking forward to hearing from the awesome community.
