I was about to answer your other question about disk space in general, but you removed it.
You seem to be looking for ways to reduce the resources your cluster needs, so rather than answering your questions directly, I'll share some ideas.
The index lifecycle phases are Hot, Warm, Cold, Frozen, and Delete:
- Hot: The index is actively being updated and queried.
- Warm: The index is no longer being updated but is still being queried.
- Cold: The index is no longer being updated and is queried infrequently. The information still needs to be searchable, but it’s okay if those queries are slower.
- Frozen: The index is no longer being updated and is queried rarely. The information still needs to be searchable, but it’s okay if those queries are extremely slow.
- Delete: The index is no longer needed and can safely be removed.
Depending on your use case, you can use those phases as the points where you apply changes that reduce disk usage. For example, for logs, I'd probably not reduce anything in the hot phase, but I would in the warm or cold phases.
If you are snapshotting an index to S3 or similar, and the index is no longer updated (again, time series data), you can move it to the Frozen phase (which needs a commercial license) or the Delete phase. That way, you no longer consume costly resources like SSD space, because the data is offloaded to S3, which is much cheaper. With Frozen (and the searchable snapshots feature), you can still search the data that lives in S3. With Delete, you need to restore the snapshot first (and consume some disk space again) before you can search it.
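As a rough illustration of the Frozen and Delete options, here is a minimal sketch that builds a lifecycle policy body in Python. The repository name `my-s3-repo`, the ages, and the rollover thresholds are placeholders I made up; the action names (`rollover`, `searchable_snapshot`, `delete`) are the ILM actions as I understand them, so please check them against the documentation for your version.

```python
import json


def frozen_then_delete_policy(repository, frozen_after="30d", delete_after="365d"):
    """Build an ILM policy body: hot -> frozen (searchable snapshot) -> delete.

    All ages and thresholds are illustrative assumptions, not recommendations.
    """
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Roll over to a new index daily or at 50gb per primary shard.
                        "rollover": {"max_age": "1d", "max_primary_shard_size": "50gb"}
                    }
                },
                "frozen": {
                    "min_age": frozen_after,
                    "actions": {
                        # Data stays searchable from the snapshot repository (e.g. S3).
                        "searchable_snapshot": {"snapshot_repository": repository}
                    },
                },
                "delete": {
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


policy = frozen_then_delete_policy("my-s3-repo")
print(json.dumps(policy, indent=2))
```

The printed JSON is the kind of body you would send with `PUT _ilm/policy/<your-policy-name>`. If you skip the frozen phase and only delete, make sure your snapshots are taken before the delete phase kicks in.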
If you no longer need the original data but only an aggregated view of it, you can downsample your index.
Aggregates a time series (TSDS) index and stores pre-computed statistical summaries (`min`, `max`, `sum`, `value_count` and `avg`) for each metric field grouped by a configured time interval. For example, a TSDS index that contains metrics sampled every 10 seconds can be downsampled to an hourly index. All documents within an hour interval are summarized and stored as a single document and stored in the downsample index.
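To make the space saving concrete, here is a small pure-Python sketch of what downsampling does conceptually. It is not how Elasticsearch implements it; the `downsample` function and the sample data are made up for illustration. It groups 10-second samples into hourly buckets and keeps only the summaries.

```python
from collections import defaultdict


def downsample(samples, interval_seconds):
    """Summarize (epoch_seconds, value) samples per fixed time interval.

    Returns {bucket_start: {"min", "max", "sum", "value_count"}}.
    The average can be derived later as sum / value_count.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each timestamp to the start of its interval.
        buckets[ts - ts % interval_seconds].append(value)
    return {
        start: {
            "min": min(values),
            "max": max(values),
            "sum": sum(values),
            "value_count": len(values),
        }
        for start, values in sorted(buckets.items())
    }


# Two hours of a hypothetical gauge sampled every 10 seconds.
samples = [(t, float(t % 600)) for t in range(0, 7200, 10)]
hourly = downsample(samples, 3600)
print(len(samples), "documents ->", len(hourly), "documents")
```

Here 720 raw documents become 2 summary documents, which is where the disk saving comes from. In Elasticsearch itself, the equivalent would be the downsample API or the ILM `downsample` action with a `fixed_interval` such as `1h`.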
In the Warm or Cold phase, you can set the number of replicas to 0, assuming that you have backups in case anything goes wrong.
If you have many primary shards in the Hot phase, you can also use the Shrink API to reduce the number of shards to 1 and then run a force merge. That will also help reduce disk usage. If you set `index.codec` to `best_compression` before running the force merge, that might help as well (see Index modules | Elasticsearch Guide [8.11] | Elastic). Please be aware that all of this consumes a lot of IO, because all the segments need to be rewritten.
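The replica, shrink, and force merge ideas above can be sketched as a single Warm phase definition. This is only a sketch under assumptions: the `7d` age is a placeholder, and the action and option names (`allocate.number_of_replicas`, `shrink.number_of_shards`, `forcemerge.max_num_segments`, `forcemerge.index_codec`) are the ILM ones as far as I know, so verify them for your version.

```python
import json


def warm_phase(min_age="7d"):
    """Build a warm phase that drops replicas, shrinks to one shard and force merges.

    min_age is an illustrative assumption; pick what matches your retention needs.
    """
    return {
        "min_age": min_age,
        "actions": {
            # No replicas: only safe if you have snapshots to restore from.
            "allocate": {"number_of_replicas": 0},
            # Collapse all primary shards into a single one.
            "shrink": {"number_of_shards": 1},
            # Rewrite into one segment using the higher-compression codec.
            "forcemerge": {"max_num_segments": 1, "index_codec": "best_compression"},
        },
    }


warm = warm_phase()
print(json.dumps({"policy": {"phases": {"warm": warm}}}, indent=2))
```

ILM decides the order in which the actions run, so the order of the keys here does not matter. Expect heavy IO while the shrink and force merge are running.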
Those are some ideas which I hope will help you to find what is best for you...