I'm thinking of applying ILM policies to my ES cluster, which contains historical data that must be preserved for legal purposes, with a retention period of 7 years (85 months).
This cluster used to have daily indices. To boost search performance, I re-indexed the daily indices into monthly indices.
The customer is interested only in the last 18 months of data. The rest of the data is kept for legal purposes and is seldom queried.
Hence, I'm thinking of deploying a hot-warm-cold-delete architecture here.
- The delete policy is straightforward: delete monthly indices older than 85 months.
- As per ILM, the current month's index should be on a hot node. Since the current month's data is just a single monthly index of around 500 GB primary with 12 shards and 1 replica, does it make sense to have one hot node containing just that 1 index, or 2 hot nodes containing the index and its replica?
I was thinking of going with a hot-cold-delete architecture (no warm), i.e. keep the last 18 months of data on hot nodes, and keep the rest of the data, which is older than 18 months and less than 7 years old, on cold nodes.
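A minimal sketch of how the hot/cold split could be expressed with shard allocation filtering. It assumes each node is tagged in elasticsearch.yml with a custom attribute, here hypothetically named `box_type` (e.g. `node.attr.box_type: hot` or `node.attr.box_type: cold`); the attribute name and its values are illustrative, not anything the cluster already has. The body below would be sent with `PUT <index>/_settings` to pin an index to the hot nodes:

```json
{
  "index.routing.allocation.require.box_type": "hot"
}
```

Changing the value to `"cold"` on an older index would relocate its shards to the cold nodes, which is the same move an ILM `allocate` action performs automatically.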
My questions:
- Does this make sense?
- Or would you suggest keeping just the current month's index and its replica on hot nodes, and the data from the previous month back to 18 months ago on warm nodes?
- If I configure ILM, I have to specify "min_age" : "31d" in order for the current monthly index to move from hot to warm/cold. Is that correct? That would mean an index belonging to the month of April will not be moved to a warm/cold node after 30 days, but rather after 31 days.
- For my use case, I don't need to configure rollover.
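A rough sketch of what a hot-cold-delete policy body without rollover might look like, to be sent with `PUT _ilm/policy/<policy_name>`. Several things here are assumptions: the node attribute `box_type` is hypothetical, and the ages are approximations in days (18 months ≈ 548d, 85 months ≈ 2587d), because Elasticsearch time units do not include months. Without rollover, `min_age` is counted from the index creation time.

```json
{
  "policy": {
    "phases": {
      "cold": {
        "min_age": "548d",
        "actions": {
          "allocate": {
            "require": { "box_type": "cold" }
          }
        }
      },
      "delete": {
        "min_age": "2587d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```

With this shape, an index stays wherever its own allocation settings place it until it reaches the cold phase, so no explicit hot phase action is needed. For the hot-to-warm variant, the cold block would be replaced by a `warm` phase with `"min_age": "31d"` and a matching `allocate` action.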
The ELK Stack version is 6.8. All past monthly indices are force-merged and use best_compression. Even the current month's index uses best_compression. The indexing penalty from best_compression on the current index is acceptable, since the data doesn't need to be queried immediately.
Thanks