We have a single-node cluster where one index unfortunately grew very big (261 GB) because we had no ILM on it. This is a production cluster.
We understand that performance degrades above 50 GB, and I think we are now starting to feel it.
The logical step would be to implement an ILM rollover.
The disk on that Google Cloud compute machine is a 562 GB disk with 461 GB used (102 GB free, i.e. 82% utilization).
We do not want to expand this disk, as it is too troublesome to shrink it back later on.
We have about 2 years of data in this index, but realistically only need 1 year. There are some use cases where we might want to restore the older data, though.
What would be the best way to proceed to minimize downtime on this node, archive the data older than 1 year, and implement rollover on this index to keep each shard around 25 GB?
I understand we can use Curator for that purpose. Which action should we use?
Is the data all in a single index? If so, neither ILM nor Curator can help you with that. Both are for managing data at the index level. The only way to purge data from within an index older than a given date would be a delete_by_query operation. This will be painful and slow by comparison, but it will eventually get done. The question is whether the disk I/O from the delete_by_query operation affects performance to a degree that impedes your regular operations.
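As a rough illustration of the delete_by_query approach, here is a minimal Python sketch that only builds the request body. The `@timestamp` field name and the cutoff date are assumptions, not something taken from your mapping; `my-index-obj` stands in for your index name. The `requests_per_second` URL parameter throttles the operation, which may help with the disk I/O concern.

```python
import json
from datetime import datetime, timezone


def delete_older_than_body(cutoff: datetime, field: str = "@timestamp") -> dict:
    """Build a _delete_by_query body matching documents older than `cutoff`.

    `field` is an assumed date field name; replace it with the real one.
    """
    return {
        "query": {
            "range": {field: {"lt": cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")}}
        }
    }


# Arbitrary example cutoff; in practice this would be "one year ago".
cutoff = datetime(2024, 1, 1, tzinfo=timezone.utc)
body = delete_older_than_body(cutoff)

# Throttled and run as a background task so it can be monitored or cancelled.
print("POST /my-index-obj/_delete_by_query"
      "?conflicts=proceed&wait_for_completion=false&requests_per_second=500")
print(json.dumps(body, indent=2))
```

The throttle value of 500 is only a starting point to tune against your own load. Note that deleted documents only free disk space once segments are merged.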
Those steps look pretty good to me, but I'd switch the order to: 3,4,1,2,5. You want the template to be applied when you create the new index.
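To make the ordering concrete, here is a hedged sketch of what the policy, template, and first index could look like, built as plain Python dictionaries. All names (`my-rollover-policy`, `my-index-obj-new`) are hypothetical, the 25gb threshold comes from your target shard size, and the delete phase is optional and assumes the older data is safely in a snapshot. `max_primary_shard_size` needs a reasonably recent Elasticsearch version; on older versions `max_size` would be the closest equivalent.

```python
import json


def rollover_policy(max_shard_size: str = "25gb", delete_after: str = "365d") -> dict:
    """ILM policy body: roll over at a shard size, delete after an age."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {"max_primary_shard_size": max_shard_size}
                    }
                },
                # Optional: min_age is counted from the rollover time.
                "delete": {"min_age": delete_after, "actions": {"delete": {}}},
            }
        }
    }


def index_template(pattern: str, policy_name: str, alias: str) -> dict:
    """Composable index template that attaches the policy to new indices."""
    return {
        "index_patterns": [pattern],
        "template": {
            "settings": {
                "index.lifecycle.name": policy_name,
                "index.lifecycle.rollover_alias": alias,
            }
        },
    }


policy = rollover_policy()
template = index_template("my-index-obj-new-*", "my-rollover-policy", "my-index-obj-new")
# The first index is created last, so the template is already in place.
bootstrap = {"aliases": {"my-index-obj-new": {"is_write_index": True}}}

for request, payload in [
    ("PUT /_ilm/policy/my-rollover-policy", policy),
    ("PUT /_index_template/my-index-obj-new", template),
    ("PUT /my-index-obj-new-000001", bootstrap),
]:
    print(request)
    print(json.dumps(payload, indent=2))
```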
Also, after thinking about it some more, I wouldn't recommend reindexing your historical index into the new one, because as it reindexes and rolls over, your current data will be spread out across all the indices. You mention that you only need to keep one year's worth of data, but if the new data is spread out, you'll need to keep all those indices for an extra year, since they will all contain recent data, if that makes sense.
The old-school way to handle data in Elasticsearch was to have the date in the index name, and I think that applies here, since these indices will start aging out of the one-year window starting next month.
For example, last year's March data could be reindexed to a new index with:
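A minimal sketch of such a request, written as Python that builds the `_reindex` body. The `@timestamp` field, the year 2023, and the `my-index-obj-YYYY.MM` naming pattern are assumptions for illustration; adjust them to your mapping and naming scheme.

```python
import json


def month_bounds(year: int, month: int) -> tuple:
    """Return (first day of the month, first day of the next month) as dates."""
    next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
    return f"{year:04d}-{month:02d}-01", f"{next_year:04d}-{next_month:02d}-01"


def reindex_month_body(source_index: str, year: int, month: int,
                       field: str = "@timestamp") -> dict:
    """Build a _reindex body copying one calendar month into a dated index."""
    start, end = month_bounds(year, month)
    return {
        "source": {
            "index": source_index,
            # Half-open range: includes the 1st, excludes the next month's 1st.
            "query": {"range": {field: {"gte": start, "lt": end}}},
        },
        "dest": {"index": f"{source_index}-{year:04d}.{month:02d}"},
    }


body = reindex_month_body("my-index-obj", 2023, 3)
print("POST /_reindex?wait_for_completion=false")
print(json.dumps(body, indent=2))
```

Running it with `wait_for_completion=false` returns a task ID, so a long reindex can be checked on or cancelled through the tasks API.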
Actually, after thinking about it some more, I'd recommend reindexing the data up to the current month and then doing the steps you mentioned above. Then, once live data is hitting the new index, reindex only the current month (or whatever timeframe you decide on) into that index. That way you won't have your live data spread out across multiple indices, and you also won't have your reindexed indices under a lifecycle policy.
Play around with it to get comfortable with what is happening. You can set the date range to be quite small to get a good feel for what is going on. I hope that all makes sense, and good luck.
Of course, I have to add: make sure you have taken a snapshot of your data first. Once you are satisfied that your year-old data is in the new reindexed indices, you can delete my-index-obj.