I recently installed a Netflow package that stores data in Elasticsearch.
What I didn't realise was that, even though it came with an ILM policy, the policy did not enable rollover in the hot phase by default. Thanks to stupid me, I now have a 1.8TB index to deal with.
I've now enabled rollover and all is well for new data.
You have a couple of options here, and the most important piece of information is the retention period. If the end of the retention period is close, just wait: the index will be removed by the ILM policy. If you need to keep the index, here are your options.
Reindex API - Run a reindex of the big index. Create another index pattern such as elastiflow-flow-code-2.3-rollover-bigindex for the destination index and define an ILM policy with rollover for it. Add the aliases after the reindex has completed.
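As a rough sketch of the reindex option, the request bodies could be built like this. Only the destination name comes from this thread; the source index name, the alias name, and the rollover thresholds are placeholders I made up, and `max_primary_shard_size` assumes a reasonably recent Elasticsearch version. The code only builds and prints the JSON bodies, so you can paste them into Dev Tools or send them with whatever client you use.

```python
import json

# Hypothetical names: only DEST_INDEX is taken from the thread.
SOURCE_INDEX = "netflow-big-index"   # assumption: your existing 1.8TB index
DEST_INDEX = "elastiflow-flow-code-2.3-rollover-bigindex"
SEARCH_ALIAS = "netflow-search"      # assumption: alias your dashboards query


def rollover_policy(max_shard_size="50gb", max_age="1d"):
    """Body for PUT _ilm/policy/<name>: hot phase with rollover enabled."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {
                            "max_primary_shard_size": max_shard_size,
                            "max_age": max_age,
                        }
                    }
                }
            }
        }
    }


def reindex_body(source, dest):
    """Body for POST _reindex?wait_for_completion=false."""
    return {"source": {"index": source}, "dest": {"index": dest}}


def alias_actions(index, alias):
    """Body for POST _aliases, run once the reindex task has finished."""
    return {"actions": [{"add": {"index": index, "alias": alias}}]}


if __name__ == "__main__":
    steps = [
        ("PUT _ilm/policy/<policy-name>", rollover_policy()),
        ("POST _reindex?wait_for_completion=false",
         reindex_body(SOURCE_INDEX, DEST_INDEX)),
        ("POST _aliases", alias_actions(DEST_INDEX, SEARCH_ALIAS)),
    ]
    for request_line, body in steps:
        print(request_line)
        print(json.dumps(body, indent=2))
```

Running the reindex with `wait_for_completion=false` returns a task ID straight away, which is probably what you want for a job that may run for hours. How the destination gets attached to the ILM policy (index template, write alias) depends on your setup and is left out here.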
Notes:
The split API is much faster than reindex.
During the reindex/split the cluster will be under pressure, and the operation can take more than 10 hours (of course, it depends on your hardware).
To avoid affecting the main cluster, you can snapshot the big index and restore it into another cluster, run the split/reindex operation in that temporary cluster, and then restore the result into the main cluster.
During the split/reindex, data can appear duplicated if the new index matches the same index pattern as the original.
Use AutoOps to get notifications about your cluster. There is a specific event that checks shard sizes, so whenever a shard becomes too large you will be notified about it. See "AutoOps: Simplify cluster management" on the Elastic site.
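For the split route mentioned in the notes, here is a minimal sketch of how you might pick a target shard count and build the two request bodies involved. The 50GB per-shard target and the current primary shard count are assumptions on my side, and the doubling loop assumes the index was created with the default routing shards setting, which lets you split by factors of 2. Keep in mind that a split gives you one index with more shards, not several smaller indices.

```python
import json
import math


def target_shard_count(index_size_gb, current_shards, max_shard_gb=50):
    """Smallest current_shards * 2**k that keeps shards at or under max_shard_gb."""
    needed = math.ceil(index_size_gb / max_shard_gb)
    target = current_shards
    while target < needed:
        target *= 2  # assumption: only factor-of-2 splits are available
    return target


def write_block_body():
    """Body for PUT /<source>/_settings: the source must be read-only to split."""
    return {"settings": {"index.blocks.write": True}}


def split_body(number_of_shards):
    """Body for POST /<source>/_split/<target>."""
    return {"settings": {"index.number_of_shards": number_of_shards}}


if __name__ == "__main__":
    # Assumption: the 1.8TB index (taken as 1800GB) has a single primary shard.
    shards = target_shard_count(index_size_gb=1800, current_shards=1)
    print("target primary shards:", shards)
    print(json.dumps(write_block_body(), indent=2))
    print(json.dumps(split_body(shards), indent=2))
```

Since the source index has to be write-blocked before the split, new flows need to be going to a different index already, which is the case once rollover is enabled for new data.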