Dealing with a huge Index

numpty-boy · December 7, 2024, 11:02am

Hi Team,

I recently installed a Netflow package that stores data in Elasticsearch.

What I didn't realise was that, even though it came with an ILM policy, that policy did not enable rollover in the hot phase by default. Thanks to stupid me, I now have a 1.8TB index to deal with.

I've now enabled rollover and all is well for new data.

I still have the 1.8TB beast to deal with though.

I've tried removing and re-applying the lifecycle policy to this index, but no dice.

I'm now looking at the re-indexing and splitting options, but there's some really mixed/confusing advice out there.

I suspect that I'm not the first to make this mistake. That being the case, can anyone advise the best way around this?

Thanks & All the Best

ChIP

Musab_Dogan · December 7, 2024, 10:31pm

You have a couple of options here, and the most important piece of information is the retention period. If the index is close to the end of its retention period, just wait; the ILM policy will remove it. If you need to keep the index, here are your options:

Split API - Splits the index into a new index with more primary shards. The data size for that index will double, so make sure that you have enough disk space. Split index API | Elasticsearch Guide [8.16] | Elastic
Reindex API - Reindex into a new destination index with a different name pattern, such as elastiflow-flow-code-2.3-rollover-bigindex, and attach an ILM policy with rollover to it. Add the aliases once the reindex has completed.
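
For reference, here is a rough Python sketch of the REST calls behind the two options. It only builds and prints the requests; it does not contact a cluster. The source index name, the target shard count of 64, and the helper names split_calls and reindex_call are hypothetical placeholders, so adjust them to your own cluster before sending anything.

```python
import json


def split_calls(source, target, target_shards):
    """Return the REST calls for a split as (method, path, body) tuples, in order."""
    return [
        # The split API requires the source index to be blocked for writes first.
        ("PUT", f"/{source}/_settings",
         {"settings": {"index.blocks.write": True}}),
        # Create the target index with more primary shards than the source.
        ("POST", f"/{source}/_split/{target}",
         {"settings": {"index.number_of_shards": target_shards}}),
    ]


def reindex_call(source, dest):
    """Return the REST call that copies every document from source into dest."""
    return (
        # wait_for_completion=false returns a task id instead of holding the
        # connection open for a copy that may run for hours.
        "POST", "/_reindex?wait_for_completion=false",
        {"source": {"index": source}, "dest": {"index": dest}},
    )


if __name__ == "__main__":
    SOURCE = "big-netflow-index"  # hypothetical name of the 1.8TB index
    DEST = "elastiflow-flow-code-2.3-rollover-bigindex"  # name suggested above

    calls = split_calls(SOURCE, SOURCE + "-split", 64) + [reindex_call(SOURCE, DEST)]
    for method, path, body in calls:
        print(method, path)
        print(json.dumps(body, indent=2))
```

The printed requests can be pasted into Kibana Dev Tools or sent with any HTTP client. You would normally run either the split pair or the reindex call, not both.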

Notes:

The split API is much faster than reindex.
During the reindex/split the cluster will be under pressure, and the operation can take more than 10 hours (of course, it depends on your hardware).
To avoid affecting the main cluster, you can snapshot the big index and restore it into a temporary cluster, run the split/reindex there, and then restore the result into the main cluster.
During the split/reindex, data can appear duplicated if the new index matches the same index pattern as the original.
Use AutoOps to get notifications about your cluster. It has a specific event for shard sizes, so you are notified whenever a shard becomes too large. AutoOps: Simplify cluster management | Elastic
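
If you go the split route, the target shard count cannot be arbitrary: it has to be a multiple of the source index's primary shard count and a factor of its index.number_of_routing_shards setting. A small Python sketch of that arithmetic follows. The 50 GB shard size target, the single primary shard, and the routing shard value of 1024 are assumptions for illustration only (the thread does not state them), and the function names are made up for this example.

```python
import math


def shards_needed(index_gb, max_shard_gb):
    """Smallest shard count that keeps every shard at or under max_shard_gb."""
    return math.ceil(index_gb / max_shard_gb)


def valid_split_targets(primaries, routing_shards):
    """Shard counts a split can produce: multiples of the current primary
    count that also divide number_of_routing_shards evenly."""
    return [
        n for n in range(primaries + 1, routing_shards + 1)
        if n % primaries == 0 and routing_shards % n == 0
    ]


def pick_split_target(index_gb, max_shard_gb, primaries, routing_shards):
    """First valid split target that is large enough, or None if none exists."""
    needed = shards_needed(index_gb, max_shard_gb)
    for n in valid_split_targets(primaries, routing_shards):
        if n >= needed:
            return n
    return None


if __name__ == "__main__":
    INDEX_GB = 1800        # roughly the 1.8TB index from this thread
    MAX_SHARD_GB = 50      # assumed target shard size
    PRIMARIES = 1          # assumed; check your index settings
    ROUTING_SHARDS = 1024  # assumed; check your index settings

    target = pick_split_target(INDEX_GB, MAX_SHARD_GB, PRIMARIES, ROUTING_SHARDS)
    print("target shards:", target)
    print("approx GB per shard:", INDEX_GB / target)
```

With these assumed inputs the sketch picks 64 shards of roughly 28 GB each. If the function returns None, the index cannot be split far enough and reindex is the remaining option.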

numpty-boy · December 10, 2024, 8:42am

Thanks for your suggestions Sir.

Much appreciated

ChIP

Musab_Dogan · December 11, 2024, 7:37am

You're welcome sir. Please let me know the result (split time etc.) with your cluster hardware.

system · January 8, 2025, 7:37am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
