Dealing with a huge Index

Hi Team,

I recently installed a Netflow package that stores data in Elasticsearch.

What I didn't realise was that, even though it came with an ILM policy, that policy did not enable rollover in the hot phase by default. Thanks to stupid me, that means I now have a 1.8TB index to deal with.

I've now enabled rollover and all is well for new data.

I still have the 1.8TB beast to deal with though.

I've tried removing and re-applying the lifecycle policy to this index, but no dice.

I'm now looking at the re-indexing and splitting options, but there's some really mixed/confusing advice out there.

I suspect that I'm not the first to make this mistake. That being the case, can anyone advise the best way around this?

Thanks & All the Best

ChIP

You have a couple of options here, and the most important factor is the retention period. If the index is close to the end of its retention period, just wait; the ILM policy will delete it. If you need to keep the index, here are your options:

  1. Split API - Disk usage for that index can double, so make sure you have enough free disk space. See: Split index API | Elasticsearch Guide [8.16] | Elastic
  2. Reindex API - Reindex into a new destination index, named something like elastiflow-flow-code-2.3-rollover-bigindex, and give it an ILM policy with rollover enabled. Add the aliases once the reindex has completed.
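
As a rough sketch of what the two options look like as requests, the Python below only builds and prints them in Dev Tools console style; it does not contact a cluster. The source and split-target index names and the target shard count of 64 are placeholders, since the thread does not give the real values. Note that the split API requires the source index to be write-blocked first, which is why the first request sets index.blocks.write.

```python
import json

# Hypothetical names: the thread does not give the real name of the 1.8TB index.
SOURCE_INDEX = "elastiflow-bigindex-source"
SPLIT_TARGET = "elastiflow-bigindex-split"
# Destination name taken from the suggestion in option 2.
REINDEX_DEST = "elastiflow-flow-code-2.3-rollover-bigindex"


def split_requests(source, target, target_shards):
    """Requests for option 1: write-block the source, then split it."""
    return [
        ("PUT", f"/{source}/_settings",
         {"settings": {"index.blocks.write": True}}),
        ("POST", f"/{source}/_split/{target}",
         {"settings": {"index.number_of_shards": target_shards}}),
    ]


def reindex_request(source, dest):
    """Request for option 2: copy documents into a new destination.

    wait_for_completion=false makes Elasticsearch run the copy as a
    background task, which suits a job that may take hours.
    """
    return (
        "POST", "/_reindex?wait_for_completion=false",
        {"source": {"index": source}, "dest": {"index": dest}},
    )


def render(method, path, body):
    """Format a request in Kibana Dev Tools console style."""
    return f"{method} {path}\n{json.dumps(body, indent=2)}"


if __name__ == "__main__":
    # 64 target shards is an assumed value, not a recommendation.
    for req in split_requests(SOURCE_INDEX, SPLIT_TARGET, 64):
        print(render(*req), end="\n\n")
    print(render(*reindex_request(SOURCE_INDEX, REINDEX_DEST)))
```

Keep in mind that a split produces a single index with more primary shards, while a reindex into a rollover-enabled destination is what lets the data end up spread over several smaller indices.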

Notes:

  • The split API is much faster than reindex.
  • During reindex/split the cluster will be under pressure, and the operation can take more than 10 hours (of course, it depends on your hardware).
  • To avoid affecting the main cluster, you can snapshot the big index and restore it into another, temporary cluster, run the split/reindex there, and then snapshot/restore the result back into the main cluster.
  • During split/reindex the data can appear duplicated if the new index matches the same index pattern as the original.
  • Use AutoOps to get notifications about your cluster. There is a specific event that checks shard sizes, so you will be notified whenever a shard becomes too large. See: AutoOps: Simplify cluster management | Elastic
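
If you go the split route, the target shard count cannot be arbitrary: it must be a multiple of the source index's primary shard count and must divide its index.number_of_routing_shards. The sketch below picks the smallest valid target for a given size. The 50 GB per-shard goal, the single primary shard and the routing-shards value of 1024 are all assumptions for illustration; check the real values in the index settings before relying on the result.

```python
import math


def valid_split_targets(source_shards, routing_shards):
    """Shard counts the split API accepts for this index.

    A target must be a multiple of the source primary shard count
    and a divisor of index.number_of_routing_shards.
    """
    return [
        n for n in range(source_shards + 1, routing_shards + 1)
        if n % source_shards == 0 and routing_shards % n == 0
    ]


def pick_target(index_gb, max_shard_gb, source_shards, routing_shards):
    """Smallest valid target that keeps shards at or below max_shard_gb.

    Returns None if no valid target is large enough.
    """
    needed = math.ceil(index_gb / max_shard_gb)
    for n in valid_split_targets(source_shards, routing_shards):
        if n >= needed:
            return n
    return None


if __name__ == "__main__":
    # Assumed inputs: 1.8TB taken as 1800 GB, 50 GB per shard,
    # 1 primary shard, 1024 routing shards.
    target = pick_target(1800, 50, 1, 1024)
    print("target shards:", target)
    print("approx GB per shard:", 1800 / target)
```

With those assumed inputs, only powers of two are valid targets, so the count that comes out is larger than a plain division of size by shard goal would suggest.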

Thanks for your suggestions Sir.

Much appreciated

ChIP

You're welcome sir. Please let me know the result (split time etc.) with your cluster hardware.
