Speeding up migration/shard allocation from hot to warm nodes

Hi,

When migrating indices older than a certain number of days from hot to warm nodes, I believe setting the number of replicas to 0 (I know it is a bit risky) can speed up the migration. Please suggest whether there is any other way (any parameter changes) to increase the speed of shard allocation to the warm nodes.

The speedup you get by reducing the number of replicas to 0 comes from copying less data: you only have to make one new copy of each shard on a warm node. However, this means you don't have any redundancy, and given that disks are generally a little unreliable, I would replace "bit risky" with "guaranteed to lose data in the long run" in what you said.

If you mean to set the number of replicas to 0, perform the migration, and then add replicas again, then you will copy the same amount of data either way, so I don't understand the benefit.
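To make the data-volume argument concrete, here is a small back-of-the-envelope sketch. It assumes, hypothetically, about 50 GB of primary data (roughly the 100 GB with one replica mentioned later in the thread), and it simplifies by treating every shard copy as a full copy of its primary. The function names are illustrative only, not part of any Elasticsearch API.

```python
def gb_copied(primary_gb: float, replicas: int) -> float:
    """Total data written to warm nodes when every shard copy relocates there."""
    return primary_gb * (1 + replicas)


def gb_copied_drop_then_restore(primary_gb: float, final_replicas: int) -> float:
    """Relocate with replicas=0, then raise replicas back to final_replicas on warm."""
    relocated = gb_copied(primary_gb, 0)   # only the primaries move
    rebuilt = primary_gb * final_replicas  # each new replica is a full copy of its primary
    return relocated + rebuilt


PRIMARY_GB = 50  # hypothetical primary size, ~100 GB total with replicas=1

print(gb_copied(PRIMARY_GB, 1))                    # replicas kept at 1: 100
print(gb_copied(PRIMARY_GB, 0))                    # replicas dropped for good: 50
print(gb_copied_drop_then_restore(PRIMARY_GB, 1))  # dropped, then restored: 100
```

Only permanently running without replicas reduces the volume copied; dropping and restoring replicas copies the same amount, just in two phases.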

Can you give some more numbers about the problem you're trying to solve? How large is your cluster, how much data are you talking about, and how long does it currently take?

Thanks for the reply. I was aware that setting replicas to 0 is an option that carries a risk of losing data (I haven't used it).

The cluster size is around 30 TB, with 5 hot nodes and 5 warm nodes.

Moving around 100 GB of data takes 8 to 10 hours; that is around 150 shards, with replicas set to 1.

Hence I wanted to know whether there are any changes that can be made to speed up shard allocation on the warm nodes.

How many indices and shards is that spread across?

We have around 25,000 shards and 7,146 indices on 5 hot nodes and 7 warm nodes, and the total heap allocated across all these nodes (hot + warm) is around 450 GB.

Sometimes it takes around 4 to 5 hours to move around 400 GB of data from hot to warm (we use Curator to move 7-day-old data every day), whereas on other days it takes 10+ hours to move around 400+ GB. Hence I wanted to check whether there is any way to speed up data movement from hot to warm.
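Converting the reported timings into average transfer rates makes the runs easier to compare. This is a rough sketch: it assumes 1 GB = 1024 MB and treats each run as a single cluster-wide average, ignoring any idle time between relocations.

```python
def throughput_mb_per_s(data_gb: float, hours: float) -> float:
    """Average cluster-wide transfer rate in MB/s (1 GB = 1024 MB)."""
    return data_gb * 1024 / (hours * 3600)


# (label, GB moved, hours taken) as reported in this thread
observations = [
    ("100 GB in 10 h", 100, 10),
    ("100 GB in 8 h", 100, 8),
    ("400 GB in 10 h", 400, 10),
    ("400 GB in 5 h", 400, 5),
    ("400 GB in 4 h", 400, 4),
]

for label, gb, hours in observations:
    print(f"{label}: {throughput_mb_per_s(gb, hours):.1f} MB/s")
```

The rates range from roughly 2.8 MB/s to roughly 28.4 MB/s, about an order of magnitude, for similar amounts of data, so the variation may come from something other than the volume of bytes being copied.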

You have far too many shards given the size of your cluster and data. Please read this blog post for some practical guidelines on recommended shard sizes and sharding practices.

Having so many shards can slow down cluster state updates and propagation that need to happen as shards are moved around. I would expect you to see much better performance with fewer larger shards.
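A quick sanity check with the numbers from this thread shows how far off the shard count is. The thresholds below (at most 20 shards per GB of heap, target shard size around 30 GB within a 20 to 40 GB range) are assumptions based on commonly quoted rules of thumb, not figures stated in the thread. The sketch also assumes the 30 TB and 25,000 shards both include replica copies, and that every index has 1 replica.

```python
import math

# Figures reported earlier in this thread.
TOTAL_DATA_TB = 30     # assumed to include replica copies
TOTAL_SHARDS = 25_000  # assumed to count primaries and replicas
TOTAL_INDICES = 7_146
REPLICAS = 1           # assumed to apply to every index
TOTAL_HEAP_GB = 450    # hot + warm nodes combined

# Assumed rule-of-thumb thresholds (not stated in the thread).
MAX_SHARDS_PER_HEAP_GB = 20
TARGET_SHARD_SIZE_GB = 30


def average_shard_size_gb(total_tb: float, shards: int) -> float:
    return total_tb * 1024 / shards


def shards_per_heap_gb(shards: int, heap_gb: float) -> float:
    return shards / heap_gb


def shards_needed(total_tb: float, target_shard_gb: float) -> int:
    # Round up so no shard exceeds the target size.
    return math.ceil(total_tb * 1024 / target_shard_gb)


def min_shards(indices: int, replicas: int) -> int:
    # Every index has at least one primary, plus one copy per replica.
    return indices * (1 + replicas)


print(f"average shard size: {average_shard_size_gb(TOTAL_DATA_TB, TOTAL_SHARDS):.2f} GB")
print(f"shards per GB of heap: {shards_per_heap_gb(TOTAL_SHARDS, TOTAL_HEAP_GB):.1f}")
print(f"shard budget from heap: {MAX_SHARDS_PER_HEAP_GB * TOTAL_HEAP_GB}")
print(f"shards at ~{TARGET_SHARD_SIZE_GB} GB each: {shards_needed(TOTAL_DATA_TB, TARGET_SHARD_SIZE_GB)}")
print(f"floor set by index count: {min_shards(TOTAL_INDICES, REPLICAS)}")
```

Under these assumptions the average shard is only about 1.2 GB, and even one primary per index (plus its replica) already exceeds the heap-based budget, so reducing the number of indices matters as much as reducing shards per index.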
