While migrating indices older than a given number of days from hot to warm nodes, I believe setting the number of replicas to 0 (I know it is a bit risky) can speed up the migration. Please suggest whether there is any other way (for example, parameter changes) to increase the speed of shard allocation to the warm nodes.
The speedup you get by reducing the number of replicas to 0 comes from copying less data: you only have to make one new copy of each shard on a warm node. However, this means you don't have any redundancy, and given that disks are generally a little unreliable, I would replace "bit risky" with "guaranteed to lose data in the long run" in what you said.
If you mean to set the number of replicas to 0, perform the migration, and then add replicas again, then you will copy the same amount of data either way, so I don't understand the benefit.
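To make the point about data volume concrete, here is a rough sketch of the arithmetic. It assumes every shard copy that lands on a warm node costs one full copy of the shard's data, and it uses 400 GB of primary data with one replica as illustrative figures; the function name `gb_copied` is made up for this example.

```python
def gb_copied(primary_gb, replicas_during_move, replicas_after_move):
    """Estimate the data copied to warm nodes, in GB.

    Assumption: each shard copy (primary or replica) placed on a warm
    node costs one full copy of the shard data.
    """
    # Copies made while relocating from hot to warm.
    moved = primary_gb * (1 + replicas_during_move)
    # Copies made afterwards if replicas are added back.
    rebuilt = primary_gb * max(0, replicas_after_move - replicas_during_move)
    return moved + rebuilt


# Keep 1 replica throughout the migration.
print(gb_copied(400, 1, 1))  # 800
# Drop to 0 replicas, migrate, then restore 1 replica.
print(gb_copied(400, 0, 1))  # 800
# Drop to 0 replicas and never restore them (no redundancy).
print(gb_copied(400, 0, 0))  # 400
```

Under these assumptions, only the last case copies less data, and it is the one that leaves no redundancy.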
Can you give some more numbers about the problem you're trying to solve? How large is your cluster, how much data are you talking about, and how long does it currently take?
We have around 25,000 shards and 7,146 indices on 5 hot nodes and 7 warm nodes, and the total heap memory allocated to all these nodes (hot + warm) is around 450 GB.
Sometimes it takes around 4 to 5 hours to move around 400 GB of data from hot to warm (we use Curator to move 7-day-old data daily), whereas on other days it takes 10+ hours to move 400+ GB of data. Hence I wanted to check whether there is any way to speed up the data movement from hot to warm.
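For reference, those durations translate into the following average cluster-wide transfer rates. This is only a back-of-the-envelope sketch: it assumes exactly 400 GB moved, 1 GB = 1024 MB, and a steady rate over the whole window; `throughput_mb_per_s` is a name made up for this example.

```python
def throughput_mb_per_s(data_gb, hours):
    """Average transfer rate in MB/s, assuming 1 GB = 1024 MB."""
    return data_gb * 1024 / (hours * 3600)


for hours in (4, 5, 10):
    print(f"{hours} h: {throughput_mb_per_s(400, hours):.1f} MB/s")
# 4 h: 28.4 MB/s
# 5 h: 22.8 MB/s
# 10 h: 11.4 MB/s
```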
You have far too many shards given the size of your cluster and data. Please read this blog post for some practical guidelines on recommended shard sizes and sharding practices.
Having so many shards can slow down cluster state updates and propagation that need to happen as shards are moved around. I would expect you to see much better performance with fewer larger shards.
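As a rough illustration of how oversharded the cluster is, the numbers quoted above can be plugged into a quick calculation. The limit of about 20 shards per GB of heap is a commonly cited rule of thumb and is an assumption here, not something stated in this thread; `shard_stats` is a name made up for this example.

```python
# Figures quoted in the thread.
SHARDS = 25_000
INDICES = 7_146
NODES = 5 + 7          # hot + warm
TOTAL_HEAP_GB = 450

# Assumption: rule-of-thumb ceiling, not a hard limit.
MAX_SHARDS_PER_GB_HEAP = 20


def shard_stats(shards, indices, nodes, heap_gb, max_per_gb):
    """Summarise shard density against a rule-of-thumb budget."""
    budget = heap_gb * max_per_gb
    return {
        "shards_per_node": shards / nodes,
        "shards_per_index": shards / indices,
        "shards_per_gb_heap": shards / heap_gb,
        "rule_of_thumb_budget": budget,
        "times_over_budget": shards / budget,
    }


for name, value in shard_stats(
    SHARDS, INDICES, NODES, TOTAL_HEAP_GB, MAX_SHARDS_PER_GB_HEAP
).items():
    print(f"{name}: {value:.2f}")
```

With these inputs the cluster holds roughly 2,000 shards per node and about 56 shards per GB of heap, close to three times the assumed budget of 9,000 shards.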