Hi team,
I have a 10-node Elasticsearch cluster created by ECK on Kubernetes v1.22.
Persistent storage uses local PVs.
For certain reasons, I want to move one of the Elasticsearch instances to a new Kubernetes node.
The indices have replicas enabled, so I changed the node affinity settings and then deleted the old node directly,
and waited for ECK to reschedule the Pod.
But it turned out that the node held about 500GB of data, and the rescheduling and recovery took dozens of hours.
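For reference, this is roughly the kind of change I applied to the Elasticsearch manifest (the cluster name, nodeSet name, version, and node label below are placeholders, not my real values):

```bash
# Trimmed-down sketch of the Elasticsearch resource with the added node affinity;
# the real manifest has the full spec (storage, resources, config, ...).
kubectl apply -f - <<'EOF'
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: my-cluster            # placeholder cluster name
spec:
  version: 7.17.0             # example version
  nodeSets:
  - name: default             # placeholder nodeSet name
    count: 10
    podTemplate:
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: es-node-group    # hypothetical node label
                  operator: In
                  values: ["new"]
EOF
```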
My question is:
Is my approach correct?
How can I speed up migrating a node's data to another machine?
Any suggestions would be very helpful. Thanks a lot!
I am not sure I fully understand the sequence of events you are describing. I am assuming you are running your indices with replicas configured? If so, once you deleted the node, the replicas should have been promoted to primaries, which is almost instant.
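You can verify that part quickly; a small sketch with curl, where ES_URL and ES_AUTH are placeholders for your cluster endpoint and credentials:

```bash
# After the node is gone, no primary should stay UNASSIGNED if replicas existed.
curl -u "$ES_AUTH" "$ES_URL/_cat/shards?v" | grep UNASSIGNED
```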
Elasticsearch will then start reallocating replicas for the shards that were lost with the removed node. This can indeed take a long time depending on the size of the indices, since the new replicas have to be recovered from the primaries over the network.
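If the slow part is the replica recovery itself, you can usually raise the recovery throttles temporarily. A minimal sketch with curl, assuming ES_URL and ES_AUTH hold your endpoint and credentials; the values are examples to tune for your hardware and network:

```bash
# Temporarily raise recovery throttles, then reset them (set to null) once done.
curl -u "$ES_AUTH" -X PUT "$ES_URL/_cluster/settings" \
  -H 'Content-Type: application/json' -d '
{
  "persistent": {
    "indices.recovery.max_bytes_per_sec": "200mb",
    "cluster.routing.allocation.node_concurrent_recoveries": 4
  }
}'
```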
But unless you have a way of copying the local volume to another node, I don't see another option than forcibly removing the node as you did. This is of course not ideal: if the operation coincides with an unplanned failure of another node that holds the replicas (which are just about to become primaries), you are set up for data loss.
You may be able to speed this up by adding the new node in advance, so it takes over some shards as usual.
Then check that the cluster health (_cluster/health) is green, so every index has its primaries and replicas allocated.
Then delete the old node (or first change the shard allocation awareness with a temporary zone so the shards relocate off the old node before you delete it and the cluster never goes yellow; in that case, wait for the relocation to finish).
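A related alternative to the temporary awareness zone is cluster-level allocation filtering, which drains the shards off the old node before you remove it. A sketch, again with placeholder endpoint, credentials, and node name:

```bash
# 1. Confirm the cluster is green before touching anything.
curl -u "$ES_AUTH" "$ES_URL/_cluster/health?pretty"

# 2. Exclude the old node so its shards relocate to the other nodes first
#    ("old-node-name" is a placeholder for the Elasticsearch node name).
curl -u "$ES_AUTH" -X PUT "$ES_URL/_cluster/settings" \
  -H 'Content-Type: application/json' -d '
{
  "persistent": {
    "cluster.routing.allocation.exclude._name": "old-node-name"
  }
}'

# 3. Watch relocation finish, then delete the old node and reset the exclude
#    setting to null.
curl -u "$ES_AUTH" "$ES_URL/_cat/shards?v" | grep RELOCATING
```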