We plan to migrate our existing 54-node Elasticsearch cluster in AWS, which runs ES 6.6.2, to new AWS instance types for better performance, keeping the same ES version. As we see it, we have two options:
- Spin up a new cluster on the new instance types and restore the latest snapshot from the old cluster into it. For this we would have to bear the cost of two AWS clusters until the entire snapshot (200 TB) has been restored.
- Use the cluster reroute API. Add a batch of 10 nodes of the new instance type to the cluster, use the reroute API to move the data from 10 old nodes onto the 10 new nodes, and retire those 10 old nodes once the data movement is complete. Then repeat the same steps until the data from the entire old cluster has been moved. This way we would only need to bear the cost of 10 additional nodes.
Option 1 seems simple and straightforward, and its only disadvantage seems to be the cost. Option 2 seems more complex, and we would like to know whether it is a good option, since I know we would need to adjust cluster settings to switch off cluster rebalancing before rerouting, and may need to consider other things too.
Could you please explain which option is the right pick for moving 200 TB, and the reasons why? Could you also tell us whether there is a better solution for this use case?
Could someone please comment on the question above?
There are many "it depends" questions when working at this kind of scale, but both options seem basically sound. The main disadvantage of option 1 is that it involves restoring a snapshot, which means you will lose any changes made since that snapshot was taken, so you might need to switch to a read-only mode for the duration. Option 2 involves full availability throughout.
Why start up only 10 new nodes at once? The more new instances you start at once, the faster the shards should migrate, since limits are generally applied on a node-by-node basis, so I think it costs about the same either way.
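To make the "costs about the same" point concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the migration is limited purely by a fixed per-node transfer rate (for example the recovery throttle, `indices.recovery.max_bytes_per_sec`) and that throughput scales linearly with the number of new nodes receiving data. The 40 MB/s figure and the function name `migration_cost` are illustrative assumptions, not measurements from this cluster.

```python
def migration_cost(total_bytes, new_nodes_at_once, per_node_bytes_per_sec):
    """Return (wall_clock_hours, extra_node_hours) under a simplified model.

    Assumption: every new node receives data at the same fixed rate, so the
    aggregate rate grows linearly with the number of new nodes running.
    """
    aggregate_rate = new_nodes_at_once * per_node_bytes_per_sec
    wall_clock_hours = total_bytes / aggregate_rate / 3600
    # Extra cost is the number of additional nodes times how long they overlap
    # with the old ones.
    extra_node_hours = new_nodes_at_once * wall_clock_hours
    return wall_clock_hours, extra_node_hours


TOTAL_BYTES = 200 * 10**12   # 200 TB, from the question
RATE = 40 * 10**6            # assumed 40 MB/s per node, illustrative only

for n in (10, 54):
    hours, node_hours = migration_cost(TOTAL_BYTES, n, RATE)
    print(f"{n} new nodes at once: ~{hours:.0f} h wall clock, "
          f"~{node_hours:.0f} extra node-hours")
```

In this model the extra node-hours are identical for any batch size, because the batch size cancels out; only the wall-clock time changes. Real clusters will deviate from it (disk and network limits, shard sizes, indexing load), so treat it as a sanity check rather than a forecast.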
I don't think you need to use the cluster reroute API to move shards. You should just be able to add an allocation filter to exclude the old nodes and let Elasticsearch move them automatically.
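As a minimal sketch of what such a filter could look like, the Python below builds a request body for `PUT /_cluster/settings` using the `cluster.routing.allocation.exclude._name` setting. The node names and the helper name `exclude_nodes_settings` are hypothetical, and the choice of the `transient` scope is an assumption; `persistent` is the other available scope.

```python
import json


def exclude_nodes_settings(node_names):
    """Build a cluster-settings body that excludes the given nodes by name.

    The value is a comma-separated list of node names, which is the form the
    allocation filtering settings accept.
    """
    return {
        "transient": {
            "cluster.routing.allocation.exclude._name": ",".join(node_names)
        }
    }


old_batch = ["old-node-01", "old-node-02"]  # hypothetical node names
print(json.dumps(exclude_nodes_settings(old_batch), indent=2))
```

The printed JSON would be sent as the body of the settings request. Excluding by `_ip` or `_host` works the same way if node names are not convenient, and the setting should be cleared once the old nodes have been shut down.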
I don't think there's any need to switch cluster rebalancing off.
There's only so much advice we volunteers can offer for free here, and we can't look at the detailed configuration of your cluster to check it's going to work smoothly without falling into any traps. At this kind of scale it's almost certainly worth paying for some expertise to check things over in detail and to be on hand in case something goes wrong.
Thanks @DavidTurner. I went through the shard allocation filtering documentation to understand it better.
But I have the following doubt. Could you please help clarify it?
When we add nodes of the new instance type to the existing cluster, Elasticsearch will immediately rebalance, since cluster rebalancing is turned on. This means it will rebalance shards across all 64 nodes (the existing 54 plus the 10 newly added). Won't this defeat the purpose of moving the complete data from one set of nodes to the other?
It depends what you mean by "immediately". By default it'll start moving a couple of shards onto the new nodes (which is what you want anyway, right?) but that'll take some time to complete; meanwhile your allocation filters will trigger so much more shard movement that you can basically ignore rebalancing.
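Whichever mechanism ends up moving a given shard, what matters before retiring an old node is that nothing is left on it. A small sketch of that check in Python, assuming the rows come from `GET _cat/shards?format=json` and that a relocating shard reports its node as `source -> ip id target`; the function name `shards_remaining` and the sample rows are made up for illustration.

```python
def shards_remaining(cat_shards_rows, old_node_names):
    """Count shards still located on each old node.

    cat_shards_rows: list of dicts, one per shard copy, each with a "node" key.
    """
    counts = {name: 0 for name in old_node_names}
    for row in cat_shards_rows:
        node = row.get("node")
        if not node:
            continue  # unassigned shard copies have no node
        # A relocating shard still lives on its source node until it finishes.
        source = node.split(" -> ")[0]
        if source in counts:
            counts[source] += 1
    return counts


sample_rows = [  # made-up rows for illustration
    {"index": "logs-1", "shard": "0", "state": "STARTED", "node": "new-node-01"},
    {"index": "logs-1", "shard": "1", "state": "RELOCATING",
     "node": "old-node-01 -> 10.0.0.5 abc123 new-node-02"},
    {"index": "logs-2", "shard": "0", "state": "UNASSIGNED", "node": None},
]
print(shards_remaining(sample_rows, ["old-node-01", "old-node-02"]))
```

An old node is a candidate for shutdown once its count reaches zero and the cluster health is green.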