Shrinking a large cluster

dolsson · August 28, 2019, 9:37am

Hi,

We have a scenario where we need to shrink a cluster of 60 data nodes (+ 1 master) down to just 30 data nodes. The cluster currently holds about 1.8PB of data, and each data node is a powerful bare-metal server connected via a 10GbE switch.

Normally, when decommissioning a node, we just use cluster-level shard allocation filtering ("cluster.routing.allocation.exclude._name": "node name"), but we've never done that on this scale before, so we're wondering what the best approach is.
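For reference, here is a minimal sketch of the request body we would send to `PUT _cluster/settings` to exclude several nodes at once. The setting accepts a comma-separated list of names. The node names (`data-node-31` to `data-node-60`) and the helper `exclude_nodes_body` are made up for illustration, and the choice of `transient` over `persistent` is just an assumption.

```python
import json


def exclude_nodes_body(node_names, persistent=False):
    """Build a cluster-settings body that excludes the given nodes by name.

    The exclude._name setting takes a comma-separated list of node names.
    """
    scope = "persistent" if persistent else "transient"
    return {
        scope: {
            "cluster.routing.allocation.exclude._name": ",".join(node_names)
        }
    }


# Hypothetical node names; substitute the real names of the nodes to drain.
names = ["data-node-{}".format(i) for i in range(31, 61)]
body = exclude_nodes_body(names)

# This JSON is what would go in the body of PUT _cluster/settings.
print(json.dumps(body, indent=2))
```

Setting the same key to `null` would clear the exclusion again.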

What could happen if all 30 nodes are excluded at once? Could the 10GbE switch be overwhelmed by the migrating data and make the cluster unstable? If so, is there a dynamic mechanism to throttle the shard relocation?
If excluding all 30 nodes in one go is inadvisable and we have to decommission the nodes in stages, say 10 at a time, is there a way to make the data migrate to just the 30 nodes that are to remain operational? For example, if we decommission nodes 51-60, we don't want the data to migrate to nodes 31-50, because it would then be migrated again when we decommission nodes 41-50 and finally nodes 31-40.
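To make the throttling question concrete, here is a rough sketch built on two dynamic settings that, as far as we know, limit recovery traffic per node: `indices.recovery.max_bytes_per_sec` (default `40mb`) and `cluster.routing.allocation.node_concurrent_recoveries` (default 2). The helper names and the back-of-the-envelope estimate are our own. The estimate assumes the 1.8PB is spread evenly over the 60 nodes, that it includes replicas, and that the recovery throttle is the only bottleneck, none of which is verified.

```python
def throttle_body(max_bytes_per_sec="40mb", node_concurrent_recoveries=2):
    """Build a cluster-settings body that limits recovery traffic per node."""
    return {
        "transient": {
            "indices.recovery.max_bytes_per_sec": max_bytes_per_sec,
            "cluster.routing.allocation.node_concurrent_recoveries":
                node_concurrent_recoveries,
        }
    }


def days_to_drain(total_bytes, total_nodes, nodes_removed, per_node_bytes_per_sec):
    """Rough estimate of drain time, assuming evenly spread data and that
    each remaining node receives at the throttled rate the whole time."""
    bytes_to_move = total_bytes * nodes_removed / total_nodes
    receivers = total_nodes - nodes_removed
    seconds = bytes_to_move / (receivers * per_node_bytes_per_sec)
    return seconds / 86400


# Assumed figures: 1.8PB over 60 nodes, 30 removed, default 40mb/s throttle.
estimate = days_to_drain(1.8e15, 60, 30, 40 * 1024 ** 2)
print("Estimated days at 40mb/s per node: {:.1f}".format(estimate))

# Raising the throttle shortens the drain but uses more of each 10GbE link.
print(throttle_body(max_bytes_per_sec="200mb"))
```

Under these assumptions, excluding all 30 nodes at once would leave only nodes 1-30 as eligible targets, and the per-node throttle, not the number of excluded nodes, would set the pace of the migration.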

We use Elasticsearch version 6.7.1 btw.

system · September 25, 2019, 9:37am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
