Node reduction - Should I use "primaries" or "none"

Hi Everyone,

I am reducing an Elasticsearch cluster from 6 nodes to 3.

I have already started the process and can see shards draining from the nodes.

Nodes were taken out of the pool with the correct command so that part is all good:

"cluster.routing.allocation.exclude._name": "<redacted>,<redacted>,<redacted>"

While the initial drain seemed to happen quite quickly, it appears to have slowed to a crawl now, and I am wondering if I used the right setting by only allowing allocation of primaries. I have seen posts that recommend "none" for this setting:

"cluster.routing.allocation.enable": "primaries"

When I set this to none, the cluster turns RED and I get a lot of unassigned shards (I guess this is expected because they can't go anywhere with 'none') - but then I have read posts saying that RED is not a bad thing during this process...

But... will the cluster eventually sort itself out? Or am I being too impatient, and I just need to wait for the shards to drain naturally with the "primaries" setting while Elasticsearch works out (and then drops) the shards it no longer needs on the nodes being drained?

Thanks in advance!

Which posts? If these are old posts they may be referring to old versions.

Normally you would set allocation to none during a restart to avoid shards moving when a node is known to be back online shortly, but I don't think that has been the recommendation in recent years, since you can delay allocation instead.
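
As a rough sketch, delayed allocation is an index setting along these lines (the _all target and the 5m value are just example placeholders):

PUT _all/_settings
{
  "settings": {
    // example timeout only; tune to how long your restarts actually take
    "index.unassigned.node_left.delayed_timeout": "5m"
  }
}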

If you set it to none, no shards will be allowed to be allocated anywhere in the cluster, including the shards you need to move from one node to another, so you should not set it to none.

Which posts? They may also be referring to old versions; can you share some examples? A RED cluster is not a desired state.

Which version are you running? You didn't say.

When you exclude a node from allocation, Elasticsearch will start moving shards off that node. Depending on the number of shards and their size this can take a long time, so you probably just need to wait.
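
While you wait, you can watch the relocations with the cat APIs; something like the following should show what is currently moving (just a suggestion, there are other ways to check):

GET _cat/recovery?v&active_only=true

GET _cat/shards?v

The second one lists every shard with its state, so you can filter for RELOCATING.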

How much data do you have on those nodes? Do the 3 remaining nodes have enough space to receive all the data?

Thanks - I am running 6.8.23. We are looking to reduce the cluster and eventually move to the cloud, but that is some way off.

Thank you for confirming how none behaves. I suspect I just need to wait.

I have doubled the disk capacity of the receiving nodes, so space is not an issue, and I'm not hitting any watermark issues, etc.

The shards are moving very slowly, so as you point out, I likely just need to be patient with them.

The guide I followed was this one:

I know the ES documentation is available, but it lacks a lot of examples and seems to assume prior knowledge, so this guide was a good one for me (apart from the "none" recommendation).

Current progress - so 'stuff' is happening:

shards  used   free   size   nodeip     node
392     1.8tb  4.7tb  6.8tb  10.x.x.12  <redacted>-01
394     1.8tb  4.7tb  6.8tb  10.x.x.14  <redacted>-02
394     1.7tb  4.7tb  6.8tb  10.x.x.16  <redacted>-03
199     1.1tb  2.5tb  3.8tb  10.x.x.13  <redacted>-04  <-- being removed
218     1.1tb  2.4tb  3.8tb  10.x.x.15  <redacted>-05  <-- being removed
263     1.4tb  2.2tb  3.8tb  10.x.x.17  <redacted>-06  <-- being removed
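
(For reference, I pulled that view with something close to the following; the exact column list is from memory, so treat it as approximate:)

GET _cat/allocation?v&h=shards,disk.used,disk.avail,disk.total,ip,node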

Having never done this before, I hope that the cluster will sort itself out and that eventually 04, 05, and 06 will get down to zero?

The cluster is green and happy at the moment. No issues.

Article about RED not being all bad :slight_smile:

You want cluster.routing.allocation.enable: all (the default). Setting this to primaries will prevent allocation of replicas, but you want to move all shards (both primaries and replicas). Setting it to none makes even less sense since this blocks all allocation.
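
Something along these lines should do it (alternatively, setting the value to null removes the override and falls back to the default):

PUT _cluster/settings
{
  // assumes the setting was applied as a transient setting in the first place
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}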

RED is always a bad thing. The article you linked is about how to fix it ASAP.

Weird, no idea why this is saying to use none. You want all throughout.

Ah ok... so now I'm even more confused... :laughing:

My understanding (likely very wrong now) was that having this set to "all" would still try to place shards onto the removed nodes.

I suspect I'm reading old documents and now ES is smart enough to know that the hosts have been excluded.

I will change it to all. That might be the missing part of the puzzle.

YES! That was the issue...

Now it's powering through the shards...


shards  used   free   size   nodeip     node
395     1.8tb  4.7tb  6.8tb  10.x.x.12  <redacted>-01
400     1.8tb  4.7tb  6.8tb  10.x.x.14  <redacted>-02
395     1.7tb  4.7tb  6.8tb  10.x.x.16  <redacted>-03
193     1.1tb  2.5tb  3.8tb  10.x.x.13  <redacted>-04  <-- being removed
216     1.1tb  2.4tb  3.8tb  10.x.x.15  <redacted>-05  <-- being removed
261     1.4tb  2.2tb  3.8tb  10.x.x.17  <redacted>-06  <-- being removed

THANK YOU!!! :folded_hands:

You are using a pretty ancient version of ES, but I don't think this advice has changed going back to the dawn of time. In contrast, the linked docs from Opster are dated 2024-12-16, which is pretty recent. And Elastic bought Opster quite some time ago. These docs are just plain wrong unfortunately; I'm trying to work out how to fix them.

Apologies for this mistake, we'll fix it asap!

The document now shows "Updated: Mar 17, 2025 | 2 min read", so that was a quick fix. Well done.

Great outcome. Thank you everyone!