While the initial drain seemed to happen quite quickly, it appears to have slowed to a crawl now, and I am wondering whether I have used the right setting by allowing allocation only for primaries. I have seen posts that recommend "none" instead of:
"cluster.routing.allocation.enable": "primaries"
When I set this to none, the cluster turns RED and I get a lot of unassigned shards (I guess this is expected because they can't go anywhere with "none"). But then I read posts saying that RED is not a bad thing during this process.
But will the cluster eventually sort itself out? Or am I being too impatient, and I need to wait for the shards to drain naturally with the "primaries" setting, so that Elasticsearch eventually works out (and then drops) the shards it does not need from the nodes being drained?
Which posts? If these are old posts they may be referring to old versions.
Normally you would set allocation to none during a restart, to avoid shards moving when a node is known to be back online shortly, but I don't think this has been the recommendation for some years now, since you can delay the allocation instead.
If you set it to none, no shards will be allowed to be allocated in the cluster. This includes the shards that you need to move from one node to another, so you cannot set it to none.
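For reference, the "delay the allocation" option mentioned above is the index setting `index.unassigned.node_left.delayed_timeout`. A minimal sketch of the request body, assuming you would send it with `PUT _all/_settings` before a planned restart; the helper name and the `5m` value are only illustrative:

```python
import json


def delayed_allocation_body(timeout="5m"):
    """Build a body for PUT _all/_settings that delays reallocation
    of shards from a node that has left the cluster.

    The timeout value is an assumption; pick one longer than the
    expected restart time.
    """
    return {
        "settings": {
            "index.unassigned.node_left.delayed_timeout": timeout,
        }
    }


if __name__ == "__main__":
    print(json.dumps(delayed_allocation_body(), indent=2))
```

This only applies to restarts. It does not help when draining nodes, where shards have to move.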
Which posts? They may also be referring to old versions. Can you share some examples? A RED cluster is not a desired state.
Which version are you running? You didn't say.
When you exclude a node from allocation, Elasticsearch will start moving shards off that node. Depending on the number of shards and their size, this can take a lot of time, so you probably just need to wait.
How much data do you have on those nodes? Do the 3 remaining nodes have enough space to receive all the data?
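One way to see whether the drain is still progressing is to count how many shards are left on the excluded nodes. A rough sketch, assuming `rows` is the parsed output of `GET _cat/shards?format=json` and that a relocating shard shows its source node first in the `node` column (followed by `->` and the target); the function and node names are made up for the example:

```python
from collections import Counter


def shards_remaining(rows, draining):
    """Count shards still located on the nodes being drained.

    rows     -- list of dicts from GET _cat/shards?format=json
    draining -- set of node names excluded from allocation
    """
    counts = Counter()
    for row in rows:
        # Unassigned shards have no node; treat that as an empty string.
        node = row.get("node") or ""
        # For relocating shards, keep only the source node name.
        source = node.split(" -> ")[0].strip()
        if source in draining:
            counts[source] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [
        {"index": "logs-1", "shard": "0", "prirep": "p", "state": "STARTED", "node": "node-4"},
        {"index": "logs-1", "shard": "0", "prirep": "r", "state": "STARTED", "node": "node-1"},
        {"index": "logs-2", "shard": "1", "prirep": "p", "state": "RELOCATING",
         "node": "node-4 -> 10.0.0.2 abc123 node-2"},
        {"index": "logs-3", "shard": "0", "prirep": "r", "state": "UNASSIGNED", "node": None},
    ]
    print(shards_remaining(sample, {"node-4", "node-5"}))
```

If the counts keep going down between runs, the drain is working and it is just a matter of waiting.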
Thanks - I am running 6.8.23 - Looking to reduce the cluster to eventually go cloud, but that is some way off.
Thank you for confirming the none action. I suspect I just need to wait.
I have doubled the disk capacity of the receiving nodes, so space is not an issue, and I'm not hitting any watermark issues, etc.
The shards are moving very slowly, so as you point out, I likely just need to be patient with them.
The guide I followed was this one:
I know the ES documentation is available but it does lack a lot of examples and seems to assume prior knowledge, so this guide was a good one for me (apart from the "none" recommendation)
You want cluster.routing.allocation.enable: all (the default). Setting this to primaries prevents allocation of replicas, but you want to move all shards off the drained nodes (both primaries and replicas). Setting it to none makes even less sense, since it blocks all allocation.
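To make that concrete, here is a minimal sketch of the body you could send with `PUT _cluster/settings` to drain nodes by name while leaving allocation fully enabled. The helper name and the node names are hypothetical, and filtering on `_name` is an assumption (`_ip` or `_host` work the same way if that is what you used):

```python
import json


def drain_settings(node_names, persistent=False):
    """Build a body for PUT _cluster/settings that drains the named nodes.

    Allocation stays enabled ("all") so that both primaries and
    replicas can relocate away from the excluded nodes.
    """
    if not node_names:
        raise ValueError("at least one node name is required")
    scope = "persistent" if persistent else "transient"
    return {
        scope: {
            "cluster.routing.allocation.enable": "all",
            # Comma-separated list of node names to move shards away from.
            "cluster.routing.allocation.exclude._name": ",".join(node_names),
        }
    }


if __name__ == "__main__":
    # "node-4" and "node-5" are placeholder names for the nodes being removed.
    print(json.dumps(drain_settings(["node-4", "node-5"]), indent=2))
```

Transient settings are lost on a full cluster restart, so pass `persistent=True` if the drain may outlive one.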
red is always a bad thing. The article you linked is about how to fix it ASAP.
You are using a pretty ancient version of ES, but I don't think this advice has changed going back to the dawn of time. In contrast, the linked docs from Opster are dated 2024-12-16, which is pretty new, and Elastic bought Opster quite some time ago. These docs are just plain wrong, unfortunately. I'm trying to work out how to fix them.