I've got a 3-node setup in production: 2 data nodes and 1 client node. I do rolling deployments using Octopus Deploy, but in production I often end up with a red cluster. The deployment pipeline takes one node at a time and does the following:
- Ensure green cluster status
- Take node out of NLB
- Check for pending reboots
- Prepare the node for shutdown (disable shard allocation and do a synced flush; see the sketch after this list)
- Shut down the Elasticsearch Windows service, install the new Elasticsearch version, and start the service again
- Wait until the local Elasticsearch node is alive
- Re-enable shard allocation
- Install Elasticsearch plugins
- Wait until the cluster is green
- Put node back into NLB
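For reference, the Elasticsearch calls behind the disable-allocation / synced-flush / re-enable / wait-for-green steps look roughly like this. This is only a minimal sketch in Python; the real pipeline runs as Octopus Deploy (PowerShell) steps, and `localhost:9200` is just an assumption:

```python
import time
import requests

ES = "http://localhost:9200"  # assumption: the node being upgraded is reachable locally


def disable_allocation():
    # Stop the cluster from moving shards around while the node is down
    r = requests.put(ES + "/_cluster/settings", json={
        "transient": {"cluster.routing.allocation.enable": "none"}
    })
    r.raise_for_status()


def synced_flush():
    # Synced flush so replicas can recover quickly after the restart
    # (a 409 here just means some shards could not be synced; not fatal)
    requests.post(ES + "/_flush/synced")


def enable_allocation():
    r = requests.put(ES + "/_cluster/settings", json={
        "transient": {"cluster.routing.allocation.enable": "all"}
    })
    r.raise_for_status()


def wait_for_green(timeout_minutes=30):
    deadline = time.time() + timeout_minutes * 60
    while time.time() < deadline:
        status = requests.get(ES + "/_cluster/health").json()["status"]
        if status == "green":
            return
        time.sleep(10)
    raise TimeoutError("cluster did not reach green in time")
```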
The result is (often) that I get a bunch of shards that are not allocated to any node, hence the red cluster, even though I wait for a green cluster before continuing to the next node. What am I missing here?
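To see exactly which shards are stuck and which indices are red, I check the cat APIs after each node, along these lines (same `localhost:9200` assumption):

```python
import requests

ES = "http://localhost:9200"  # assumption: any node in the cluster

# Per-index health: shows which indices are red or yellow
print(requests.get(ES + "/_cluster/health", params={"level": "indices"}).json())

# Every shard that is not assigned to a node, with its index and state
shards = requests.get(ES + "/_cat/shards",
                      params={"h": "index,shard,prirep,state,node"}).text
for line in shards.splitlines():
    if "UNASSIGNED" in line:
        print(line)
```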
UPDATE: It seems that it's old indices, from more than 6 months ago, that suddenly pop up again (we create one index per day). The cluster thinks they should still be present, but they were deleted by our retention script. How come they reappear after a node reboot?
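For completeness, this is roughly how I can list the indices that come back red and remove them again (a sketch only; the cluster address is an assumption, and I double-check the index names before deleting anything):

```python
import requests

ES = "http://localhost:9200"  # assumption: any node in the cluster

# List health + name for every index, then keep only the red ones
rows = requests.get(ES + "/_cat/indices", params={"h": "health,index"}).text
red = [line.split()[1] for line in rows.splitlines() if line.startswith("red")]

for index in red:
    print("would delete:", index)
    # Uncomment once these are confirmed to be the old, already-retired daily indices:
    # requests.delete(ES + "/" + index).raise_for_status()
```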
In this particular case I was upgrading from 2.3.2 to 2.4.1.