Yellow cluster state because of unassigned shards, can't relocate unassigned shard


(Colonelmo) #1

Hello.

I have an Elasticsearch 2.3.3 cluster and I mistakenly joined a 2.4.1 node to it.

The problem is that somehow I lost two nodes in the cluster and I ended up with a primary shard on the 2.4.1 node and two unassigned replicas of that same shard. (Total replica count per shard is 3)

The cluster state is therefore yellow.

I have tried relocating the primary shard to another node, but I get:

target node version [2.3.3] is older than source node version [2.4.1]

The same thing happens when I try to assign the two unassigned shards to 2.3.3 nodes.

I have also tried cancelling placement of the primary shard on the 2.4.1 machine, and it didn't work either.

Do you have any ideas how I can solve this? First thing that comes to my mind is upgrading the other nodes to 2.4.1 but let's say that is not an option.

I would really appreciate it if you took your time to answer this because I'm completely out of ideas.
Thank you for your time.


(Yannick Welsch) #2

Once a shard has made it to a node with a newer version, it cannot move back to an older one. The reason is that the node with the higher version might write data in a format that the older ones are unable to understand. Moving shards is thus only possible from older to newer nodes and not vice versa. Upgrading to 2.4.1 is the only option here.
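
The rule described above can be pictured with a small sketch. This is only an illustration of the version comparison, not Elasticsearch's actual allocation code; the function names `parse_version` and `can_move_shard` are made up for this example.

```python
def parse_version(version):
    """Turn a version string such as "2.4.1" into a comparable tuple (2, 4, 1)."""
    return tuple(int(part) for part in version.split("."))


def can_move_shard(source_version, target_version):
    """A shard may only move to a node whose version is the same or newer."""
    return parse_version(target_version) >= parse_version(source_version)


# Old node to new node is allowed, the reverse direction is not.
print(can_move_shard("2.3.3", "2.4.1"))  # True
print(can_move_shard("2.4.1", "2.3.3"))  # False
```

The second call corresponds to the error quoted in the first post: the target (2.3.3) is older than the source (2.4.1), so the move is refused.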


(Colonelmo) #3

Thanks.

And will rebuilding the index work?


(Yannick Welsch) #4

Yes, that will work. You just have to make sure, though, that the rebuilt indices do not end up on the new node. This can be done by excluding the node from allocating new shards, see here: https://www.elastic.co/guide/en/elasticsearch/reference/current/allocation-filtering.html
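
As a rough sketch of what such an exclusion could look like, the snippet below only builds the request (method, path, JSON body) for the cluster settings API and does not send anything. The node name `node-2-4-1` is a placeholder, and filtering by `_name` is just one option; the linked page also describes `_ip` and `_host`. The helper name `build_exclude_request` is made up for this example.

```python
import json


def build_exclude_request(node_name, persistent=False):
    """Build a cluster-settings update that excludes one node from shard allocation.

    node_name is a placeholder for the name of the 2.4.1 node.
    Transient settings are lost on a full cluster restart; persistent ones survive it.
    """
    scope = "persistent" if persistent else "transient"
    body = {scope: {"cluster.routing.allocation.exclude._name": node_name}}
    return "PUT", "/_cluster/settings", json.dumps(body)


method, path, body = build_exclude_request("node-2-4-1")
print(method, path)
print(body)
```

The resulting body can be sent with any HTTP client (curl, Sense, a Python client) to one of the cluster's nodes.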


(Colonelmo) #5

Hmmm, so the data format is backward compatible while the index format is not, right?
I have already set that, but I can't see it in GET /_cluster/settings. Does it mean that it is not yet set? My bad, it is set.

So if I rebuild the index, it will go fine and this node won't have any primary shards?

And will backup work correctly at this stage?


(Colonelmo) #6

I would actually prefer rebuilding the index, if it's safe to do so. Is there any reason why I shouldn't do that and should upgrade all of the nodes instead?


(Yannick Welsch) #7

Not sure what you mean here. The on-disk data format is not backward compatible, but the actual data should be. How do you plan on rebuilding the index? From an external data source?

If you correctly applied the shard allocation filtering mentioned above, shards of new indices won't be allocated to this node and existing shards won't be moved to this node. The shards that are currently on the node won't be moved to older nodes though.

It might be easier just to upgrade.
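
One way to check that the allocation filtering mentioned above was really applied is to look at the response of `GET /_cluster/settings`. The sketch below works on an already-fetched response and assumes the exclusion was set by node name (`_name`); the response shape in `sample` is an assumption (nested objects, with the flat dotted form also handled), and `node-2-4-1` is a placeholder.

```python
EXCLUDE_KEY = "cluster.routing.allocation.exclude._name"


def flatten(settings, prefix=""):
    """Flatten nested settings into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in settings.items():
        dotted = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, dotted + "."))
        else:
            flat[dotted] = value
    return flat


def excluded_node_names(cluster_settings):
    """Collect node names excluded by name in the persistent or transient settings."""
    names = set()
    for scope in ("persistent", "transient"):
        value = flatten(cluster_settings.get(scope, {})).get(EXCLUDE_KEY, "")
        # Several names may be given as one comma-separated string.
        names.update(name.strip() for name in value.split(",") if name.strip())
    return names


# Assumed shape of a GET /_cluster/settings response.
sample = {
    "persistent": {},
    "transient": {
        "cluster": {"routing": {"allocation": {"exclude": {"_name": "node-2-4-1"}}}}
    },
}
print(excluded_node_names(sample))  # {'node-2-4-1'}
```

If the 2.4.1 node's name is in the returned set, new shards should stay off that node.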


(Colonelmo) #8

Oh, there's the problem. So the primary shard, which only exists on the 2.4.1 node without any replicas, will stay there even after the index is rebuilt and allocation is excluded, and I'll again end up with two unassigned shards and a yellow state.

I'll just go with the upgrade then. Since I could accidentally join a newer node, I assume it's safe to upgrade the other nodes, right?

And since backing up the data the way the documentation recommends probably won't work for me because of the newer index on the 2.4.1 node, is there a bulk GET API just to get all the data out and be able to repopulate the database if anything goes wrong during the upgrade?


(Yannick Welsch) #9

As long as you don't explain what you mean by "rebuilt", I cannot give an answer to that question. If you want to stay on 2.3.3 you can create a new index (the shards of that new index will be allocated on old nodes due to excluding allocation) and reindex the data from the old index into the new one. After that is done, you can check if all the data has made it properly into the new index and you can delete the old one.
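
The reindex-then-verify steps above could be sketched roughly as follows. This assumes the `_reindex` endpoint is available on the cluster (as far as I know it was introduced in the 2.3 line; otherwise a scan/scroll plus bulk approach would be needed) and that the new index has already been created. The index names `my_index` and `my_index_v2` are placeholders, and the helper names are made up for this example. Nothing is sent over the network here.

```python
import json


def build_reindex_request(source_index, dest_index):
    """Build a request that copies all documents from source_index into dest_index."""
    body = {"source": {"index": source_index}, "dest": {"index": dest_index}}
    return "POST", "/_reindex", json.dumps(body)


def counts_match(source_count_response, dest_count_response):
    """Compare the "count" fields of two GET /<index>/_count responses."""
    return source_count_response["count"] == dest_count_response["count"]


method, path, body = build_reindex_request("my_index", "my_index_v2")
print(method, path)
print(body)

# Example with hand-written count responses (not real output).
print(counts_match({"count": 1200}, {"count": 1200}))  # True
```

A matching document count is only a basic sanity check; spot-checking a few documents before deleting the old index is probably a good idea as well.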

