Shard reallocation after rolling restart


(Wenceslas Des Deserts) #1

Hi there,

I just performed a rolling restart of my cluster (I needed to update a plugin), following the doc. I disabled shard allocation, restarted one node, waited until it rejoined the cluster, re-enabled allocation, waited for the cluster state to turn green, and repeated the process on each node.
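For reference, one iteration of the procedure described above might be sketched as follows. This is a minimal sketch, not the documented procedure verbatim: it assumes a node reachable at `http://localhost:9200`, uses the `cluster.routing.allocation.enable` cluster setting and the cluster health API, and takes a hypothetical `restart_node` callback standing in for however the node is actually restarted.

```python
import json
import urllib.request

# Assumption: a node reachable at this address; adjust for your cluster.
ES_URL = "http://localhost:9200"


def allocation_request(enable, base_url=ES_URL):
    """Build a PUT /_cluster/settings request that sets shard allocation.

    `enable` is "none" to disable allocation or "all" to re-enable it.
    """
    body = {"transient": {"cluster.routing.allocation.enable": enable}}
    return urllib.request.Request(
        url=f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def wait_for_green_request(timeout="60s", base_url=ES_URL):
    """Build a GET /_cluster/health request that waits for green status."""
    return urllib.request.Request(
        url=f"{base_url}/_cluster/health?wait_for_status=green&timeout={timeout}",
        method="GET",
    )


def rolling_restart_step(restart_node):
    """Run one iteration; `restart_node` is a hypothetical callback."""
    urllib.request.urlopen(allocation_request("none"))
    restart_node()  # restart the node and wait for it to rejoin the cluster
    urllib.request.urlopen(allocation_request("all"))
    # The health call can time out before the cluster is green, so a real
    # script should inspect the response rather than assume success.
    urllib.request.urlopen(wait_for_green_request())
```

The request builders are kept separate from the code that sends them so the payloads can be inspected without a running cluster.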

However, I was a bit surprised to see some shards being moved from one node to another after re-enabling allocation.

I thought shards were simply going to be reassigned on the node that restarted since data was already on the disk. But some of them were moved...

What is the reason for that? Can we do something to prevent it?

Thank you.


(Ali Beyad) #2

Hello,

Were you still indexing documents during the rolling restart? If so, and assuming you have replicas in addition to primary shards, stopping a node takes its replica copies offline, and for any primary shards it held, a replica elsewhere in the cluster is promoted to primary. If indexing continues while the node is down, Lucene may merge segments on the active copies, leaving their segment files completely different from the shard data on the node that left the cluster to be upgraded. When that node comes back, none of its files for the shard match (even though both copies contain many of the same documents), so Elasticsearch sees no reason to favor the rejoined node over any other node in the cluster when allocating that shard.
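To make the file comparison concrete, here is a toy model of the idea. It is only an illustration under stated assumptions, not Elasticsearch's actual recovery code: the file names, sizes, and checksums are made up, and `reusable_bytes` is a hypothetical helper.

```python
def reusable_bytes(local_files, primary_files):
    """Sum the sizes of files present on both copies with identical metadata.

    Each argument maps a segment file name to a (size_in_bytes, checksum) tuple.
    """
    return sum(
        size
        for name, (size, checksum) in local_files.items()
        if primary_files.get(name) == (size, checksum)
    )


# Copy on the restarted node: segments as they were when the node stopped.
stale_copy = {"_0.cfs": (500, "aaa"), "_1.cfs": (300, "bbb")}

# Copy on the new primary: indexing continued and a merge replaced _0 and _1.
merged_primary = {"_2.cfs": (820, "ccc")}

# Copy on a primary that received no writes while the node was down.
idle_primary = {"_0.cfs": (500, "aaa"), "_1.cfs": (300, "bbb")}

print(reusable_bytes(stale_copy, merged_primary))  # 0
print(reusable_bytes(stale_copy, idle_primary))    # 800
```

In the first case nothing on the restarted node can be reused, so that node has no advantage over any other node; in the second case the whole copy can be reused.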

Here are some guidelines for making the upgrade process as smooth as possible: https://www.elastic.co/guide/en/elasticsearch/reference/current/rolling-upgrades.html

Lastly, the problem I described above should go away once sequence numbers are introduced (expected in 6.0), which will allow recovery to be based on the index operations a copy is missing rather than on comparing files.
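As a rough illustration of operation-based recovery, the toy model below replays only the operations a stale copy has not yet seen. It is a sketch of the general idea under assumptions, not the actual Elasticsearch implementation: the operation records and the `operations_to_replay` helper are hypothetical.

```python
def operations_to_replay(primary_ops, replica_checkpoint):
    """Return the operations whose sequence number is above the checkpoint.

    `replica_checkpoint` is the highest sequence number the stale copy has
    already applied; everything after it is what the copy is missing.
    """
    return [op for op in primary_ops if op["seq_no"] > replica_checkpoint]


# Operations recorded on the primary, each tagged with a sequence number.
primary_ops = [
    {"seq_no": 0, "action": "index", "id": "a"},
    {"seq_no": 1, "action": "index", "id": "b"},
    {"seq_no": 2, "action": "delete", "id": "a"},
    {"seq_no": 3, "action": "index", "id": "c"},
]

# The node went down after applying operation 1, so it is missing 2 and 3.
missing = operations_to_replay(primary_ops, replica_checkpoint=1)
print([op["seq_no"] for op in missing])  # [2, 3]
```

With this approach the cost of bringing the copy up to date depends on how many operations it missed, not on whether its segment files still match those of the primary.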


(Wenceslas Des Deserts) #3

Hi,
I am not sure if the cluster was in use or not, so it's totally possible that what you describe happened. Makes sense.

Thank you for your nice answer. I'm looking forward to ES6, then!

