Node Starts Without Its Shards

I am currently managing a 12-node ES cluster (version 2.1.1). Yesterday, after a restart, one node came back up without any shards. That is, the Marvel plugin lists 0 shards for that node on the “Nodes” page, and _cat/shards | grep matches zero lines.
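(For reference, this is roughly how I check per-node shard counts; the host and node name below are placeholders for your own setup:)

  # Count shards currently assigned to a given node (node name is a placeholder)
  curl -s 'localhost:9200/_cat/shards?v' | grep 'node-name' | wc -l

  # Per-node shard counts and disk usage at a glance
  curl -s 'localhost:9200/_cat/allocation?v'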

I turned logging up to DEBUG but did not find anything related to the physical data storage that helped fix this problem, or at least diagnose what the problem actually is.
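(In case it helps: logger levels can be raised at runtime through the cluster settings API. This is only a sketch; the specific logger names below (gateway, indices.recovery) are my assumption about which loggers cover local shard data and recovery, so check them against the 2.x docs:)

  # Sketch: raise log verbosity at runtime via the cluster settings API.
  # The logger names are assumptions; verify them against the 2.x documentation.
  curl -XPUT 'localhost:9200/_cluster/settings' -d '{
    "transient": {
      "logger.gateway": "DEBUG",
      "logger.indices.recovery": "DEBUG"
    }
  }'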

The data on disk looks okay: the directory structure matches what’s on other nodes, file sizes seem to be in the right range (no empty or overfull directories), and permissions are the same everywhere.

The “solution” for this node was simply to remove the data directory and let ES rebalance itself. (Luckily, I had finally gotten around to setting every index to at least one replica a few hours earlier.)
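(That replica setting was applied with something like the following; using _all to hit every existing index is my shorthand, not necessarily what you want in every cluster:)

  # Make sure every existing index has at least one replica
  curl -XPUT 'localhost:9200/_all/_settings' -d '{
    "index": { "number_of_replicas": 1 }
  }'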

However, this morning a second node did the exact same thing: after a restart it had no shards. This time somebody else in the company started manually assigning the unassigned shards to that node, so I can’t diagnose this any further now. Then again, there are still some nodes left that I need to restart to update some configuration parameters… and I have kind of a bad feeling here. :slight_smile:

Now, what exactly is going on with a node that suddenly forgets its shards? How can I make ES use the data that exists on disk and seems to be in pristine condition? How can I make ES log what it is doing with the storage when it loads shard metadata, and where (and hopefully why) that fails?

Hi! Was your node gone for an interval of one minute or more? Have a look at the manual section about delaying allocation and increase the setting to a reasonable period (2 minutes or more).
If you are restarting your nodes manually, use the steps described in the rolling upgrade guide.
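(For example, the delay can be raised to the 2 minutes suggested above with something like this; _all applies it to every existing index and the host is a placeholder:)

  # Delay allocation of shards from a departed node for 2 minutes
  curl -XPUT 'localhost:9200/_all/_settings' -d '{
    "settings": {
      "index.unassigned.node_left.delayed_timeout": "2m"
    }
  }'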

Yes, I did disable routing before restarting a node. The problem here is not how to rebalance but why the node “loses” its shards.
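(Concretely, what I run before taking a node down is essentially the transient setting from the rolling upgrade guide; this is a sketch with a placeholder host:)

  # Disable shard allocation before taking a node down for a restart
  curl -XPUT 'localhost:9200/_cluster/settings' -d '{
    "transient": { "cluster.routing.allocation.enable": "none" }
  }'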

I think the root cause is the algorithm described in delaying allocation.

I read that page again (just in case) and respectfully, no, it isn’t. Rebalancing or what the cluster does when a node leaves and reappears was never a part of the question! The problem is: a node with ~550 shards on it restarts and afterwards has 0 shards even though the data is still on disk. The problem is not: how do I get the cluster to distribute shards to the empty node? (That is not a problem at all, as was also mentioned in the question. Twice.) The problem is: why is the node empty after a restart?

I mean this part:

Imagine this scenario:

  • Node 5 loses network connectivity.
  • The master promotes a replica shard to primary for each primary that was on Node 5.
  • The master allocates new replicas to other nodes in the cluster.
  • Each new replica makes an entire copy of the primary shard across the network.
  • More shards are moved to different nodes to rebalance the cluster.
  • Node 5 returns after a few minutes.
  • The master rebalances the cluster by allocating shards to Node 5.

So the node that left starts out empty because its shards are being allocated on other nodes.

With delayed allocation enabled, the above scenario changes to look like this:

  • Node 5 loses network connectivity.
  • The master promotes a replica shard to primary for each primary that was on Node 5.
  • The master logs a message that allocation of unassigned shards has been delayed, and for how long.
  • The cluster remains yellow because there are unassigned replica shards.
  • Node 5 returns after a few minutes, before the timeout expires.
  • The missing replicas are re-allocated to Node 5 (and sync-flushed shards recover almost immediately).

In this case, shards are recovered from local files and everything works as expected.
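(As a side note, the “sync-flushed shards recover almost immediately” part relies on a synced flush having been issued before the node went down; a minimal sketch, assuming you flush the whole cluster:)

  # Synced flush: lets replicas recover from local files almost instantly after a restart
  curl -XPOST 'localhost:9200/_flush/synced'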

PS: I could be wrong and the reason may be different, but I have seen this behaviour in our test and prod clusters.
Another point: look at the active master’s logs; maybe something about the node leaving and shard allocation was logged there.

As mentioned before, I disabled rerouting during a restart. However, even if that were the case, the reappearing node would still have all its shards, wouldn’t it? (As this problem only appeared very recently and I have performed the occasional node restart before that, too, I am actually very sure that a node still has all its shards in that case.) It would just have all its primary shards redeclared as replicas. However, none of that happens as the cluster does not move or reroute anything while the node restarts.

Did you disable rerouting or shard allocation (as in the rolling upgrade guide, cluster.routing.allocation.enable)? If the latter, this can prevent shards from being allocated on the restarted node. The shards are certainly still in place on disk, but disabled allocation can keep them from appearing (“The missing replicas are re-allocated to Node 5”). What is the cluster behaviour after re-enabling allocation (does the node stay empty for a long time)?
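(To spell out the second half of the rolling upgrade procedure: once the restarted node has rejoined the cluster, allocation is switched back on. A sketch, with a placeholder host:)

  # Re-enable shard allocation after the restarted node has rejoined the cluster
  curl -XPUT 'localhost:9200/_cluster/settings' -d '{
    "transient": { "cluster.routing.allocation.enable": "all" }
  }'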


Okay… so what you’re saying is that a node only appears to successfully start up (i.e. loads all its shards) as a reaction to cluster-wide reallocation (which I disabled prior to restarting the node), and that the node stays empty until allocation is re-enabled? You might be on to something… I remember having turned allocation back on directly after restarting nodes before these two, but I never realized that there was a direct connection between the two things. I have to restart more nodes soon, so I’ll definitely get back to you on that. :slight_smile:

Okay, I have to apologize, you were right all along, @rusty. I just restarted a node, re-enabled routing, and everything is quickly back to normal.
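(For anyone who finds this thread later, this is roughly how I watched the node come back; the host is a placeholder:)

  # Watch ongoing recoveries and overall cluster health after re-enabling allocation
  curl -s 'localhost:9200/_cat/recovery?v'
  curl -s 'localhost:9200/_cat/health?v'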

Thank you!

Happy to help, it's great that you've solved the problem!