Shard allocation is not happening as expected after adding two more nodes to our ES cluster


(Arunlal A) #1

Hi,

We are facing issues with shard allocation (both primary and replica shards) after expanding our ES cluster. We had a three-node Elasticsearch cluster and recently added two more nodes to it.

The two new nodes are only data nodes (node.master: false, node.data: true).

The problem we're facing now is that newly created indices are allocated only to the two new nodes (ES4 & ES5, in our case), and neither primary nor replica shards are being allocated to the old nodes (ES1, ES2, and ES3).

  1. Is this expected behaviour? If so, how can we spread the primary and replica shards across all nodes?
  2. We are planning to grow the cluster to 10 or more nodes. How can we distribute shard allocation across all nodes?

Please share any relevant documentation.

This is an example of shards allocated after adding two nodes to our ES cluster.
[Settings: 5 shards, 1 replica]
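For reference, an index with those settings could be created like this (a minimal sketch; the index name "test-index" and the localhost endpoint are illustrative, not from the original post):

```shell
# Hypothetical example: create an index with 5 primary shards and 1 replica,
# matching the settings mentioned above.
curl -X PUT "localhost:9200/test-index" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index.number_of_shards": 5,
    "index.number_of_replicas": 1
  }
}'
```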


(Juergen Stuermer) #2

Shards created by a higher version cannot be allocated to lower-version nodes (in some cases). We had the same issue with shards from version 6, which did not get allocated to ES 5 nodes :slight_smile:


(Arunlal A) #3

Thanks @kley

We are using ES 6.7.1 on all nodes.


(Arunlal A) #4

I added a new test index (5 shards, 1 replica).
Attaching a screenshot from the head plugin:

I can see this error from _cluster/allocation/explain

  "decider" : "throttling",
  "decision" : "THROTTLE",
  "explanation" : "reached the limit of incoming shard recoveries [6], cluster setting [cluster.routing.allocation.node_concurrent_incoming_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
}

It's not trying to allocate on the old nodes. I tried increasing the above value; after that, the replica shards were also allocated only to the new nodes, and the cluster status stayed "yellow."
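The throttling limit from the explain output can be raised via the cluster settings API. A sketch, assuming a transient change is acceptable; the values shown are illustrative, not recommendations:

```shell
# Raise the concurrent-recovery limits named in the explain output above.
# The default for both settings is 2.
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.node_concurrent_incoming_recoveries": 4,
    "cluster.routing.allocation.node_concurrent_recoveries": 4
  }
}'
```

Note that this only relieves the throttle; it does not by itself change where the allocator prefers to place shards.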


(Arunlal A) #5

Today, shards were allocated to only one node, and the replica shards were first marked as unassigned and then allocated to an old node. Is this the expected behaviour?

Our expectation is that newly created primary shards are allocated across all 5 nodes (the index setting is 5 shards, 1 replica), and the same for the replica shards. However, that is not happening by default.

Now we are looking into allocating the shards across all nodes by setting allocation routing in the index template.

I am curious to understand the shard allocation behaviour we are seeing. Is this the correct approach? Can anyone help with this, please?


(Christian Dahlqvist) #7

Do all the hosts have the same hardware specification and amount of storage available? Are you using node attributes to control shard placement through shard allocation filtering and/or awareness (see the cat nodeattrs API)? What does the cat nodes API show?
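For anyone following along, the two cat APIs mentioned can be queried like this (endpoint is illustrative):

```shell
# Show custom node attributes (empty output means no allocation
# filtering/awareness attributes are set).
curl "localhost:9200/_cat/nodeattrs?v"

# Show per-node roles, disk, and heap usage.
curl "localhost:9200/_cat/nodes?v"
```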


(David Turner) #8

I think this is the situation described in this longstanding issue:

Elasticsearch generally prefers to allocate shards to empty nodes, so if you add a node and then create some indices before the cluster has finished rebalancing then it will allocate more shards to the new nodes than you might expect.

For now you can try and use the index-level setting index.routing.allocation.total_shards_per_node to ensure that no node receives more than its fair share of shards of a new index.
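A sketch of applying that setting, assuming 10 total shard copies (5 primaries + 5 replicas) spread over 5 nodes, i.e. at most 2 per node; the index name is illustrative:

```shell
# Cap how many shards of this index any single node may hold.
# 10 shard copies / 5 nodes = 2 per node.
curl -X PUT "localhost:9200/test-index/_settings" -H 'Content-Type: application/json' -d'
{
  "index.routing.allocation.total_shards_per_node": 2
}'
```

Be aware this is a hard limit: if a node is unavailable, shards that would exceed the cap elsewhere can remain unassigned.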


(Arunlal A) #9

Thanks @Christian_Dahlqvist
All five nodes have the same specifications.

Thanks @DavidTurner for the explanation.
I tried different combinations of the shard balancing heuristic values, and the shards were still created only on the new nodes. I will try the setting you suggested and test again.

So eventually the shards will be allocated to all nodes, correct?


(Christian Dahlqvist) #10

You might be able to use the total_shards_per_node index setting to force existing indices to be rebalanced and thereby spread the load better.


(Arunlal A) #11

Thanks @Christian_Dahlqvist & @DavidTurner
I have added "total_shards_per_node" to the index template for now.
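For completeness, a sketch of a 6.x index template carrying that setting; the template name "shard-spread", the index pattern, and the value 2 are illustrative, not from the original post:

```shell
# Apply the per-node shard cap to future indices via a template
# (6.x _template API; templates only affect newly created indices).
curl -X PUT "localhost:9200/_template/shard-spread" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["*"],
  "settings": {
    "index.routing.allocation.total_shards_per_node": 2
  }
}'
```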

One more question: is there any option to increase the rebalancing speed of the cluster (for old indices)?

I have increased the "cluster_concurrent_rebalance" value (current value: 5). Are there any other options?
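A sketch of settings that can influence rebalancing and recovery throughput, assuming transient changes are acceptable; the values are illustrative, and raising them increases load on the cluster:

```shell
# cluster_concurrent_rebalance: how many shard relocations may run at once.
# indices.recovery.max_bytes_per_sec: per-node recovery bandwidth cap
# (default 40mb).
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.cluster_concurrent_rebalance": 5,
    "indices.recovery.max_bytes_per_sec": "100mb"
  }
}'
```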