All replica shards reside on two nodes, 9 nodes total

I have an Elasticsearch 6.8.23 cluster (we are upgrading to 7.x soon) with 9 data nodes and 3,000 shards, where all of the replica shards have been allocated to two of the nodes. I have removed all cluster.routing.allocation settings and the shards still stay on these two nodes. I have also tried moving shards manually; all of the decisions return YES except awareness, which returns NO with the reason:

there are too many copies of the shard allocated to nodes with attribute [fault_domain], there are [2] total configured shard copies for this shard id and [3] total attribute values, expected the allocated shard count per attribute [2] to be less than or equal to the upper bound of the required number of shards per attribute [1]
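
To make the numbers in that message easier to follow, here is a minimal sketch of the arithmetic it describes. It assumes the upper bound is simply the ceiling of (total shard copies / number of attribute values), which reproduces the [1] in the message above; it is not the actual decider code, and the function names are made up for illustration.

```python
import math


def awareness_upper_bound(total_copies: int, attribute_values: int) -> int:
    # Assumption: bound = ceil(copies / distinct attribute values).
    # This matches the message above (2 copies, 3 values -> 1).
    return math.ceil(total_copies / attribute_values)


def awareness_allows(copies_on_value: int, total_copies: int, attribute_values: int) -> bool:
    # The move is rejected when one attribute value would hold more
    # copies than the bound allows.
    return copies_on_value <= awareness_upper_bound(total_copies, attribute_values)


# Numbers taken from the message: 2 copies, 3 attribute values,
# and 2 copies ending up under the same fault_domain value.
bound = awareness_upper_bound(2, 3)
allowed = awareness_allows(2, 2, 3)
print(bound, allowed)
```

Read this way, the NO means the primary and the replica would both land on nodes that share a fault_domain value, which only makes sense if several nodes report the same value.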

There are no awareness settings set in the cluster and we aren't starting the nodes with an attribute fault_domain.

I have also tried setting cluster.routing.allocation.exclude to exclude the two nodes, and the shards do not move to other nodes.

Is there something I'm missing? I'm concerned that all of the replicas are on these two servers and not spread across the other nodes.

Elasticsearch 6.8 is EOL and no longer supported. Please upgrade ASAP.


It turns out that elasticsearch.yml does in fact define a node attribute fault_domain, and 6 of the nodes were set to the same value. The plan is to set these to their correct values, after which we expect the shards to rebalance.
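
For anyone who hits the same thing, a quick way to spot this is to group the nodes by their attribute value. The sketch below assumes the rows come from `GET _cat/nodeattrs?format=json` (each row having `node`, `attr` and `value` fields); the base URL, the node names and the attribute values in the sample are hypothetical, not taken from our cluster.

```python
import json
import urllib.request
from collections import defaultdict


def fetch_nodeattrs(base_url: str) -> list:
    # base_url is an assumption, e.g. "http://localhost:9200".
    with urllib.request.urlopen(f"{base_url}/_cat/nodeattrs?format=json") as resp:
        return json.load(resp)


def group_nodes_by_attribute(rows: list, attr: str = "fault_domain") -> dict:
    # Map each attribute value to the sorted list of nodes reporting it.
    groups = defaultdict(list)
    for row in rows:
        if row.get("attr") == attr:
            groups[row["value"]].append(row["node"])
    return {value: sorted(nodes) for value, nodes in groups.items()}


# Hypothetical sample: 9 nodes, 6 of them sharing one value.
sample_rows = [
    {"node": f"node-{i}", "attr": "fault_domain", "value": "fd-1"}
    for i in range(1, 7)
] + [
    {"node": "node-7", "attr": "fault_domain", "value": "fd-2"},
    {"node": "node-8", "attr": "fault_domain", "value": "fd-3"},
    {"node": "node-9", "attr": "fault_domain", "value": "fd-3"},
]

groups = group_nodes_by_attribute(sample_rows)
for value, nodes in sorted(groups.items()):
    print(value, len(nodes), nodes)
```

Against a live cluster you would pass `fetch_nodeattrs(base_url)` instead of `sample_rows`. A value shared by far more nodes than the others is the thing to look for.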
