Shards unassigned after some nodes went down


We have an ES cluster consisting of 6 nodes, 3 in one data center and 3 in another. All of them are master-eligible, but only 4 are data nodes. We have 3 indices, each with 5 primary shards and 1 replica. During a disaster recovery scenario one data center went down, and afterwards the Elasticsearch cluster went to status RED with the reason:

"cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster"

I checked `GET _cat/shards?v=true&s=prirep`

And got:

index           shard prirep state      docs store ip          node
firstIndex  2     p      STARTED       0  283b xx.xx.xx.xx datanode2-datacenterB
firstIndex  1     p      STARTED       0  283b xx.xx.xx.xx datanode1-datacenterB
firstIndex  3     p      UNASSIGNED                        
firstIndex  4     p      UNASSIGNED                        
firstIndex  0     p      UNASSIGNED                        
secondIndex 2     p      STARTED    3375 2.8mb xx.xx.xx.xx datanode1-datacenterB
secondIndex 3     p      STARTED    3416 2.2mb xx.xx.xx.xx datanode1-datacenterB
secondIndex 1     p      STARTED    3411 3.2mb xx.xx.xx.xx datanode2-datacenterB
secondIndex 4     p      UNASSIGNED                        
secondIndex 0     p      STARTED    3512 2.9mb xx.xx.xx.xx datanode1-datacenterB
thirdIndex  2     p      STARTED    4688 1.3mb xx.xx.xx.xx datanode1-datacenterB
thirdIndex  1     p      STARTED    4745 1.4mb xx.xx.xx.xx datanode2-datacenterB
thirdIndex  4     p      UNASSIGNED                        
thirdIndex  3     p      UNASSIGNED                        
thirdIndex  0     p      STARTED    4845 1.4mb xx.xx.xx.xx datanode2-datacenterB
firstIndex  2     r      STARTED       0  283b xx.xx.xx.xx datanode1-datacenterB
firstIndex  1     r      STARTED       0  283b xx.xx.xx.xx datanode2-datacenterB
firstIndex  3     r      UNASSIGNED                        
firstIndex  4     r      UNASSIGNED                        
firstIndex  0     r      UNASSIGNED                        
secondIndex 2     r      STARTED    3375 2.8mb xx.xx.xx.xx datanode2-datacenterB
secondIndex 3     r      STARTED    3416 2.2mb xx.xx.xx.xx datanode2-datacenterB
secondIndex 1     r      STARTED    3411 3.2mb xx.xx.xx.xx datanode1-datacenterB
secondIndex 4     r      UNASSIGNED                        
secondIndex 0     r      STARTED    3512 2.9mb xx.xx.xx.xx datanode2-datacenterB
thirdIndex  2     r      STARTED    4688 1.3mb xx.xx.xx.xx datanode2-datacenterB
thirdIndex  1     r      STARTED    4745 1.4mb xx.xx.xx.xx datanode1-datacenterB
thirdIndex  4     r      UNASSIGNED                        
thirdIndex  3     r      UNASSIGNED                        
thirdIndex  0     r      STARTED    4845 1.4mb xx.xx.xx.xx  datanode1-datacenterB

Could anyone suggest what I can do in this situation? I'm not sure if I should add more replicas, or maybe make the number of primary shards equal to the total number of nodes in the cluster.
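For what it's worth, the cluster allocation explain API can tell you exactly why a given shard copy is unassigned. A sketch (the index and shard values here are taken from the `_cat/shards` output above):

```
GET _cluster/allocation/explain
{
  "index": "firstIndex",
  "shard": 3,
  "primary": true
}
```

The response includes a `can_allocate` decision and per-node explanations, which in a case like this should point at the in-sync copies that were lost with the downed data center.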


If you are not already, you should use shard allocation awareness to make sure each shard gets its primary allocated to one DC and its replica to the other. With this you can get by with a single replica.
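A minimal sketch of what that setup looks like, assuming you tag each node with a custom attribute (the attribute name `zone` and the values `datacenterA`/`datacenterB` are just illustrative):

```
# elasticsearch.yml on each node in data center A
node.attr.zone: datacenterA

# elasticsearch.yml on each node in data center B
node.attr.zone: datacenterB

# on all nodes
cluster.routing.allocation.awareness.attributes: zone
```

With this in place, the allocator avoids putting a shard's primary and replica in the same zone while the other zone has capacity.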

Also be aware that Elasticsearch cannot support symmetric high availability across only 2 zones. If your cluster continued to be operational when you lost half the master-eligible nodes, it may very well be misconfigured, which could also lead to data loss. Which Elasticsearch version are you using? How are the nodes configured (especially minimum_master_nodes)?

Hi Christian,

Thank you for your reply :slight_smile:

I will look into the Shard Allocation feature today, thank you :slight_smile:

We're using version 7.5.2. The configuration is as follows:
Each node has respective names: es1-Zone(A/B) es2-Zone(A/B) es3-Zone(A/B)
cluster.initial_master_nodes: es1-ZoneA,es2-ZoneA,es3-ZoneA
For discovery hosts, each node in one data center sees itself, all the other nodes in that data center, and one node in the other data center (a master-dedicated node). It was master-dedicated, but while testing this sharding issue I changed those master-only nodes to be data-eligible as well, because I thought that with more data nodes the shards could be reassigned to them. Now I cannot change them back to master-only because data is already stored on them.

Correct me if I'm wrong, but I thought that when there is an even number of master-eligible nodes in a cluster, Elasticsearch removes one of them from the voting configuration while still keeping track of it?
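For reference, the current voting configuration can be inspected from the cluster state; a sketch (the filter path assumes a 7.x cluster):

```
GET _cluster/state?filter_path=metadata.cluster_coordination.last_committed_config
```

The response lists the node IDs currently in the voting configuration, so you can see whether one of the six master-eligible nodes has been excluded.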

How far apart are these datacenters?

Geographically? 3-4 states apart.

If you were in Australia, that'd be the entire width of the country, which is not supported due to latency concerns.

In a cluster all nodes need to see each other and be able to communicate. Distributing a cluster across data centers far apart is not supported nor recommended as it will cause performance and stability problems.

Unless you have shard allocation awareness, it is possible for both the primary and replica of a specific shard to be allocated to the same DC, which naturally impacts resiliency.
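A sketch of forced awareness, which goes one step further (the attribute name `zone` and its values are assumptions here, not settings from your cluster):

```
# elasticsearch.yml
cluster.routing.allocation.awareness.attributes: zone
cluster.routing.allocation.awareness.force.zone.values: datacenterA,datacenterB
```

With forced awareness, if one data center is lost, replicas that would otherwise all pile up in the surviving data center stay unassigned until the other zone comes back, preventing the remaining nodes from being overloaded.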

Thank you @Christian_Dahlqvist, the solution with shard awareness attributes worked :slight_smile:

I'm aware that our cluster config may be far from ideal, but it was a requirement for me to configure ES across both of our data centers to increase resiliency and be ready for disaster recovery scenarios. Unfortunately, these are the only 2 data centers I can deploy to, and I cannot control how far apart they are.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.