Hi,
We have an ES cluster consisting of 6 nodes, 3 in one data center and 3 in another. All of them are master-eligible, but only 4 are data nodes. We have 3 indices, each with 5 primary shards and 1 replica. During a disaster recovery scenario one data center went down, and afterwards the Elasticsearch cluster went to status RED with the reason:
"cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster"
I ran GET _cat/shards?v=true&s=prirep and got:
index shard prirep state docs store ip node
firstIndex 2 p STARTED 0 283b xx.xx.xx.xx datanode2-datacenterB
firstIndex 1 p STARTED 0 283b xx.xx.xx.xx datanode1-datacenterB
firstIndex 3 p UNASSIGNED
firstIndex 4 p UNASSIGNED
firstIndex 0 p UNASSIGNED
secondIndex 2 p STARTED 3375 2.8mb xx.xx.xx.xx datanode1-datacenterB
secondIndex 3 p STARTED 3416 2.2mb xx.xx.xx.xx datanode1-datacenterB
secondIndex 1 p STARTED 3411 3.2mb xx.xx.xx.xx datanode2-datacenterB
secondIndex 4 p UNASSIGNED
secondIndex 0 p STARTED 3512 2.9mb xx.xx.xx.xx datanode1-datacenterB
thirdIndex 2 p STARTED 4688 1.3mb xx.xx.xx.xx datanode1-datacenterB
thirdIndex 1 p STARTED 4745 1.4mb xx.xx.xx.xx datanode2-datacenterB
thirdIndex 4 p UNASSIGNED
thirdIndex 3 p UNASSIGNED
thirdIndex 0 p STARTED 4845 1.4mb xx.xx.xx.xx datanode2-datacenterB
firstIndex 2 r STARTED 0 283b xx.xx.xx.xx datanode1-datacenterB
firstIndex 1 r STARTED 0 283b xx.xx.xx.xx datanode2-datacenterB
firstIndex 3 r UNASSIGNED
firstIndex 4 r UNASSIGNED
firstIndex 0 r UNASSIGNED
secondIndex 2 r STARTED 3375 2.8mb xx.xx.xx.xx datanode2-datacenterB
secondIndex 3 r STARTED 3416 2.2mb xx.xx.xx.xx datanode2-datacenterB
secondIndex 1 r STARTED 3411 3.2mb xx.xx.xx.xx datanode1-datacenterB
secondIndex 4 r UNASSIGNED
secondIndex 0 r STARTED 3512 2.9mb xx.xx.xx.xx datanode2-datacenterB
thirdIndex 2 r STARTED 4688 1.3mb xx.xx.xx.xx datanode2-datacenterB
thirdIndex 1 r STARTED 4745 1.4mb xx.xx.xx.xx datanode1-datacenterB
thirdIndex 4 r UNASSIGNED
thirdIndex 3 r UNASSIGNED
thirdIndex 0 r STARTED 4845 1.4mb xx.xx.xx.xx datanode1-datacenterB
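To make the listing easier to read, here is a minimal Python sketch that picks out the unassigned primaries per index. It assumes the whitespace-separated column order shown above (index, shard, prirep, state, ...); the function name unassigned_primaries is only illustrative, and the sample text is just the primary rows copied from the output above.

```python
from collections import defaultdict

# Primary-shard rows copied from the _cat/shards output above.
CAT_SHARDS = """\
index shard prirep state docs store ip node
firstIndex 2 p STARTED 0 283b xx.xx.xx.xx datanode2-datacenterB
firstIndex 1 p STARTED 0 283b xx.xx.xx.xx datanode1-datacenterB
firstIndex 3 p UNASSIGNED
firstIndex 4 p UNASSIGNED
firstIndex 0 p UNASSIGNED
secondIndex 2 p STARTED 3375 2.8mb xx.xx.xx.xx datanode1-datacenterB
secondIndex 3 p STARTED 3416 2.2mb xx.xx.xx.xx datanode1-datacenterB
secondIndex 1 p STARTED 3411 3.2mb xx.xx.xx.xx datanode2-datacenterB
secondIndex 4 p UNASSIGNED
secondIndex 0 p STARTED 3512 2.9mb xx.xx.xx.xx datanode1-datacenterB
thirdIndex 2 p STARTED 4688 1.3mb xx.xx.xx.xx datanode1-datacenterB
thirdIndex 1 p STARTED 4745 1.4mb xx.xx.xx.xx datanode2-datacenterB
thirdIndex 4 p UNASSIGNED
thirdIndex 3 p UNASSIGNED
thirdIndex 0 p STARTED 4845 1.4mb xx.xx.xx.xx datanode2-datacenterB
"""


def unassigned_primaries(cat_output):
    """Map each index name to the sorted list of its unassigned primary shards."""
    missing = defaultdict(list)
    for line in cat_output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or malformed line
        index, shard, prirep, state = fields[:4]
        # The header row is skipped naturally: its prirep column is "prirep", not "p".
        if prirep == "p" and state == "UNASSIGNED":
            missing[index].append(int(shard))
    return {index: sorted(shards) for index, shards in missing.items()}


for index, shards in unassigned_primaries(CAT_SHARDS).items():
    print(index, shards)
```

So 6 of the 15 primaries are unassigned: shards 0, 3 and 4 of firstIndex, shard 4 of secondIndex, and shards 3 and 4 of thirdIndex. Their replicas are unassigned as well.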
Could anyone suggest what I can do in this situation? I'm not sure whether I should add more replicas, or maybe set the number of primary shards equal to the total number of nodes in the cluster.
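In case more detail is useful: the RED reason quoted above is the kind of message the cluster allocation explain API (_cluster/allocation/explain) reports. Below is a minimal Python sketch of how that request could be built for one of the unassigned primaries. The address http://localhost:9200 and the helper name build_explain_request are assumptions for illustration, and the sketch only constructs the request without sending it.

```python
import json
import urllib.request

# Assumption: the cluster is reachable here without authentication.
ES_URL = "http://localhost:9200"


def build_explain_request(index, shard, primary=True, base_url=ES_URL):
    """Build (but do not send) an allocation explain request for one shard copy."""
    body = json.dumps({"index": index, "shard": shard, "primary": primary})
    return urllib.request.Request(
        url=f"{base_url}/_cluster/allocation/explain",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example: the unassigned primary shard 0 of firstIndex from the listing above.
request = build_explain_request("firstIndex", 0)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually send it, something like this could be used:
# with urllib.request.urlopen(request) as response:
#     print(json.dumps(json.load(response), indent=2))
```

I can post the full response for any of the unassigned shards if that helps.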
Thanks