We have a 200-node cluster with multiple Elasticsearch instances running on the same physical host, e.g. host1-node1, host1-node2, host1-node3.
When we do a rolling upgrade we disable shard allocation, so we don't see any issues with shard distribution there.
However, during hardware failures, and especially when a node comes back online close to the time the new indexes are created at 6 pm, I believe the shard imbalance causes all primary shards for the new indexes to be allocated on the node that just came back online. This puts a huge load on that node and pushes back-pressure onto Logstash.
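For reference, this is the kind of check we run to confirm the skew after the failed host rejoins (standard _cat APIs; the index name is a placeholder for one of our daily indexes):

  # shard count and disk usage per instance
  GET _cat/allocation?v

  # where the shards of one of the newly created indexes landed
  GET _cat/shards/logs-2024.01.01?v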
Can you please advise on how to:
- distribute newly created shards across different physical hosts, not just different instances (see the sketch below). We have more than 20 indexes, so even if only one shard from each index lands on each Elasticsearch instance of the failed host, that adds up to 60 active indexing shards on one physical host, which is a problem again.
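For context, one lever we have been sketching out is capping how many shards of each new index a single instance can hold, via an index template (a sketch only: the template name, pattern, and the value 2 are placeholders, this uses the legacy _template API of older versions, and we are aware total_shards_per_node is a per-instance limit rather than a per-physical-host limit):

  PUT _template/daily_indexes
  {
    "template" : "logs-*",
    "settings" : {
      "index.routing.allocation.total_shards_per_node" : 2
    }
  }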
Below are my cluster settings. I just set cluster.routing.allocation.same_shard.host to true; does that apply to primaries too? My understanding from the documentation is that it only applies to replicas.
{
  "persistent" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "enable" : "none"
        }
      }
    }
  },
  "transient" : {
    "cluster" : {
      "routing" : {
        "rebalance" : {
          "enable" : "all"
        },
        "allocation" : {
          "disk" : {
            "watermark" : {
              "low" : "90%",
              "high" : "95%"
            }
          },
          "node_initial_primaries_recoveries" : "25",
          "awareness" : {
            "attributes" : "zone"
          },
          "balance" : {
            "index" : "0.55",
            "shard" : "0.45"
          },
          "enable" : "all",
          "same_shard" : {
            "host" : "true"
          },
          "cluster_concurrent_rebalance" : "20",
          "node_concurrent_recoveries" : "30"
        }
      }
    },
    "indices" : {
      "store" : {
        "throttle" : {
          "type" : "merge"
        }
      }
    },
    "logger" : {
      "_root" : "DEBUG"
    }
  }
}
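In case it matters, this is roughly how we could tag each instance with the physical host it runs on, so that allocation awareness can tell instances on the same host apart (a sketch under assumptions: the attribute name physical_host and its value are placeholders, and the exact node attribute syntax depends on the Elasticsearch version):

  # elasticsearch.yml on every instance running on host1
  # (5.x-style syntax; older versions set custom attributes directly under node.*)
  node.attr.physical_host: host1

  # add the attribute to the awareness list alongside zone
  PUT _cluster/settings
  {
    "transient" : {
      "cluster.routing.allocation.awareness.attributes" : "zone,physical_host"
    }
  }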
Please let me know if you have any questions.