New Primary Shards On Same Node

We have a 200-node cluster with multiple Elasticsearch instances on each physical host, for example host1-node1, host1-node2 and host1-node3.

When we do a rolling upgrade we disable shard allocation, so we don't have any issues with shard distribution.
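For reference, the "disable allocation" step of a rolling upgrade is a `PUT _cluster/settings` call. The sketch below only builds the two request bodies (before and after the upgrade) and leaves out the HTTP call. Using the `persistent` scope and the value `"none"` are assumptions chosen to match the settings posted further down. `"primaries"` is the other common choice. Setting the value to `null` resets it to its default of `all`.

```python
import json


def disable_allocation_body(mode="none"):
    # "none" blocks all shard allocation; "primaries" would still allow
    # primary shards to be allocated.
    return {"persistent": {"cluster.routing.allocation.enable": mode}}


def reenable_allocation_body():
    # None serializes to JSON null, which resets the setting to its
    # default ("all").
    return {"persistent": {"cluster.routing.allocation.enable": None}}


if __name__ == "__main__":
    print(json.dumps(disable_allocation_body()))
    print(json.dumps(reenable_allocation_body()))
```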

But during hardware failures, and especially when a node comes back online close to 6 PM when the new indexes are created, I believe the shard imbalance causes all primary shards for the new indexes to be created on the node that came back online. This puts a huge load on that node and pushes back on Logstash.

Can you please advise on how to:

  1. Distribute newly created shards across different physical nodes, not just across different instances. We have more than 20 indexes, so even if only one shard from each index lands on each Elasticsearch instance of the failed node, that adds up to 60 actively indexing shards on one physical node, which will be a problem again.
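The per-instance and per-host views can differ, and this small sketch shows how. It takes allocation records and groups primaries by physical host instead of by instance. The record shape `(index, shard, "p"/"r", node)` loosely mirrors `_cat/shards` output, but it is an assumption. The host is derived from the `hostN-nodeM` naming used above, which is also an assumption. The index names are hypothetical.

```python
from collections import Counter


def primaries_per_host(allocations):
    """Count primary shards per physical host.

    Each record is (index, shard, prirep, node); node names are assumed to
    follow the "hostN-nodeM" pattern, so the host is the part before "-".
    """
    counts = Counter()
    for _index, _shard, prirep, node in allocations:
        if prirep == "p":
            host = node.split("-", 1)[0]  # "host1-node2" -> "host1"
            counts[host] += 1
    return counts


if __name__ == "__main__":
    sample = [
        ("logs-a", 0, "p", "host1-node1"),
        ("logs-a", 1, "p", "host1-node2"),
        ("logs-b", 0, "p", "host1-node3"),
        ("logs-b", 0, "r", "host2-node1"),
    ]
    # Every instance holds only one primary, but host1 holds three.
    print(primaries_per_host(sample))  # Counter({'host1': 3})
```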

Below are my cluster settings. I just set same_shard.host to true. Does that apply to primaries too? My understanding from the documentation is that it applies to replicas.

```
{
  "persistent" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "enable" : "none"
        }
      }
    }
  },
  "transient" : {
    "cluster" : {
      "routing" : {
        "rebalance" : {
          "enable" : "all"
        },
        "allocation" : {
          "disk" : {
            "watermark" : {
              "low" : "90%",
              "high" : "95%"
            }
          },
          "node_initial_primaries_recoveries" : "25",
          "awareness" : {
            "attributes" : "zone"
          },
          "balance" : {
            "index" : "0.55",
            "shard" : "0.45"
          },
          "enable" : "all",
          "same_shard" : {
            "host" : "true"
          },
          "cluster_concurrent_rebalance" : "20",
          "node_concurrent_recoveries" : "30"
        }
      }
    },
    "indices" : {
      "store" : {
        "throttle" : {
          "type" : "merge"
        }
      }
    },
    "logger" : {
      "_root" : "DEBUG"
    }
  }
}
```
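In the nested JSON above it is easy to miss that `enable` appears in both scopes. The dotted names used in this thread (for example `cluster.routing.allocation.same_shard.host`) make that overlap visible. This is a minimal sketch that flattens a trimmed excerpt of the posted settings into those names. Elasticsearch can also return this form directly with `GET _cluster/settings?flat_settings=true`. The `flatten` helper itself is hypothetical.

```python
import json


def flatten(settings, prefix=""):
    """Turn nested settings JSON into dotted keys, e.g. cluster.routing.allocation.enable."""
    flat = {}
    for key, value in settings.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Trimmed excerpt of the settings posted above.
posted = json.loads("""
{
  "persistent": {"cluster": {"routing": {"allocation": {"enable": "none"}}}},
  "transient": {"cluster": {"routing": {"allocation": {
      "enable": "all", "same_shard": {"host": "true"}}}}}
}
""")

if __name__ == "__main__":
    for scope in ("persistent", "transient"):
        for name, value in flatten(posted[scope]).items():
            print(scope, name, value)
```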

Please let me know if you have any questions.

Hey Praveen,

You can manually reroute shards to different nodes using the reroute API (https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-reroute.html), which will get you by in the short term.
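The reroute API takes a list of commands in the body of `POST _cluster/reroute`, and a manual move uses the `move` command. This sketch only builds such a body and does not send it. The index name is hypothetical, and the node names borrow the naming from the question.

```python
import json


def move_command(index, shard, from_node, to_node):
    # One "move" command: relocate a started shard copy between two nodes.
    return {"move": {"index": index, "shard": shard,
                     "from_node": from_node, "to_node": to_node}}


def reroute_body(*commands):
    # Body for POST _cluster/reroute; several commands can be batched.
    return {"commands": list(commands)}


if __name__ == "__main__":
    # Hypothetical index; spread two primaries off host1 onto other hosts.
    body = reroute_body(
        move_command("logstash-example", 0, "host1-node1", "host2-node1"),
        move_command("logstash-example", 1, "host1-node2", "host3-node1"),
    )
    print(json.dumps(body, indent=2))
```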

However, if you keep finding primary shards building up on one node, you can add a routing setting that limits the number of shards (for a given index) that can end up on a single node: https://www.elastic.co/guide/en/elasticsearch/reference/current/allocation-total-shards.html
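`index.routing.allocation.total_shards_per_node` is a hard limit. If it is set too low, some shard copies cannot be assigned at all. This sketch computes a lower bound for the limit from the primary count, replica count and number of eligible nodes, and builds the index settings body. The index sizes are made-up examples. Other allocation rules (awareness, disk watermarks, `same_shard.host`) can make the real minimum higher, so treat the result only as a floor.

```python
import math


def min_total_shards_per_node(primaries, replicas, eligible_nodes):
    # Every copy (primary plus replicas) needs a slot on some node; below
    # this limit some copies would stay unassigned.
    copies = primaries * (1 + replicas)
    return math.ceil(copies / eligible_nodes)


def index_settings_body(limit):
    # Body for PUT <index>/_settings or an index template's settings.
    return {"index.routing.allocation.total_shards_per_node": limit}


if __name__ == "__main__":
    limit = min_total_shards_per_node(primaries=5, replicas=1, eligible_nodes=200)
    print(limit, index_settings_body(limit))
```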

I also notice you have cluster.routing.allocation.enable set to none in the persistent settings. Is there a reason for this?

Cheers,
Mike

Hi Mike, thank you very much for your response.

Yes, I manually moved shards to fix it temporarily.

When one of my physical nodes failed and came back after 8 hours, this happened again.

I saw index.routing.allocation.total_shards_per_node, but since we run 3 instances on each physical node with around 10 indexes, I am afraid it will still put 1 shard on each instance, which adds up to 30 actively indexing shards on one physical node.
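The concern above is simple arithmetic. Each Elasticsearch instance counts as a separate node, so `total_shards_per_node` caps shards per instance and not per physical host. This sketch computes the worst case per host. The instance counts and index counts come from this thread, and the function name is hypothetical.

```python
def worst_case_shards_per_host(instances_per_host, indexes, total_shards_per_node=1):
    # The limit applies per node (instance) and per index, so one host
    # running several instances can still take one shard per instance.
    return instances_per_host * indexes * total_shards_per_node


if __name__ == "__main__":
    print(worst_case_shards_per_host(3, 10))  # 30
    print(worst_case_shards_per_host(3, 20))  # 60
```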

cluster.routing.allocation.enable is set to all in the transient settings, and since transient settings take precedence over persistent ones, I think that is fine for now. I will also add it to the persistent settings.
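This sketch models the precedence rule in the reply above with nested settings dicts: transient wins over persistent, and persistent wins over the default. It also shows why the persistent value still matters. Transient settings do not survive a full cluster restart, and after one the persistent `none` would take effect. The helper names are hypothetical, and the sketch assumes settings in nested (not flat) form.

```python
def lookup(tree, dotted):
    # Walk nested settings JSON along a dotted name; None if absent.
    node = tree
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def effective(settings, name, default=None):
    # Transient overrides persistent; fall back to the default otherwise.
    for scope in ("transient", "persistent"):
        value = lookup(settings.get(scope, {}), name)
        if value is not None:
            return value
    return default


if __name__ == "__main__":
    settings = {
        "persistent": {"cluster": {"routing": {"allocation": {"enable": "none"}}}},
        "transient": {"cluster": {"routing": {"allocation": {"enable": "all"}}}},
    }
    name = "cluster.routing.allocation.enable"
    print(effective(settings, name, "all"))  # all
    # After a full cluster restart only the persistent scope remains.
    print(effective({"persistent": settings["persistent"]}, name, "all"))  # none
```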

Thanks
Praveen
