Hi everyone,
I've recently been setting up a new ELK stack, and am naturally trying to use all the cool tools and features possible.
In times past we've used regular indices and curator to move their shards about based on custom node attributes.
For this stack I'm using data streams and setting up ILM to move things about based on the built-in hot/warm/cold node roles. It's not quite doing what we expect.
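For context, the policy is roughly shaped like this (the policy name, rollover thresholds and phase ages here are illustrative rather than our exact values); shard movement is left to the implicit migrate action rather than an explicit allocate:

PUT _ilm/policy/logs-nginx
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "1d"
          }
        }
      },
      "warm": {
        "min_age": "2d",
        "actions": {}
      },
      "cold": {
        "min_age": "30d",
        "actions": {}
      }
    }
  }
}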
The stack has a bunch of master and coordinating nodes, 3 physical hot nodes, 3 physical warm nodes, and 2 physical cold nodes.
The physical nodes are very large. Even at the current indexing rate the hot nodes' CPUs are barely stressed, the warm and cold servers are basically idle, and heap usage is fine across the board.
_ilm/explain shows that indices do move into the warm phase: ILM sets index.routing.allocation.include._tier_preference to "data_warm,data_hot" on the backing indices of the data stream.
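That setting can be checked directly on a backing index, for example:

GET .ds-logs-nginx-xwing-2021.08.08-000022/_settings?filter_path=*.settings.index.routing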
However, the problem we have is that not all of the shards move. Indices have between 1 and 4 primary shards, and all have 1 replica.
_cluster/allocation/explain produces some interesting output:
{
  "index" : ".ds-logs-nginx-xwing-2021.08.08-000022",
  ...
  "current_node" : {
    "id" : "81c4tUd9RJmWN8LcuINxHg",
    "name" : "elk3-warm1",
    "transport_address" : "172.16.17.34:9300",
    "attributes" : {
      "temperature" : "warm",
      "xpack.installed" : "true",
      "transform.node" : "false"
    },
    "weight_ranking" : 1
  },
  "can_remain_on_current_node" : "yes",
  "can_rebalance_cluster" : "yes",
  "can_rebalance_to_other_node" : "no",
  "rebalance_explanation" : "cannot rebalance as no target node exists that can both allocate this shard and improve the cluster balance",
  "node_allocation_decisions" : [
    {
      "node_id" : "I4eupHHOQKe8XM1FVuVuxg",
      "node_name" : "elk3-hot3",
      "transport_address" : "172.16.17.39:9300",
      "node_attributes" : {
        "temperature" : "hot",
        "xpack.installed" : "true",
        "transform.node" : "false"
      },
      "node_decision" : "no",
      "weight_ranking" : 1,
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "a copy of this shard is already allocated to this node [[.ds-logs-nginx-xwing-2021.08.08-000022][1], node[I4eupHHOQKe8XM1FVuVuxg], [R], s[STARTED], a[id=RtrmtpHsS1a2Ey6UWARSog]]"
        }
      ]
    },
    {
      "node_id" : "-YBn47O-Q9uV2OWwbKwUhQ",
      "node_name" : "elk3-hot1",
      "transport_address" : "172.16.17.37:9300",
      "node_attributes" : {
        "temperature" : "hot",
        "xpack.installed" : "true",
        "transform.node" : "false"
      },
      "node_decision" : "worse_balance",
      "weight_ranking" : 1
    },
    {
      "node_id" : "BbQrlXIiT5-_2Yzr8iD4Hw",
      "node_name" : "elk3-warm3",
      "transport_address" : "172.16.17.36:9300",
      "node_attributes" : {
        "temperature" : "warm",
        "xpack.installed" : "true",
        "transform.node" : "false"
      },
      "node_decision" : "worse_balance",
      "weight_ranking" : 1
    },
    ... warm2 = worse_balance, weight_ranking=1 ...
    ... hot2 = worse_balance, weight_ranking=1 ...
    ... cold1 = worse_balance, weight_ranking=2 ...
    ... cold2 = worse_balance, weight_ranking=3 ...
  ]
}
The above is a mildly abbreviated example of _cluster/allocation/explain output for shard 1 of an index with the following shards:
.ds-logs-nginx-xwing-2021.08.08-000022 0 p STARTED 144804836 50gb 172.16.17.36 elk3-warm3
.ds-logs-nginx-xwing-2021.08.08-000022 0 r STARTED 144804836 50gb 172.16.17.34 elk3-warm1
.ds-logs-nginx-xwing-2021.08.08-000022 1 p STARTED 144821064 49.9gb 172.16.17.34 elk3-warm1
.ds-logs-nginx-xwing-2021.08.08-000022 1 r STARTED 144821064 49.9gb 172.16.17.39 elk3-hot3
.ds-logs-nginx-xwing-2021.08.08-000022 2 p STARTED 144809602 49.9gb 172.16.17.38 elk3-hot2
.ds-logs-nginx-xwing-2021.08.08-000022 2 r STARTED 144809602 49.9gb 172.16.17.35 elk3-warm2
.ds-logs-nginx-xwing-2021.08.08-000022 3 p STARTED 144818360 50gb 172.16.17.36 elk3-warm3
.ds-logs-nginx-xwing-2021.08.08-000022 3 r STARTED 144818360 50gb 172.16.17.37 elk3-hot1
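For reference, the explain output above was produced by a request along these lines (the primary flag is my inference from the current node being elk3-warm1, which holds shard 1's primary):

GET _cluster/allocation/explain
{
  "index": ".ds-logs-nginx-xwing-2021.08.08-000022",
  "shard": 1,
  "primary": true
}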
The "worse_balance" decision appears to come from the fact that the warm servers have the same number of shards as the hot servers.
The cold servers have ~6 times as many shards as the hot and warm servers, but those are tiny shards from indices full of junk we loaded at the very beginning, and they will eventually be deleted. They were moved there by Curator setting index.routing.allocation.require.temperature to "cold".
The warm servers have more disk space than the hot ones, and the cold servers more than the warm, so each tier should hold more shards than the tier above it.
My question to the collective is: what do I need to change so that moving shards onto the warm servers (and eventually the cold servers) doesn't count as a worse balance, i.e. so that _tier_preference itself carries more "weight"?
Maybe ILM can require something instead of just including? index.routing.allocation.require._tier_preference doesn't exist.
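One workaround I'm considering, sketched below, is to put an explicit allocate action in the warm phase and reuse our existing custom temperature node attribute, since allocate does support require (only the warm phase is shown, and the policy name and age are illustrative):

PUT _ilm/policy/logs-nginx
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "2d",
        "actions": {
          "allocate": {
            "require": {
              "temperature": "warm"
            }
          }
        }
      }
    }
  }
}

That would force the relocation the way Curator's require.temperature did, but it feels like a step backwards from the built-in data tiers, so I'd rather find a way to make _tier_preference do the job.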
"Shard balancing heuristics settings" don't have any effect, because from a shards per node basis the cluster is balanced.
"Disk-based shard allocation settings" do have an effect. I'm specifically trying to keep different nodes in the cluster at different disk usage levels though, using time based ILM to move data about. The % free we're trying to keep the hot servers is much higher than the % free we're willing the warm servers to go to.
Thanks
Mike