Hi, I am trying to find out if this is really the case: the shrink operation, via the API or ILM, is unusable, or at least unreliable, when both of these conditions apply:
- cluster.routing.allocation.same_shard.host is set to true. This setting disallows allocation of more than one copy of a shard to the same host.
- Multiple data nodes run on the same host. We use one node per attached physical disk.
A shrink operation on an index requires that a copy of every shard of that index is moved to one node; let's call it the target node. In a multi-node-per-host setup, a shard copy on another node on the same host as the target node can block that move because of the same_shard setting, and ILM gets stuck.
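To confirm the first condition, this is roughly how I check the setting (the filter_path only trims the output; in our case the value may also come from elasticsearch.yml rather than the settings API):
GET _cluster/settings?include_defaults=true&filter_path=*.cluster.routing.allocation.same_shard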
Example:
3 data hosts, each with 4 nodes, so a 12-node cluster in total.
ilmtest1 index with 6 shards and 1 replica.
The shrink10m ILM policy tries to shrink the ilmtest1 index. ILM does the pre-checks, locks the index and tries to move the shards. Last log message from ILM:
moving index [ilmtest1] from [{"phase":"warm","action":"shrink","name":"set-single-node-allocation"}] to [{"phase":"warm","action":"shrink","name":"check-shrink-allocation"}] in policy [shrink10m]
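For completeness, the shrink10m policy looks roughly like this (the min_age and target shard count here are placeholders rather than the exact values from our cluster; the relevant part is the shrink action in the warm phase):
PUT _ilm/policy/shrink10m
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "10m",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          }
        }
      }
    }
  }
}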
ILM status for the index:
GET ilmtest1/_ilm/explain?human
{
  "indices" : {
    "ilmtest1" : {
      ...
      "action" : "shrink",
      "step" : "check-shrink-allocation",
      "shrink_index_name" : "shrink-d8ut-ilmtest1",
      "step_info" : {
        "message" : "Waiting for node [yyt9lJfxTD66O13E6UZPxg] to contain [6] shards, found [3], remaining [3]",
        "node_id" : "yyt9lJfxTD66O13E6UZPxg",
        "shards_left_to_allocate" : 3,
        "expected_shards" : 6
      }
    }
  }
}
Which node is the target of the shrink job:
GET ilmtest1/_settings?filter_path=ilmtest1.settings.index.routing.allocation.require
{
  "ilmtest1" : {
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "require" : {
              "_id" : "yyt9lJfxTD66O13E6UZPxg"
            }
          }
        }
      }
    }
  }
}
Node list (only the target node shown):
10.135.133.95 0 dr - 7.17.1 m1-john-data2 yyt9lJfxTD66O13E6UZPxg
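(That line is from _cat/nodes with a custom column list, roughly GET _cat/nodes?h=ip,node.role,master,version,name,id — I am quoting the columns from memory; the parts that matter are the node name m1-john-data2 and the node id, which matches the require._id above.)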
Shard list:
index    shard prirep state   docs  store ip             node
ilmtest1 0     p      STARTED 50377 4.2mb 10.135.133.95  m1-john-data2  <- 0p on target node
ilmtest1 0     r      STARTED 50377 4.2mb 10.135.156.149 m3-john-data1
ilmtest1 1     p      STARTED 50380 4.2mb 10.135.104.36  m1-john-data3
ilmtest1 1     r      STARTED 50380 4.2mb 10.135.133.95  m1-john-data2  <- 1r on target node
ilmtest1 2     p      STARTED 49707 4.2mb 10.135.133.95  m1-john-data2  <- 2p on target node
ilmtest1 2     r      STARTED 49707 4.1mb 10.135.104.36  m3-john-data3
ilmtest1 3     p      STARTED 49659 4.2mb 10.135.104.36  m2-john-data3  <- 3p should move to the target node!
ilmtest1 3     r      STARTED 49659 4.2mb 10.135.133.95  m2-john-data2  <- 3r blocks the 3p move via same_shard
ilmtest1 4     p      STARTED 49857 4.2mb 10.135.156.149 m1-john-data1
ilmtest1 4     r      STARTED 49857 4.3mb 10.135.133.95  m3-john-data2
ilmtest1 5     p      STARTED 50020 4.2mb 10.135.133.95  m4-john-data2
ilmtest1 5     r      STARTED 50020 4.2mb 10.135.156.149 m2-john-data1
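(The shard list is plain _cat/shards output for the index, i.e. roughly GET _cat/shards/ilmtest1?v; the arrows are my annotations.)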
Check 3p allocation status:
GET /_cluster/allocation/explain?pretty
{
  "index": "ilmtest1",
  "shard": 3,
  "primary": true,
  "current_node": "m2-john-data3"
}
{
  "index" : "ilmtest1",
  "shard" : 3,
  "primary" : true,
  "can_remain_on_current_node" : "no",
  "can_move_to_other_node" : "no",
  "move_explanation" : "cannot move shard to another node, even though it is not allowed to remain on its current node",
  ...
  "node_allocation_decisions" : [
    ...
    {
      "node_id" : "yyt9lJfxTD66O13E6UZPxg",
      "node_name" : "m1-john-data2",
      "transport_address" : "10.135.133.95:9301",
      "node_attributes" : {
        "zone" : "do",
        "xpack.installed" : "true",
        "transform.node" : "false"
      },
      "node_decision" : "no",
      "weight_ranking" : 11,
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "a copy of this shard is already allocated to host address [10.135.133.95], on node [yyt9lJfxTD66O13E6UZPxg], and [cluster.routing.allocation.same_shard.host] is [true] which forbids more than one node on this host from holding a copy of this shard"
        }
      ]
    }
  ]
}
And now ILM for the ilmtest1 index is stuck. Is there any way to avoid this without manually moving replica shards away from the target node's host?
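(To be explicit about what I mean by the manual fix: something like the reroute below, moving the blocking replica 3r to a node on the third host, with node names taken from the shard list above. I am not even sure the allocation deciders would accept this while ILM keeps index.routing.allocation.require._id on the index, which is part of why I would like to avoid it.)
POST _cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "ilmtest1",
        "shard": 3,
        "from_node": "m2-john-data2",
        "to_node": "m1-john-data1"
      }
    }
  ]
}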