ES 7.8 ILM shrink action does not allocate primaries

The ILM shrink action seems to be unreliable when trying to allocate primary shards to the same node: it gets stuck at the check-shrink-allocation step with the message: Waiting for node [XYZ] to contain [3] shards, found [2], remaining [1].
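
For reference, this is roughly how I checked which step the index is stuck on (my-index is just a placeholder for the real index name):

```
GET my-index/_ilm/explain

# The response shows the current "action" (shrink) and "step"
# (check-shrink-allocation), and the "step_info" section contains the
# "Waiting for node [...]" message quoted above.
```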

Also, according to the docs (Shrink | Elasticsearch Guide [8.11] | Elastic), it is supposed to allocate primary shards:

The shrink action allocates all primary shards of the index to one node so it can call the Shrink API to shrink the index. After shrinking, it swaps aliases that point to the original index to the new shrunken index.

However, when I checked node XYZ, the 2 shards that have already been allocated to it are actually replica shards! So either the docs are not correct or the ILM shrink action is not considering the right shards.
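
This is how I looked at which copies are sitting on node XYZ (again, my-index is a placeholder):

```
GET _cat/shards/my-index?v&h=index,shard,prirep,state,node&s=node

# prirep is "p" for a primary copy and "r" for a replica copy, so it is
# easy to see that the copies already on node XYZ are replicas.
```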

I cannot retry the ILM step either, because the index has not encountered an error while running the lifecycle policy (_ilm/retry only applies to indices in the ERROR step).

What is the recommended way to handle this? Should I manually reroute a shard to node XYZ?

I did some in-depth investigation with /_cluster/allocation/explain, and it looks like the issue stems from the following two things:

  • My ES cluster is set up in a non-ideal way: each physical host runs multiple data nodes, i.e. one physical server contains more than one data node.
  • I've set cluster.routing.allocation.same_shard.host to true so a primary and its replica cannot be on the same host.
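
For completeness, the setting in question can be verified with something like this (it shows up under "persistent" or "transient", depending on how it was set):

```
GET _cluster/settings?flat_settings=true

# Look for "cluster.routing.allocation.same_shard.host": "true"
# in the response.
```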

Now, as I understand it, a prerequisite of the shrink action is that a copy of every shard in the index must reside on the same node. And if I'm not mistaken, it doesn't matter whether those copies are replicas or primaries, as long as one copy of every shard ends up on that node.
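
As far as I can tell, ILM does this by picking one node and pinning the index to it with an allocation filter, so you can see which node it chose by looking at the index settings (my-index is a placeholder):

```
GET my-index/_settings?flat_settings=true

# While the shrink action is in progress, the index should carry something
# like "index.routing.allocation.require._id": "<node id of XYZ>",
# which forces one copy of every shard onto that node.
```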

So what happened was this: because ES clusters are not optimised for a setup where multiple data nodes run on the same physical host, there is a chance that a copy will try to move to a node on a host where another node already holds a copy of the same shard, and the move will get denied because cluster.routing.allocation.same_shard.host is set to true. Here's an example:

Before the shrink action moves the shards around, the shard allocation looks like this (3 primaries, each with 1 replica):

Host A, Node A1: p0
Host A, Node A2: p1
Host B, Node B2: r0
Host B, Node B1: p2
Host C, Node C1: r1
Host C, Node C2: r2

Now when the shrink action runs, let's say it decides to move r1 and p2 to Node A1 so that Node A1 contains a copy of every shard:

Host A, Node A1: p0 --- stays as it is
Host B, Node B1: p2 --> Host A, Node A1 - move is allowed
Host C, Node C1: r1 --> Host A, Node A1 - move is not allowed because Host A, Node A2 already holds p1, i.e. a copy of the same shard (shard 1) already exists on Host A
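
In this situation, a request like the following should surface the same_shard decider as the reason the copy of shard 1 cannot move to Node A1 (index name and shard number follow the example above):

```
GET _cluster/allocation/explain
{
  "index": "my-index",
  "shard": 1,
  "primary": false
}

# The response includes the per-node decisions from the allocation deciders;
# the "same_shard" decider is what pointed me at the same_shard.host setting.
```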

I also noticed that when ILM is stuck in this situation, the cluster goes yellow because replica shards become unassigned, and POST _cluster/reroute?retry_failed=true does not resolve the issue.

As a workaround, I rerouted p1 (instead of r1) to Node A1, which allowed the shrink action to move forward; after that things went back to normal and the cluster became green again.
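
The manual reroute I used looked roughly like this (node names follow the example above; since p1 and r1 are the two copies of shard 1, moving the primary within Host A is allowed by the same_shard decider):

```
POST _cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "my-index",
        "shard": 1,
        "from_node": "nodeA2",
        "to_node": "nodeA1"
      }
    }
  ]
}

# Moves the primary copy of shard 1 from Node A2 to Node A1. Once Node A1
# holds a copy of every shard, the check-shrink-allocation step completes.
```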
