Shard initialization stuck on cluster recovery - SameShardAllocationDecider

Hi,

I'm using Elasticsearch 5.4 and have a cluster initialization problem.

Shards have not managed to finish their initialization for more than 3 days.
I keep getting the following log lines (at TRACE level) for ALL unassigned shards:

[2019-03-28T00:05:51,180][TRACE][o.e.c.r.a.d.AllocationDeciders] [esmaster2] Can not allocate [[test-index-20190217][2], node[null], [R], recovery_source[peer recovery], s[UNASSIGNED], unassigned_info[[reason=CLUSTER_RECOVERED], at[2019-03-25T14:31:27.494Z], delayed=false, allocation_status[no_attempt]]] on node [{es1}{iNzYU8rGTLeSFwRuyX_SSg}{EBkmEu77RlGZldrXjRFK5g}{192.168.0.30}{192.168.0.30:9300}{rack=1}] due to [SameShardAllocationDecider]
[2019-03-28T00:05:51,180][TRACE][o.e.g.GatewayAllocator$InternalReplicaShardAllocator] [esmaster2] [test-index-20190217][2], node[null], [R], recovery_source[peer recovery], s[UNASSIGNED], unassigned_info[[reason=CLUSTER_RECOVERED], at[2019-03-25T14:31:27.494Z], delayed=false, allocation_status[no_attempt]]: ignoring allocation, can't be allocated on any node

Restarting the cluster several times did not help; I continued to get the same log, and the following results from the Allocation Explain API:

  • same_shard decider
  • 'reached the limit of incoming shard recoveries [100]'
  • cluster_rebalance
  • 'shard is in the process of initializing on node' (but it started 3 days ago)

Using the Recovery API and Index Recovery API, I see that some of the unassigned shards are missing, and some have been stuck in the init/index stage for 3 days.
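Roughly the calls I mean, as a minimal sketch with the low-level REST client that ships with 5.x (the host localhost:9200 is an assumption; active_only=true limits the output to recoveries that are still running):

import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class CheckRecoveries {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            Map<String, String> catParams = new HashMap<>();
            catParams.put("active_only", "true");
            catParams.put("v", "true");

            // Cluster-wide, human-readable summary of recoveries that are still running.
            Response cat = client.performRequest("GET", "/_cat/recovery", catParams);
            System.out.println(EntityUtils.toString(cat.getEntity()));

            // Per-shard recovery detail for one index (name taken from the logs above).
            Response idx = client.performRequest("GET", "/test-index-20190217/_recovery",
                    Collections.singletonMap("active_only", "true"));
            System.out.println(EntityUtils.toString(idx.getEntity()));
        }
    }
}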

Other steps I took to try to fix the cluster health status, with no success:

  • To force the cluster to re-replicate the shards, I lowered the number of replicas from 1 -> 0 and then (once they were allocated properly) back to 1; the settings calls are sketched after this list. But I still get the same log as above while the replicas are added.
  • Deleting and re-creating replicas does not help, since they get stuck behind the old running/stuck recovery tasks.
  • Cancelling those cluster tasks did not help either, since they are not cancellable.
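For completeness, the replica toggle was roughly the following, shown as a minimal sketch with the 5.x low-level REST client against a single index from the logs (the host localhost:9200 is an assumption):

import org.apache.http.HttpHost;
import org.apache.http.entity.ContentType;
import org.apache.http.nio.entity.NStringEntity;
import org.elasticsearch.client.RestClient;

import java.util.Collections;

public class ToggleReplicas {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // Drop the replicas so only the primaries remain.
            client.performRequest("PUT", "/test-index-20190217/_settings", Collections.emptyMap(),
                    new NStringEntity("{\"index\":{\"number_of_replicas\":0}}", ContentType.APPLICATION_JSON));

            // ... wait for the primaries to be allocated, then add the replica copies back.
            client.performRequest("PUT", "/test-index-20190217/_settings", Collections.emptyMap(),
                    new NStringEntity("{\"index\":{\"number_of_replicas\":1}}", ContentType.APPLICATION_JSON));
        }
    }
}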

I drilled down into the code that writes this log and I don't understand why it happens here.
In our case the cluster.routing.allocation.same_shard.host param is set to false, so this check is not needed and allocation should not fail on it.
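For reference, one way to double-check where that value comes from (a sketch, assuming localhost:9200): dynamic overrides show up in the cluster settings API, while anything configured in elasticsearch.yml shows up in the nodes info API.

import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

import java.util.Collections;

public class CheckSameShardHostSetting {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // Dynamic (persistent/transient) overrides, if any were applied via the API.
            Response cluster = client.performRequest("GET", "/_cluster/settings",
                    Collections.singletonMap("flat_settings", "true"));
            System.out.println(EntityUtils.toString(cluster.getEntity()));

            // Per-node settings, including anything configured in elasticsearch.yml.
            Response nodes = client.performRequest("GET", "/_nodes/settings",
                    Collections.singletonMap("flat_settings", "true"));
            System.out.println(EntityUtils.toString(nodes.getEntity()));
        }
    }
}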

SameShardAllocationDecider:
The OR condition in line 74 seems like a bug:

if (decision.type() == Decision.Type.NO || sameHost == false)

The SameShardAllocationDecider is there to ensure that Elasticsearch does not allocate more than one copy of any shard to a single node. It also optionally checks to make sure that there is not more than one copy of any shard on a single host, but this check is disabled if cluster.routing.allocation.same_shard.host is false.
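To make this concrete, here is a toy, self-contained sketch of the two checks; it is not the actual Elasticsearch code and the names are invented, but it mirrors the quoted condition: when the node-level answer is already NO, or the host-level check is switched off, there is nothing left to decide.

import java.util.List;

public class SameShardSketch {

    enum Decision { YES, NO }

    /** A copy of the shard that is already assigned, as a (nodeId, hostAddress) pair. */
    record Copy(String nodeId, String hostAddress) {}

    static Decision canAllocate(String nodeId, String hostAddress,
                                List<Copy> assignedCopies, boolean sameShardHost) {
        // Node-level check: never put two copies of the same shard on one node.
        Decision nodeDecision = assignedCopies.stream().anyMatch(c -> c.nodeId().equals(nodeId))
                ? Decision.NO : Decision.YES;

        // Mirrors the quoted condition: if the node check already said NO, or the host
        // check is disabled (same_shard.host = false), the node-level decision is final.
        if (nodeDecision == Decision.NO || sameShardHost == false) {
            return nodeDecision;
        }

        // Host-level check (optional): also refuse a different node on the same host.
        return assignedCopies.stream().anyMatch(c -> c.hostAddress().equals(hostAddress))
                ? Decision.NO : Decision.YES;
    }

    public static void main(String[] args) {
        List<Copy> copies = List.of(new Copy("node-1", "192.168.0.30"));
        // A different node on the same host is allowed when the host check is off (the default) ...
        System.out.println(canAllocate("node-2", "192.168.0.30", copies, false)); // YES
        // ... and rejected when cluster.routing.allocation.same_shard.host is true.
        System.out.println(canAllocate("node-2", "192.168.0.30", copies, true));  // NO
    }
}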

The best thing to do if you have unassigned shards is to look at the allocation explain API.
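For example, something like this pulls the full explanation for one of the stuck replicas (a minimal sketch with the 5.x low-level REST client, assuming an HTTP endpoint on localhost:9200; the index name and shard number are taken from the log lines above):

import org.apache.http.HttpHost;
import org.apache.http.entity.ContentType;
import org.apache.http.nio.entity.NStringEntity;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

import java.util.Collections;

public class ExplainStuckReplica {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // Ask specifically about the replica of shard 2 of the index from the logs.
            NStringEntity body = new NStringEntity(
                    "{\"index\":\"test-index-20190217\",\"shard\":2,\"primary\":false}",
                    ContentType.APPLICATION_JSON);
            Response response = client.performRequest(
                    "GET", "/_cluster/allocation/explain", Collections.emptyMap(), body);
            // Share this whole document rather than just the decider names.
            System.out.println(EntityUtils.toString(response.getEntity()));
        }
    }
}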

  1. I'll just copy the Allocation Explain API results from the question above:
  • same_shard decider
  • 'reached the limit of incoming shard recoveries [100]'
  • cluster_rebalance
  • 'shard is in the process of initializing on node' (but it started 7 days ago)
  2. What is the difference between the regular check and the optional one?

I see, sorry, you have only shared a few parts of the result so I didn't notice it. It's normally a good idea to share the whole output.

100 is far too high. The default for this setting is 2 and that is a reasonable number. There is an open issue about problems related to setting this value too high.
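Judging by the "incoming shard recoveries" wording, the setting in question is presumably cluster.routing.allocation.node_concurrent_incoming_recoveries. A minimal sketch of putting it back to the default by clearing the override (this assumes it was set as a persistent cluster setting; use "transient" instead if that is where it lives):

import org.apache.http.HttpHost;
import org.apache.http.entity.ContentType;
import org.apache.http.nio.entity.NStringEntity;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

import java.util.Collections;

public class ResetRecoveryThrottle {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // Setting the value to null removes the override, falling back to the default of 2.
            NStringEntity body = new NStringEntity(
                    "{\"persistent\":{\"cluster.routing.allocation.node_concurrent_incoming_recoveries\":null}}",
                    ContentType.APPLICATION_JSON);
            Response response = client.performRequest("PUT", "/_cluster/settings",
                    Collections.emptyMap(), body);
            System.out.println(EntityUtils.toString(response.getEntity())); // expect "acknowledged": true
        }
    }
}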

The regular check ensures that there is only one copy of each shard on each node. The optional one ensures that there is only one copy of each shard on each host. They are different in the case where there are multiple nodes running on a single host.

Thanks,
Changing the shard recoveries setting back to the default (2) fixed the cluster health status.
