Shard allocation says max retry but fails to allocate on retry_failed=true

Hey guys,

We have a cluster running on version 6.5.4.

When we look at the unassigned shards, we see the following for one of them, for example:

{
  "index" : "facebook-post-comment_v1",
  "shard" : 3,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2019-02-25T11:04:27.016Z",
    "failed_allocation_attempts" : 5,
    "details" : "failed shard on node [VR2ChTZKSWu1Do1NFQawWQ]: failed recovery, failure RecoveryFailedException[[facebook-post-comment_v1][3]: Recovery failed from {es73}{rzne2XGKQ0CCk_1I3ZGOUg}{mXiPfm07TWGgLlft4XYx0w}{192.168.1.73}{192.168.1.73:9300}{xpack.installed=true} into {es84}{VR2ChTZKSWu1Do1NFQawWQ}{qvBmNF6vRCOckUHZLt-Mxw}{192.168.1.84}{192.168.1.84:9300}{xpack.installed=true}]; nested: RemoteTransportException[[es73][192.168.1.73:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[2] phase2 failed]; nested: RemoteTransportException[[es84][192.168.1.84:9300][internal:index/shard/recovery/translog_ops]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [5965608074/5.5gb], which is larger than the limit of [5964143001/5.5gb], usages [request=0/0b, fielddata=5069497235/4.7gb, in_flight_requests=164784/160.9kb, accounting=895946055/854.4mb]]; ",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  ...
}
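
(For context, the output above comes from the cluster allocation explain API; a request along these lines, with the index and shard filled in, returns it.)

GET /_cluster/allocation/explain
{
  "index" : "facebook-post-comment_v1",
  "shard" : 3,
  "primary" : false
}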

As far as I understand, it says that allocation failed 5 times on Feb 25. So I thought I needed to fix the underlying problem and then retry the allocation. Since the failures were caused by a CircuitBreakingException for exceeding the breaker limit, the fix was to clear the caches. I did that on all nodes; there are no more CircuitBreakingExceptions, and when I check the breaker stats, every node's parent breaker is below 1.2GB out of 5.5GB.
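
For reference, the cache clear I ran was roughly the following, against all indices (the fielddata cache was what was filling the parent breaker):

POST /_cache/clear?fielddata=true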

Then I ran the allocation command manually with retry_failed=true, but it returns the following.

POST /_cluster/reroute?retry_failed=true
{
  "commands" : [
  {
    "allocate_replica" : {
       "index" : "facebook-post-comment_v1", "shard" : 3,
       "node" : "es84"
     }
  }]
}

The response is

{
  "error": {
    "root_cause": [
      {
        "type": "remote_transport_exception",
        "reason": "[es74][192.168.1.74:9300][cluster:admin/reroute]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "[allocate_replica] allocation of [facebook-post-comment_v1][3] on node {es84}{VR2ChTZKSWu1Do1NFQawWQ}{ZJhk2hS2QGCl8j0bp6BRFw}{192.168.1.84}{192.168.1.84:9300}{xpack.installed=true} is not allowed, reason: [NO(shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2019-02-25T11:04:27.016Z], failed_attempts[5], delayed=false, details[failed shard on node [VR2ChTZKSWu1Do1NFQawWQ]: failed recovery, failure RecoveryFailedException[[facebook-post-comment_v1][3]: Recovery failed from {es73}{rzne2XGKQ0CCk_1I3ZGOUg}{mXiPfm07TWGgLlft4XYx0w}{192.168.1.73}{192.168.1.73:9300}{xpack.installed=true} into {es84}{VR2ChTZKSWu1Do1NFQawWQ}{qvBmNF6vRCOckUHZLt-Mxw}{192.168.1.84}{192.168.1.84:9300}{xpack.installed=true}]; nested: RemoteTransportException[[es73][192.168.1.73:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[2] phase2 failed]; nested: RemoteTransportException[[es84][192.168.1.84:9300][internal:index/shard/recovery/translog_ops]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [5965608074/5.5gb], which is larger than the limit of [5964143001/5.5gb], usages [request=0/0b, fielddata=5069497235/4.7gb, in_flight_requests=164784/160.9kb, accounting=895946055/854.4mb]]; ], allocation_status[no_attempt]]])]..."
  },
  "status": 400
}

It says NO on the max_retry condition, but shouldn't it be YES, since it has already hit the retry limit and I passed retry_failed=true? Also, node es75 already has the shard data.
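
(One way to confirm which nodes still hold an on-disk copy of the shard is the shard stores API, for example:)

GET /facebook-post-comment_v1/_shard_stores?status=all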

Please don't post images of text, it's impossible to search them or quote them in responses and they can't be read at all by those of us that use screenreaders. Use the </> button to format code-like text more neatly instead.

How many shards do you currently have in this cluster?

The retry counter is reset after trying to perform any commands. TIL. I opened an issue, and documented a workaround here:

Thank you for your reply. I will edit my post as soon as I have access to the cluster.

The cluster currently has 915 shards. There are 9 data nodes with 16GB of memory each (Elasticsearch gets 8GB of that).
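
(In case it's useful, numbers like these can be pulled from the cat APIs, e.g.:)

GET /_cat/health?v
GET /_cat/nodes?v&h=name,node.role,heap.max,ram.max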

How can I be sure that the counter was reset? From your issue I understand that after executing the command the counter is reset and Elasticsearch hits the limit again. But looking at the first output above, the unassigned shard says it last tried on 25 February, which is 2 days before I executed the reroute commands. Shouldn't that be updated if it retried the allocation?

I will try the workaround and keep here updated.

The simplest way to tell is that the shard is now allocated.

The message was generated before the counter was reset. If you tried again, I think you would have seen a more recent message.

Actually, I have run this command more than once; nothing in the response has changed and the shards are still unassigned.

Also, just in case, I keep checking the breaker stats on the nodes and I believe there is enough room. (Elasticsearch was complaining about a CircuitBreakingException on the parent breaker, and when I check it now there is at least 4GB available.)
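
(The check I keep running is roughly the node stats breaker output, filtered down to the parent breaker:)

GET /_nodes/stats/breaker?filter_path=nodes.*.name,nodes.*.breakers.parent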

I looked at this again and you're right: if an allocation is blocked by the max_retries condition and you run a reroute command, it will do nothing, even if you set ?retry_failed=true, so the message remains unchanged. The workaround and fix remain the same.

Thanks, David. I simply disabled allocation, reset the counter, re-enabled allocation, and now all of the shards are assigned correctly.
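
For anyone else hitting this, the steps were roughly: disable allocation, send a plain reroute with retry_failed=true to reset the counters, then re-enable allocation.

PUT /_cluster/settings
{
  "transient" : {
    "cluster.routing.allocation.enable" : "none"
  }
}

POST /_cluster/reroute?retry_failed=true

PUT /_cluster/settings
{
  "transient" : {
    "cluster.routing.allocation.enable" : "all"
  }
}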

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.