Shard allocation says max retry but fails to allocate on retry_failed=true

Hey guys,

We have a cluster running on version 6.5.4.

When we look at the unassigned shards it shows the following shard for example:

{
  "index" : "facebook-post-comment_v1",
  "shard" : 3,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2019-02-25T11:04:27.016Z",
    "failed_allocation_attempts" : 5,
    "details" : "failed shard on node [VR2ChTZKSWu1Do1NFQawWQ]: failed recovery, failure RecoveryFailedException[[facebook-post-comment_v1][3]: Recovery failed from {es73}{rzne2XGKQ0CCk_1I3ZGOUg}{mXiPfm07TWGgLlft4XYx0w}{192.168.1.73}{192.168.1.73:9300}{xpack.installed=true} into {es84}{VR2ChTZKSWu1Do1NFQawWQ}{qvBmNF6vRCOckUHZLt-Mxw}{192.168.1.84}{192.168.1.84:9300}{xpack.installed=true}]; nested: RemoteTransportException[[es73][192.168.1.73:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[2] phase2 failed]; nested: RemoteTransportException[[es84][192.168.1.84:9300][internal:index/shard/recovery/translog_ops]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [5965608074/5.5gb], which is larger than the limit of [5964143001/5.5gb], usages [request=0/0b, fielddata=5069497235/4.7gb, in_flight_requests=164784/160.9kb, accounting=895946055/854.4mb]]; ",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  ...
}

As far as I understand, it says the allocation failed 5 times on Feb 25. So I figured I needed to fix the underlying problem and then retry the allocation. The fix was to clear the fielddata cache to get rid of the CircuitBreakingException caused by exceeding the parent breaker limit. I did this on all nodes; there are no more CircuitBreakingExceptions, and when I check the breaker stats, every node's parent breaker usage is below 1.2GB out of the 5.5GB limit.
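For reference, this is roughly what I ran to clear the fielddata cache (this clears it across all indices; you could also target specific indices by name):

POST /_cache/clear?fielddata=true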

Then I ran the allocation command manually with retry_failed=true, but it returns the following.

POST /_cluster/reroute?retry_failed=true
{
  "commands" : [
    {
      "allocate_replica" : {
        "index" : "facebook-post-comment_v1",
        "shard" : 3,
        "node" : "es84"
      }
    }
  ]
}

The response is:

{
  "error": {
    "root_cause": [
      {
        "type": "remote_transport_exception",
        "reason": "[es74][192.168.1.74:9300][cluster:admin/reroute]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "[allocate_replica] allocation of [facebook-post-comment_v1][3] on node {es84}{VR2ChTZKSWu1Do1NFQawWQ}{ZJhk2hS2QGCl8j0bp6BRFw}{192.168.1.84}{192.168.1.84:9300}{xpack.installed=true} is not allowed, reason: [NO(shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2019-02-25T11:04:27.016Z], failed_attempts[5], delayed=false, details[failed shard on node [VR2ChTZKSWu1Do1NFQawWQ]: failed recovery, failure RecoveryFailedException[[facebook-post-comment_v1][3]: Recovery failed from {es73}{rzne2XGKQ0CCk_1I3ZGOUg}{mXiPfm07TWGgLlft4XYx0w}{192.168.1.73}{192.168.1.73:9300}{xpack.installed=true} into {es84}{VR2ChTZKSWu1Do1NFQawWQ}{qvBmNF6vRCOckUHZLt-Mxw}{192.168.1.84}{192.168.1.84:9300}{xpack.installed=true}]; nested: RemoteTransportException[[es73][192.168.1.73:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[2] phase2 failed]; nested: RemoteTransportException[[es84][192.168.1.84:9300][internal:index/shard/recovery/translog_ops]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [5965608074/5.5gb], which is larger than the limit of [5964143001/5.5gb], usages [request=0/0b, fielddata=5069497235/4.7gb, in_flight_requests=164784/160.9kb, accounting=895946055/854.4mb]]; ], allocation_status[no_attempt]]])]..."
  },
  "status": 400
}

It says NO on the max_retry condition, but shouldn't it be YES, since it has already reached the retry limit and I am explicitly asking it to retry? Also, the es75 node already has the shard data.
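In case it helps, I checked which nodes currently hold copies of this shard with the cat shards API (the column list is just the one I happened to use):

GET /_cat/shards/facebook-post-comment_v1?v&h=index,shard,prirep,state,node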

Please don't post images of text, it's impossible to search them or quote them in responses and they can't be read at all by those of us that use screenreaders. Use the </> button to format code-like text more neatly instead.

How many shards do you currently have in this cluster?

It turns out the retry counter is reset after trying to perform any commands. TIL. I opened an issue and documented a workaround here:

Thank you for your reply. I will edit my post as soon as I have access to the cluster.

The cluster currently has 915 shards. There are 9 data nodes with 16GB of memory each (Elasticsearch has an 8GB heap).

How can I be sure that the counter was reset? From your issue I understand that after executing the command, the counter is reset and Elasticsearch hits the limit again. But looking at the first output above, the unassigned shard says the last attempt was on February 25, which is 2 days before I executed the reroute commands. Shouldn't that be updated if it retried the allocation?

I will try the workaround and keep this thread updated.

The simplest way to tell is that the shard is now allocated.

The message was generated before the counter was reset. If you tried again, I think you would have seen a more recent message.

Actually, I tried running this command more than once, and nothing in that response changed; the shards are still unassigned.

Also, just in case, I am still checking the breaker stats on the nodes and I believe there is enough room. (Elasticsearch was complaining with CircuitBreakingException[parent], and when I check the parent breaker, at least 4GB is available.)
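This is the request I use to check the breakers; in the response, each node's parent breaker entry shows estimated_size and limit_size, and the estimate should be well below the limit:

GET /_nodes/stats/breaker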

I looked at this again and you're right: if you have an allocation blocked by the max_retries condition and try to run a reroute command, it will do nothing, even if you set ?retry_failed=true, so the message will remain unchanged. The workaround and fix remain the same.

Thanks, David. I simply disabled allocation, reset the counter, re-enabled allocation, and all of the shards are now assigned correctly.
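For anyone who finds this later, the sequence I used was roughly the following (using a transient setting so it doesn't survive a restart; setting it back to null restores the default):

PUT /_cluster/settings
{
  "transient" : {
    "cluster.routing.allocation.enable" : "none"
  }
}

POST /_cluster/reroute?retry_failed=true

PUT /_cluster/settings
{
  "transient" : {
    "cluster.routing.allocation.enable" : null
  }
}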


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.