Shard Allocation Failures After 5 Retries

Hi,

We have a 23-node cluster with 5 master nodes, 3 coordinator nodes, and 15 data nodes. Our index has 30 primary shards and 3 replicas, and its total size is around 800 GB. Earlier this week, after rebooting one of the nodes, we found 1 unassigned shard that failed to get allocated. Here is the response from the allocation explain API:

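For reference, the response below was obtained with a request along these lines (localhost is a placeholder and the index name is anonymized):

# allocation explain for the unassigned replica copy of shard 11 (host/port are placeholders)
curl -s -X GET "localhost:9200/_cluster/allocation/explain?pretty" \
  -H 'Content-Type: application/json' \
  -d '{ "index": "index_name", "shard": 11, "primary": false }'
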
{
    "index" : "index_name",
    "shard" : 11,
    "primary" : false,
    "current_state" : "unassigned",
    "unassigned_info" : {
      "reason" : "ALLOCATION_FAILED",
      "at" : "2021-06-22T06:56:10.775Z",
      "failed_allocation_attempts" : 5,
      "details" : "failed shard on node [YQU4hZwQQVifqzeCJ4G0Dw]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[index_name][11]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]]; ",
      "last_allocation_status" : "no_attempt"
    },
    "can_allocate" : "no",
    "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
    "node_allocation_decisions" : [
      {
        "node_id" : "2eM6R8OPQDek7BVJ7w72XA",
        "node_name" : "esd01",
        "transport_address" : "x.x.x.235:9300",
        "node_attributes" : {
          "xpack.installed" : "true"
        },
        "node_decision" : "no",
        "deciders" : [
          {
            "decider" : "max_retry",
            "decision" : "NO",
            "explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2021-06-22T06:56:10.775Z], failed_attempts[5], failed_nodes[[YQU4hZwQQVifqzeCJ4G0Dw]], delayed=false, details[failed shard on node [YQU4hZwQQVifqzeCJ4G0Dw]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[index_name][11]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]]; ], allocation_status[no_attempt]]]"
          }
        ]
      }
      ...
      {
        "node_id" : "ySGiZ52BQwG8tGWJ4pcayA",
        "node_name" : "esd14",
        "transport_address" : "x.x.x.248:9300",
        "node_attributes" : {
          "xpack.installed" : "true"
        },
        "node_decision" : "no",
        "deciders" : [
          {
            "decider" : "max_retry",
            "decision" : "NO",
            "explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2021-06-22T06:56:10.775Z], failed_attempts[5], failed_nodes[[YQU4hZwQQVifqzeCJ4G0Dw]], delayed=false, details[failed shard on node [YQU4hZwQQVifqzeCJ4G0Dw]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[index_name][11]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]]; ], allocation_status[no_attempt]]]"
          },
          {
            "decider" : "same_shard",
            "decision" : "NO",
            "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[index_name][11], node[ySGiZ52BQwG8tGWJ4pcayA], [R], s[STARTED], a[id=SmMfRLraRiS6Pfvy2IdxLA]]"
          }
        ]
      }
    ]
  }
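
The max_retry decider points at the manual retry; as far as I understand, that would be a call like the one below, but we would rather understand the root cause before simply retrying:

# retry allocation of shards that have exceeded the maximum number of failed attempts
curl -s -X POST "localhost:9200/_cluster/reroute?retry_failed=true&pretty"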

We have also found a lot of exceptions like this on our Elasticsearch data nodes:

[2021-06-22T07:53:33,766][WARN ][o.e.c.a.s.ShardStateAction] [esd12] unexpected failure while sending request [internal:cluster/shard/failure] to [{esm01}{0NjBrIyQRc65BUfwzfjGow}{gs7hdoLFSY-eUQktHbIdBQ}{x.x.x.212}{x.x.x.212:9300}{m}{xpack.installed=true}] for shard entry [shard id [[index_name][4]], allocation id [dsxBh1jPSiaTvzZPRf24zA], primary term [1], message [failed to perform indices:data/write/bulk[s] on replica [index_name][4], node[OZb-Z6SQQnuEf6Djk4j-5w], [R], s[STARTED], a[id=dsxBh1jPSiaTvzZPRf24zA]], failure [RemoteTransportException[[esd09][x.x.x.243:9300][indices:data/write/bulk[s][r]]]; nested: IllegalStateException[[index_name][4] operation primary term [1] is too old (current [2])]; ], markAsStale [true]]
org.elasticsearch.transport.RemoteTransportException: [esm01][x.x.x.212:9300][internal:cluster/shard/failure]
Caused by: org.elasticsearch.cluster.action.shard.ShardStateAction$NoLongerPrimaryShardException: primary term [1] did not match current primary term [2]
            at org.elasticsearch.cluster.action.shard.ShardStateAction$ShardFailedClusterStateTaskExecutor.execute(ShardStateAction.java:365) ~[elasticsearch-7.5.0.jar:7.5.0]
            at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:702) ~[elasticsearch-7.5.0.jar:7.5.0]
            at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:324) ~[elasticsearch-7.5.0.jar:7.5.0]
            at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:219) ~[elasticsearch-7.5.0.jar:7.5.0]
            at org.elasticsearch.cluster.service.MasterService.access$000(MasterService.java:73) ~[elasticsearch-7.5.0.jar:7.5.0]
            at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:151) ~[elasticsearch-7.5.0.jar:7.5.0]
            at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) ~[elasticsearch-7.5.0.jar:7.5.0]
            at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) ~[elasticsearch-7.5.0.jar:7.5.0]
            at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703) ~[elasticsearch-7.5.0.jar:7.5.0]
            at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) ~[elasticsearch-7.5.0.jar:7.5.0]
            at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) ~[elasticsearch-7.5.0.jar:7.5.0]
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
            at java.lang.Thread.run(Thread.java:830) [?:?]

It would be appreciated if you could provide some input on possible causes.

The first problem can happen if the rebooted node was the master node. It has been fixed in 7.13.0.

About the second issue: is it related to the first, i.e., the same index and shard, or is it something completely separate? Is it all for one index/shard, and how many times did it occur? This can happen in edge cases, and the likelihood is somewhat increased by multiple replicas, though I do find it odd if it happens frequently.
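
If you want to quantify it, counting the exception in the data node logs should be enough, for example like this (the log directory shown is the package-install default and may differ in your setup):

# count how many times the primary-term mismatch was logged on this node
grep -c "NoLongerPrimaryShardException" /var/log/elasticsearch/*.log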

We have only 1 index, spread across the 15 data nodes, with 3 replicas; it was created with 30 primary shards. This issue has only happened once in the past 6 months.
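
If it helps, this is roughly how we can double-check the shard layout (with 30 primaries and 3 replicas we expect 120 shard copies spread over the 15 data nodes):

# list every copy of every shard of the index and the node it lives on
curl -s "localhost:9200/_cat/shards/index_name?v&h=index,shard,prirep,state,node"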
