Why is my replica shard not allocated?

I have an 8-node cluster: 3 master nodes, 3 data nodes, and 2 coordinating nodes. Every day I see missing replica shards, and I manually close, reopen, and then refresh those indices in Kibana, which solves the problem. No data node has left the cluster, so why does this happen?


GET /_cluster/allocation/explain

{
  "index" : "log-wlb-sysmon-2020.12.29",
  "shard" : 1,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2020-12-29T01:39:59.630Z",
    "failed_allocation_attempts" : 5,
    "details" : "failed shard on node [voj77bzkQe-Dgzz9qiVudA]: failed recovery, failure RecoveryFailedException[[log-wlb-sysmon-2020.12.29][1]: Recovery failed from {ed3}{2BRhL-iTSeWCIx2fRH1jlA}{o7arVIoJSH-QEW2PbLOTmQ}{ed3}{XX.XX.XX.XX:9300}{d}{xpack.installed=true, transform.node=false} into {ed2}{voj77bzkQe-Dgzz9qiVudA}{nHyE4sVaQBeF1hgs6QD0Xw}{ed2}{XX.XX.XX.XX:9300}{d}{xpack.installed=true, transform.node=false}]; nested: RemoteTransportException[[ed3][XX.XX.XX.XX:9300][internal:index/shard/recovery/start_recovery]]; nested: CircuitBreakingException[[parent] Data too large, data for [internal:index/shard/recovery/start_recovery] would be [7357090166/6.8gb], which is larger than the limit of [7140383129/6.6gb], real usage: [7357087176/6.8gb], new bytes reserved: [2990/2.9kb], usages [request=0/0b, fielddata=2984808609/2.7gb, in_flight_requests=2990/2.9kb, model_inference=0/0b, accounting=240827968/229.6mb]]; ",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "2BRhL-iTSeWCIx2fRH1jlA",
      "node_name" : "ed3",
      "transport_address" : "XX.XX.XX.XX:9300",
      "node_attributes" : {
        "xpack.installed" : "true",
        "transform.node" : "false"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "NO",
          "explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2020-12-29T01:39:59.630Z], failed_attempts[5], failed_nodes[[voj77bzkQe-Dgzz9qiVudA, pytohdtxQ-ywNaRIFnrLaw]], delayed=false, details[failed shard on node [voj77bzkQe-Dgzz9qiVudA]: failed recovery, failure RecoveryFailedException[[log-wlb-sysmon-2020.12.29][1]: Recovery failed from {ed3}{2BRhL-iTSeWCIx2fRH1jlA}{o7arVIoJSH-QEW2PbLOTmQ}{ed3}{XX.XX.XX.XX:9300}{d}{xpack.installed=true, transform.node=false} into {ed2}{voj77bzkQe-Dgzz9qiVudA}{nHyE4sVaQBeF1hgs6QD0Xw}{ed2}{XX.XX.XX.XX:9300}{d}{xpack.installed=true, transform.node=false}]; nested: RemoteTransportException[[ed3][XX.XX.XX.XX:9300][internal:index/shard/recovery/start_recovery]]; nested: CircuitBreakingException[[parent] Data too large, data for [internal:index/shard/recovery/start_recovery] would be [7357090166/6.8gb], which is larger than the limit of [7140383129/6.6gb], real usage: [7357087176/6.8gb], new bytes reserved: [2990/2.9kb], usages [request=0/0b, fielddata=2984808609/2.7gb, in_flight_requests=2990/2.9kb, model_inference=0/0b, accounting=240827968/229.6mb]]; ], allocation_status[no_attempt]]]"
        },
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "a copy of this shard is already allocated to this node [[log-wlb-sysmon-2020.12.29][1], node[2BRhL-iTSeWCIx2fRH1jlA], [P], s[STARTED], a[id=YuD_poc8TZCq5nWjVoDZrw]]"
        }
      ]
    },
    {
      "node_id" : "pytohdtxQ-ywNaRIFnrLaw",
      "node_name" : "ed1",
      "transport_address" : "XX.XX.XX.XX:9300",
      "node_attributes" : {
        "xpack.installed" : "true",
        "transform.node" : "false"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "NO",
          "explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2020-12-29T01:39:59.630Z], failed_attempts[5], failed_nodes[[voj77bzkQe-Dgzz9qiVudA, pytohdtxQ-ywNaRIFnrLaw]], delayed=false, details[failed shard on node [voj77bzkQe-Dgzz9qiVudA]: failed recovery, failure RecoveryFailedException[[log-wlb-sysmon-2020.12.29][1]: Recovery failed from {ed3}{2BRhL-iTSeWCIx2fRH1jlA}{o7arVIoJSH-QEW2PbLOTmQ}{ed3}{XX.XX.XX.XX:9300}{d}{xpack.installed=true, transform.node=false} into {ed2}{voj77bzkQe-Dgzz9qiVudA}{nHyE4sVaQBeF1hgs6QD0Xw}{ed2}{XX.XX.XX.XX:9300}{d}{xpack.installed=true, transform.node=false}]; nested: RemoteTransportException[[ed3][XX.XX.XX.XX:9300][internal:index/shard/recovery/start_recovery]]; nested: CircuitBreakingException[[parent] Data too large, data for [internal:index/shard/recovery/start_recovery] would be [7357090166/6.8gb], which is larger than the limit of [7140383129/6.6gb], real usage: [7357087176/6.8gb], new bytes reserved: [2990/2.9kb], usages [request=0/0b, fielddata=2984808609/2.7gb, in_flight_requests=2990/2.9kb, model_inference=0/0b, accounting=240827968/229.6mb]]; ], allocation_status[no_attempt]]]"
        }
      ]
    },
    {
      "node_id" : "voj77bzkQe-Dgzz9qiVudA",
      "node_name" : "ed2",
      "transport_address" : "XX.XX.XX.XX:9300",
      "node_attributes" : {
        "xpack.installed" : "true",
        "transform.node" : "false"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "NO",
          "explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2020-12-29T01:39:59.630Z], failed_attempts[5], failed_nodes[[voj77bzkQe-Dgzz9qiVudA, pytohdtxQ-ywNaRIFnrLaw]], delayed=false, details[failed shard on node [voj77bzkQe-Dgzz9qiVudA]: failed recovery, failure RecoveryFailedException[[log-wlb-sysmon-2020.12.29][1]: Recovery failed from {ed3}{2BRhL-iTSeWCIx2fRH1jlA}{o7arVIoJSH-QEW2PbLOTmQ}{ed3}{XX.XX.XX.XX:9300}{d}{xpack.installed=true, transform.node=false} into {ed2}{voj77bzkQe-Dgzz9qiVudA}{nHyE4sVaQBeF1hgs6QD0Xw}{ed2}{XX.XX.XX.XX:9300}{d}{xpack.installed=true, transform.node=false}]; nested: RemoteTransportException[[ed3][XX.XX.XX.XX:9300][internal:index/shard/recovery/start_recovery]]; nested: CircuitBreakingException[[parent] Data too large, data for [internal:index/shard/recovery/start_recovery] would be [7357090166/6.8gb], which is larger than the limit of [7140383129/6.6gb], real usage: [7357087176/6.8gb], new bytes reserved: [2990/2.9kb], usages [request=0/0b, fielddata=2984808609/2.7gb, in_flight_requests=2990/2.9kb, model_inference=0/0b, accounting=240827968/229.6mb]]; ], allocation_status[no_attempt]]]"
        }
      ]
    }
  ]
}

I've never seen a circuit breaker for a recovery action!

Part of the issue seems to be that you have ~2400 shards across 3 nodes, which is pretty excessive. You should look to reduce that.

I have 11 indices; each index has 3 shards and one replica. I set this up from the beginning, but after one month I started facing this problem. We have space to allocate, so why is it not allocating?

Given the data volume in your cluster that is excessive. It is generally recommended to aim for a shard size measured in tens of GB and your average shard size is under 300MB, which is very, very small.

How long are you looking to retain data in your cluster? Is the shard count expected to grow?

It's not all that unusual, but recovery isn't a big memory consumer, so we normally only see these if, as here, the cluster is already right on the edge for unrelated reasons (e.g. too many shards). Since 7.8.0 it isn't immediately fatal to recoveries any more; retries were added for this case.

Since this cluster is running 7.10.1, that means we already did a bunch of retries and gave up because they all failed for similar reasons. The fix is, as mentioned above, to substantially reduce the shard count.
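Once heap pressure is reduced (for example by lowering the shard count and fielddata usage), the unassigned replicas can be retried with the call the max_retry decider output above points to:

POST /_cluster/reroute?retry_failed=true

This resets the failed-allocation counter and attempts allocation again; if the underlying cause is still present, it will simply fail another five times.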


How did you determine that the shard size is 300MB? As I said earlier, I have 11 indices and each index has a different storage size; for example, log-wlb-sysmon gets 10GB of logs per day while other indices get 1 to 10MB.

The storage does not get full; if it reaches 80% of total storage we trigger Curator.

Your stats above indicated 615GB data across 2358 shards, which is an average of 261MB.
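You can verify those numbers yourself: _cat/allocation shows disk usage and shard count per data node, and _cat/shards lists the size of each individual shard.

GET _cat/allocation?v
GET _cat/shards?v&bytes=mb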

If you have one index that is much larger than the others, do not use the same settings across the board. For the smaller indices it would make sense to have just a single primary shard and to move away from daily indices.

The best way to do this is generally to use rollover with ILM. That way you can specify max size and age of indices and make sure you get larger shards as each index may cover a longer time period.
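As a sketch (the policy name and thresholds here are only illustrative, not a recommendation for this cluster), a rollover-based ILM policy could look like this:

PUT _ilm/policy/logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "30gb",
            "max_age": "30d"
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

An index then rolls over to a new backing index once it reaches 30GB or 30 days, whichever comes first, so low-volume data accumulates into fewer, larger shards instead of one tiny index per day.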

Is it shard size = storage consumed / total shards (615GB / 2358 ≈ 261MB)?
Sorry @Christian_Dahlqvist, I am just new and don't know how to calculate things like shard size.

I use time-based indices; a new index is created each day. Before using an ILM policy in my production cluster, I need to ask some questions:

  1. If one of my indices is 122GB and has 3 shards with one replica, is this good?
  2. Other indices are only a few MB in size, like 600MB or 7MB; for these, should I create one primary shard?

Sounds reasonable as it is 122GB across 6 shards.

If you intend to keep your data in the cluster for an extended period of time I would recommend going down to a single primary shard but also switch to weekly or monthly indices (depends on your retention period).

Why did it happen? I mean, why is the replica shard missing or not allocated?

I am planning to reduce the number of shards, but I don't know what procedure to follow for 2358 shards.

Do the following:

  • Look at each index pattern and change index templates to have the appropriate number of primary shards (1 for small indices).
  • Change your indexing so you switch to weekly or even monthly indices where appropriate.

This will give you a much better sharding setup going forward and will stop so many new shards from being generated.
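For example (the template name is illustrative), a composable index template that gives every new small index a single primary shard could look like:

PUT _index_template/small-logs
{
  "index_patterns": ["log-pb-flow-*"],
  "template": {
    "settings": {
      "index.number_of_shards": 1,
      "index.number_of_replicas": 1
    }
  }
}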

If you have a fixed retention period and this is relatively short you can choose to do nothing more and just wait for indices with small shards to be deleted as they age. This will minimize the amount of work required and will reduce the shard count over time.

If, however, you plan to keep data for a long time and the indices with lots of very small shards are not likely to be deleted anytime soon, you may need to:

  • Use the shrink index API to reduce the number of primary shards to 1 for small indices.
  • If you need to reduce shard count more you may need to use the reindex API to reindex small daily indices into larger monthly indices (with 1 primary shard) and then delete the old indices.
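As an illustration of the second point (the index names here are only an example), _reindex accepts a wildcard source, so a month of daily indices can be merged into one monthly index:

POST _reindex
{
  "source": { "index": "log-pb-flow-2021.01.*" },
  "dest": { "index": "log-pb-flow-2021.01" }
}

The destination index should be created first from a template with a single primary shard, and the old daily indices deleted only after verifying that the document counts match.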

We are planning to use the shrink API to reduce the shard count. We want the changes in our old indices because we have some dashboards that use the old indices' data, and if we use new indices we would have to change our dashboards as well.

You should not need to make any changes in dashboards as long as the names of the new indices match the same index patterns.

Hey @Christian_Dahlqvist, is it possible to shrink all indices at once, like:

POST log-pb-flow-*/_shrink/log-pb-flow_1
{
  "settings": {
    "index.number_of_replicas": 0,
    "index.number_of_shards": 1, 
    "index.codec": "best_compression" 
  }
} 

because I have 10 indices matching this pattern:

health status index                  uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   log-pb-flow-2021.01.09 4wRBPWSbQ9m3KqRJ9D_c4A   6   0      58303            0     24.9mb         24.9mb
green  open   log-pb-flow-2021.01.08 syGfM1DHT86msmEPpfzK3g   6   0     131352            0     55.4mb         55.4mb
green  open   log-pb-flow-2021.01.07 aHMRR_qORgeKe4bptznBuw   6   0      19285            0      9.2mb          9.2mb
green  open   log-pb-flow-2021.01.13 bdsxYn4SSra89PQt9oIAtg   6   0       6497            0      4.5mb          4.5mb
green  open   log-pb-flow-2021.01.12 aL9zPpjCRQ6vnt3Ae_3ybg   6   0     134114            0     54.8mb         54.8mb
green  open   log-pb-flow-2021.01.11 8p1ownv-Su-bkKVtSp2syA   6   0      23369            0      9.2mb          9.2mb
green  open   log-pb-flow-2021.01.10 6aZYY-DfTOa445m8bnpMuw   6   0      20699            0      9.2mb          9.2mb
green  open   log-pb-flow-2021.01.16 GtdWFzrVQOa-5gYXqbwf-w   6   0       5333            0      3.3mb          3.3mb
green  open   log-pb-flow-2021.01.15 _kOms79nQ56Q5cuCymTDiw   6   0      74968            0     31.6mb         31.6mb
green  open   log-pb-flow-2021.01.14 47MggH4bRhqH1civvkZYbQ   6   0     146539            0     48.8mb         48.8mb
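For what it's worth, the _shrink API targets a single concrete index, so a wildcard source as in the request above will not work; each daily index needs its own request. The source index must also first be made read-only, with a copy of every shard on one node. A sketch for the first index in the list (using node ed1 from the output earlier; pick whichever data node suits):

PUT /log-pb-flow-2021.01.09/_settings
{
  "index.routing.allocation.require._name": "ed1",
  "index.blocks.write": true
}

POST /log-pb-flow-2021.01.09/_shrink/log-pb-flow-2021.01.09-shrunk
{
  "settings": {
    "index.number_of_replicas": 0,
    "index.number_of_shards": 1,
    "index.codec": "best_compression"
  }
}

The target shard count must be a factor of the source's (6 → 1 is fine), and the write block can be removed from the target afterwards if needed.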