Yellow Status - Unassigned Shards

Hi,

I have many indices with yellow status and unassigned shards. How can I fix that?
Is this related to my nodes being in different countries, with delay on the network, or is it something else?

Hello Antonopo,
There are many reasons for cluster health to be yellow:

  1. A node could have rejoined the cluster
  2. Running out of disk space
  3. Cluster restart
    etc.
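
A quick way to list which shards are unassigned, with a short reason code for each, is the _cat/shards API (the columns shown here are standard _cat/shards columns; add credentials and https as in the secured example further down if you are running security):

curl -XGET "localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason" | grep UNASSIGNED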

Use the explain API to see the reason shards are not being assigned.

curl -XGET "localhost:9200/_cluster/allocation/explain?pretty"

If you're running X-Pack security, run it like this:

curl -u jacknik:password -XGET "https://localhost:9200/_cluster/allocation/explain?pretty"

Of course, pass a username and password configured for your environment that has cluster management privileges.
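
You can also narrow the explanation to one specific shard by sending a request body. A minimal sketch (the index name, shard number and credentials here are placeholders, replace them with your own):

curl -u jacknik:password -XGET "https://localhost:9200/_cluster/allocation/explain?pretty" -H 'Content-Type: application/json' -d'
{
  "index": "my-index-name",
  "shard": 0,
  "primary": false
}'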

{
"index" : "metricbeat-7.0.1-2019.09.03",
"shard" : 0,
"primary" : false,
"current_state" : "unassigned",
"unassigned_info" : {
"reason" : "ALLOCATION_FAILED",
"at" : "2019-09-03T00:02:10.320Z",
"failed_allocation_attempts" : 5,
"details" : "failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed to perform indices:data/write/bulk[s] on replica [metricbeat-7.0.1-2019.09.03][0], node[vgsBYEfqSF6WOumwnVXFyQ], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=6DLJqabRT02SxipKi25cFA], unassigned_info[[reason=ALLOCATION_FAILED], at[2019-09-03T00:01:29.171Z], failed_attempts[4], delayed=false, details[failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed recovery, failure RecoveryFailedException[[metricbeat-7.0.1-2019.09.03][0]: Recovery failed from {xh-fr-elastic-1}{nTGxqS2hTNe2O_zh9Wx1tQ}{Jn2CA99oTqKHokKMcwdzmw}{135.238.239.48}{135.238.239.48:9300}{dim}{ml.machine_memory=16654970880, ml.max_open_jobs=20, xpack.installed=true} into {xh-it-elastic-2}{vgsBYEfqSF6WOumwnVXFyQ}{yc1Cy7AsS-yY_0Vv8ETeFg}{151.98.17.34}{151.98.17.34:9300}{dim}{ml.machine_memory=8186552320, xpack.installed=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[xh-fr-elastic-1][135.238.239.48:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][internal:index/shard/recovery/prepare_translog]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8234652702/7.6gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8234652232/7.6gb], new bytes reserved: [470/470b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=1814918/1.7mb, accounting=37715236/35.9mb]]; ], allocation_status[no_attempt]], failure RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8151435664/7.5gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8151430896/7.5gb], new bytes reserved: [4768/4.6kb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=13384/13kb, accounting=39491168/37.6mb]]; ",
"last_allocation_status" : "no_attempt"
},
"can_allocate" : "no",
"allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
"node_allocation_decisions" : [
{
"node_id" : "GE4A2v29Qlybs6FyKbyCMw",
"node_name" : "xh-fr-elastic-2",
"transport_address" : "135.238.239.132:9300",
"node_attributes" : {
"ml.machine_memory" : "269930708992",
"ml.max_open_jobs" : "20",
"xpack.installed" : "true"
},
"node_decision" : "no",
"deciders" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2019-09-03T00:02:10.320Z], failed_attempts[5], delayed=false, details[failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed to perform indices:data/write/bulk[s] on replica [metricbeat-7.0.1-2019.09.03][0], node[vgsBYEfqSF6WOumwnVXFyQ], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=6DLJqabRT02SxipKi25cFA], unassigned_info[[reason=ALLOCATION_FAILED], at[2019-09-03T00:01:29.171Z], failed_attempts[4], delayed=false, details[failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed recovery, failure RecoveryFailedException[[metricbeat-7.0.1-2019.09.03][0]: Recovery failed from {xh-fr-elastic-1}{nTGxqS2hTNe2O_zh9Wx1tQ}{Jn2CA99oTqKHokKMcwdzmw}{135.238.239.48}{135.238.239.48:9300}{dim}{ml.machine_memory=16654970880, ml.max_open_jobs=20, xpack.installed=true} into {xh-it-elastic-2}{vgsBYEfqSF6WOumwnVXFyQ}{yc1Cy7AsS-yY_0Vv8ETeFg}{151.98.17.34}{151.98.17.34:9300}{dim}{ml.machine_memory=8186552320, xpack.installed=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[xh-fr-elastic-1][135.238.239.48:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][internal:index/shard/recovery/prepare_translog]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8234652702/7.6gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8234652232/7.6gb], new bytes reserved: [470/470b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=1814918/1.7mb, accounting=37715236/35.9mb]]; ], allocation_status[no_attempt]], failure RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8151435664/7.5gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8151430896/7.5gb], new bytes reserved: [4768/4.6kb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=13384/13kb, accounting=39491168/37.6mb]]; ], allocation_status[no_attempt]]]"
},
{
"decider" : "disk_threshold",
"decision" : "NO",
"explanation" : "the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], using more disk space than the maximum allowed [85.0%], actual free: [14.686823037524427%]"
},
{
"decider" : "throttling",
"decision" : "THROTTLE",
"explanation" : "reached the limit of outgoing shard recoveries [2] on the node [nTGxqS2hTNe2O_zh9Wx1tQ] which holds the primary, cluster setting [cluster.routing.allocation.node_concurrent_outgoing_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
}
]
},

{
"node_id" : "TXQBZI3yRf69Q7CCJ2PdFQ",
"node_name" : "xh-it-elastic-1",
"transport_address" : "151.98.17.60:9300",
"node_attributes" : {
"ml.machine_memory" : "34359738368",
"ml.max_open_jobs" : "20",
"xpack.installed" : "true"
},
"node_decision" : "no",
"deciders" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2019-09-03T00:02:10.320Z], failed_attempts[5], delayed=false, details[failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed to perform indices:data/write/bulk[s] on replica [metricbeat-7.0.1-2019.09.03][0], node[vgsBYEfqSF6WOumwnVXFyQ], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=6DLJqabRT02SxipKi25cFA], unassigned_info[[reason=ALLOCATION_FAILED], at[2019-09-03T00:01:29.171Z], failed_attempts[4], delayed=false, details[failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed recovery, failure RecoveryFailedException[[metricbeat-7.0.1-2019.09.03][0]: Recovery failed from {xh-fr-elastic-1}{nTGxqS2hTNe2O_zh9Wx1tQ}{Jn2CA99oTqKHokKMcwdzmw}{135.238.239.48}{135.238.239.48:9300}{dim}{ml.machine_memory=16654970880, ml.max_open_jobs=20, xpack.installed=true} into {xh-it-elastic-2}{vgsBYEfqSF6WOumwnVXFyQ}{yc1Cy7AsS-yY_0Vv8ETeFg}{151.98.17.34}{151.98.17.34:9300}{dim}{ml.machine_memory=8186552320, xpack.installed=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[xh-fr-elastic-1][135.238.239.48:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][internal:index/shard/recovery/prepare_translog]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8234652702/7.6gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8234652232/7.6gb], new bytes reserved: [470/470b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=1814918/1.7mb, accounting=37715236/35.9mb]]; ], allocation_status[no_attempt]], failure RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8151435664/7.5gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8151430896/7.5gb], new bytes reserved: [4768/4.6kb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=13384/13kb, accounting=39491168/37.6mb]]; ], allocation_status[no_attempt]]]"
},
{
"decider" : "throttling",
"decision" : "THROTTLE",
"explanation" : "reached the limit of outgoing shard recoveries [2] on the node [nTGxqS2hTNe2O_zh9Wx1tQ] which holds the primary, cluster setting [cluster.routing.allocation.node_concurrent_outgoing_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
}
]
},

{
"node_id" : "V-WUk1ZeQ7yROHALYmndkQ",
"node_name" : "xh-gr-elastic-2",
"transport_address" : "10.159.166.9:9300",
"node_attributes" : {
"ml.machine_memory" : "269930721280",
"ml.max_open_jobs" : "20",
"xpack.installed" : "true"
},
"node_decision" : "no",
"deciders" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2019-09-03T00:02:10.320Z], failed_attempts[5], delayed=false, details[failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed to perform indices:data/write/bulk[s] on replica [metricbeat-7.0.1-2019.09.03][0], node[vgsBYEfqSF6WOumwnVXFyQ], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=6DLJqabRT02SxipKi25cFA], unassigned_info[[reason=ALLOCATION_FAILED], at[2019-09-03T00:01:29.171Z], failed_attempts[4], delayed=false, details[failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed recovery, failure RecoveryFailedException[[metricbeat-7.0.1-2019.09.03][0]: Recovery failed from {xh-fr-elastic-1}{nTGxqS2hTNe2O_zh9Wx1tQ}{Jn2CA99oTqKHokKMcwdzmw}{135.238.239.48}{135.238.239.48:9300}{dim}{ml.machine_memory=16654970880, ml.max_open_jobs=20, xpack.installed=true} into {xh-it-elastic-2}{vgsBYEfqSF6WOumwnVXFyQ}{yc1Cy7AsS-yY_0Vv8ETeFg}{151.98.17.34}{151.98.17.34:9300}{dim}{ml.machine_memory=8186552320, xpack.installed=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[xh-fr-elastic-1][135.238.239.48:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][internal:index/shard/recovery/prepare_translog]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8234652702/7.6gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8234652232/7.6gb], new bytes reserved: [470/470b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=1814918/1.7mb, accounting=37715236/35.9mb]]; ], allocation_status[no_attempt]], failure RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8151435664/7.5gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8151430896/7.5gb], new bytes reserved: [4768/4.6kb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=13384/13kb, accounting=39491168/37.6mb]]; ], allocation_status[no_attempt]]]"
},
{
"decider" : "throttling",
"decision" : "THROTTLE",
"explanation" : "reached the limit of incoming shard recoveries [2], cluster setting [cluster.routing.allocation.node_concurrent_incoming_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
}
]
},

{
"node_id" : "kZLf-LYfThiXOORoiECoaw",
"node_name" : "xh-gr-elastic-3",
"transport_address" : "10.158.67.107:9300",
"node_attributes" : {
"ml.machine_memory" : "17179332608",
"ml.max_open_jobs" : "20",
"xpack.installed" : "true"
},
"node_decision" : "no",
"deciders" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2019-09-03T00:02:10.320Z], failed_attempts[5], delayed=false, details[failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed to perform indices:data/write/bulk[s] on replica [metricbeat-7.0.1-2019.09.03][0], node[vgsBYEfqSF6WOumwnVXFyQ], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=6DLJqabRT02SxipKi25cFA], unassigned_info[[reason=ALLOCATION_FAILED], at[2019-09-03T00:01:29.171Z], failed_attempts[4], delayed=false, details[failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed recovery, failure RecoveryFailedException[[metricbeat-7.0.1-2019.09.03][0]: Recovery failed from {xh-fr-elastic-1}{nTGxqS2hTNe2O_zh9Wx1tQ}{Jn2CA99oTqKHokKMcwdzmw}{135.238.239.48}{135.238.239.48:9300}{dim}{ml.machine_memory=16654970880, ml.max_open_jobs=20, xpack.installed=true} into {xh-it-elastic-2}{vgsBYEfqSF6WOumwnVXFyQ}{yc1Cy7AsS-yY_0Vv8ETeFg}{151.98.17.34}{151.98.17.34:9300}{dim}{ml.machine_memory=8186552320, xpack.installed=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[xh-fr-elastic-1][135.238.239.48:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][internal:index/shard/recovery/prepare_translog]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8234652702/7.6gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8234652232/7.6gb], new bytes reserved: [470/470b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=1814918/1.7mb, accounting=37715236/35.9mb]]; ], allocation_status[no_attempt]], failure RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8151435664/7.5gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8151430896/7.5gb], new bytes reserved: [4768/4.6kb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=13384/13kb, accounting=39491168/37.6mb]]; ], allocation_status[no_attempt]]]"
},
{
"decider" : "throttling",
"decision" : "THROTTLE",
"explanation" : "reached the limit of outgoing shard recoveries [2] on the node [nTGxqS2hTNe2O_zh9Wx1tQ] which holds the primary, cluster setting [cluster.routing.allocation.node_concurrent_outgoing_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
}
]
},

{
"node_id" : "nTGxqS2hTNe2O_zh9Wx1tQ",
"node_name" : "xh-fr-elastic-1",
"transport_address" : "135.238.239.48:9300",
"node_attributes" : {
"ml.machine_memory" : "16654970880",
"ml.max_open_jobs" : "20",
"xpack.installed" : "true"
},
"node_decision" : "no",
"deciders" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2019-09-03T00:02:10.320Z], failed_attempts[5], delayed=false, details[failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed to perform indices:data/write/bulk[s] on replica [metricbeat-7.0.1-2019.09.03][0], node[vgsBYEfqSF6WOumwnVXFyQ], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=6DLJqabRT02SxipKi25cFA], unassigned_info[[reason=ALLOCATION_FAILED], at[2019-09-03T00:01:29.171Z], failed_attempts[4], delayed=false, details[failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed recovery, failure RecoveryFailedException[[metricbeat-7.0.1-2019.09.03][0]: Recovery failed from {xh-fr-elastic-1}{nTGxqS2hTNe2O_zh9Wx1tQ}{Jn2CA99oTqKHokKMcwdzmw}{135.238.239.48}{135.238.239.48:9300}{dim}{ml.machine_memory=16654970880, ml.max_open_jobs=20, xpack.installed=true} into {xh-it-elastic-2}{vgsBYEfqSF6WOumwnVXFyQ}{yc1Cy7AsS-yY_0Vv8ETeFg}{151.98.17.34}{151.98.17.34:9300}{dim}{ml.machine_memory=8186552320, xpack.installed=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[xh-fr-elastic-1][135.238.239.48:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][internal:index/shard/recovery/prepare_translog]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8234652702/7.6gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8234652232/7.6gb], new bytes reserved: [470/470b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=1814918/1.7mb, accounting=37715236/35.9mb]]; ], allocation_status[no_attempt]], failure RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8151435664/7.5gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8151430896/7.5gb], new bytes reserved: [4768/4.6kb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=13384/13kb, accounting=39491168/37.6mb]]; ], allocation_status[no_attempt]]]"
},
{
"decider" : "same_shard",
"decision" : "NO",
"explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[metricbeat-7.0.1-2019.09.03][0], node[nTGxqS2hTNe2O_zh9Wx1tQ], [P], s[STARTED], a[id=H2Km3EpqTJSU8efmVwFQhw]]"
},
{
"decider" : "throttling",
"decision" : "THROTTLE",
"explanation" : "reached the limit of outgoing shard recoveries [2] on the node [nTGxqS2hTNe2O_zh9Wx1tQ] which holds the primary, cluster setting [cluster.routing.allocation.node_concurrent_outgoing_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
}
]
},

{
"node_id" : "rX7K8eJfQF6pRwt6i_oUpA",
"node_name" : "xh-gr-elastic-1",
"transport_address" : "10.158.67.175:9300",
"node_attributes" : {
"ml.machine_memory" : "16654884864",
"ml.max_open_jobs" : "20",
"xpack.installed" : "true"
},
"node_decision" : "no",
"deciders" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2019-09-03T00:02:10.320Z], failed_attempts[5], delayed=false, details[failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed to perform indices:data/write/bulk[s] on replica [metricbeat-7.0.1-2019.09.03][0], node[vgsBYEfqSF6WOumwnVXFyQ], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=6DLJqabRT02SxipKi25cFA], unassigned_info[[reason=ALLOCATION_FAILED], at[2019-09-03T00:01:29.171Z], failed_attempts[4], delayed=false, details[failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed recovery, failure RecoveryFailedException[[metricbeat-7.0.1-2019.09.03][0]: Recovery failed from {xh-fr-elastic-1}{nTGxqS2hTNe2O_zh9Wx1tQ}{Jn2CA99oTqKHokKMcwdzmw}{135.238.239.48}{135.238.239.48:9300}{dim}{ml.machine_memory=16654970880, ml.max_open_jobs=20, xpack.installed=true} into {xh-it-elastic-2}{vgsBYEfqSF6WOumwnVXFyQ}{yc1Cy7AsS-yY_0Vv8ETeFg}{151.98.17.34}{151.98.17.34:9300}{dim}{ml.machine_memory=8186552320, xpack.installed=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[xh-fr-elastic-1][135.238.239.48:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][internal:index/shard/recovery/prepare_translog]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8234652702/7.6gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8234652232/7.6gb], new bytes reserved: [470/470b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=1814918/1.7mb, accounting=37715236/35.9mb]]; ], allocation_status[no_attempt]], failure RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8151435664/7.5gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8151430896/7.5gb], new bytes reserved: [4768/4.6kb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=13384/13kb, accounting=39491168/37.6mb]]; ], allocation_status[no_attempt]]]"
},
{
"decider" : "throttling",
"decision" : "THROTTLE",
"explanation" : "reached the limit of outgoing shard recoveries [2] on the node [nTGxqS2hTNe2O_zh9Wx1tQ] which holds the primary, cluster setting [cluster.routing.allocation.node_concurrent_outgoing_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
}
]
},

{
"node_id" : "vgsBYEfqSF6WOumwnVXFyQ",
"node_name" : "xh-it-elastic-2",
"transport_address" : "151.98.17.34:9300",
"node_attributes" : {
"ml.machine_memory" : "8186552320",
"ml.max_open_jobs" : "20",
"xpack.installed" : "true"
},
"node_decision" : "no",
"deciders" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2019-09-03T00:02:10.320Z], failed_attempts[5], delayed=false, details[failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed to perform indices:data/write/bulk[s] on replica [metricbeat-7.0.1-2019.09.03][0], node[vgsBYEfqSF6WOumwnVXFyQ], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=6DLJqabRT02SxipKi25cFA], unassigned_info[[reason=ALLOCATION_FAILED], at[2019-09-03T00:01:29.171Z], failed_attempts[4], delayed=false, details[failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed recovery, failure RecoveryFailedException[[metricbeat-7.0.1-2019.09.03][0]: Recovery failed from {xh-fr-elastic-1}{nTGxqS2hTNe2O_zh9Wx1tQ}{Jn2CA99oTqKHokKMcwdzmw}{135.238.239.48}{135.238.239.48:9300}{dim}{ml.machine_memory=16654970880, ml.max_open_jobs=20, xpack.installed=true} into {xh-it-elastic-2}{vgsBYEfqSF6WOumwnVXFyQ}{yc1Cy7AsS-yY_0Vv8ETeFg}{151.98.17.34}{151.98.17.34:9300}{dim}{ml.machine_memory=8186552320, xpack.installed=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[xh-fr-elastic-1][135.238.239.48:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][internal:index/shard/recovery/prepare_translog]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8234652702/7.6gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8234652232/7.6gb], new bytes reserved: [470/470b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=1814918/1.7mb, accounting=37715236/35.9mb]]; ], allocation_status[no_attempt]], failure RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8151435664/7.5gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8151430896/7.5gb], new bytes reserved: [4768/4.6kb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=13384/13kb, accounting=39491168/37.6mb]]; ], allocation_status[no_attempt]]]"
},
{
"decider" : "throttling",
"decision" : "THROTTLE",
"explanation" : "reached the limit of incoming shard recoveries [2], cluster setting [cluster.routing.allocation.node_concurrent_incoming_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
}
]
}
]
}

I am getting all of these.

You're low on disk space; see this reason from the disk_threshold decider in your output:

"the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], using more disk space than the maximum allowed [85.0%], actual free: [14.686823037524427%]"

You'll need to remove data from Elasticsearch to get below 85% utilization, and then the shards will be assigned.
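
A rough sequence might look like the one below (the index pattern and credentials are illustrative, adjust them to your environment, and be sure you no longer need the data before deleting anything). The last call is the same reroute the max_retry decider points to in your output:

# Check disk usage and shard counts per node
curl -u jacknik:password -XGET "https://localhost:9200/_cat/allocation?v&pretty"

# Free space by deleting indices you no longer need (illustrative pattern; wildcard deletes must be allowed on your cluster)
curl -u jacknik:password -XDELETE "https://localhost:9200/metricbeat-7.0.1-2019.08.*"

# Once below the 85% watermark, retry the allocations that exhausted their retries
curl -u jacknik:password -XPOST "https://localhost:9200/_cluster/reroute?retry_failed=true&pretty"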
