Yellow Status - Unassigned Shards

Hi,

I have many indices with yellow status and unassigned shards. How can I fix that?
Is this related to my nodes being in different countries, with delay on the network, or is it something else?

Hello Antonopo,
There are many reasons for cluster health to be yellow:

  1. A node could have rejoined the cluster
  2. Running out of disk space
  3. Cluster restart
    etc.
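
A quick way to list which shards are unassigned, with a short reason code for each, is the _cat/shards API (the columns shown here are standard _cat/shards columns; add credentials and https as in the secured example further down if you are running security):

curl -XGET "localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason" | grep UNASSIGNED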

Use the explain API to see the reason shards are not being assigned.

curl -XGET "localhost:9200/_cluster/allocation/explain?pretty"

If you're running X-Pack security, run it like this:

curl -u jacknik:password -XGET "https://localhost:9200/_cluster/allocation/explain?pretty"

Of course, pass a username and password configured for your environment that has cluster management privileges.
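
You can also narrow the explanation to one specific shard by sending a request body. A minimal sketch (the index name, shard number and credentials here are placeholders, replace them with your own):

curl -u jacknik:password -XGET "https://localhost:9200/_cluster/allocation/explain?pretty" -H 'Content-Type: application/json' -d'
{
  "index": "my-index-name",
  "shard": 0,
  "primary": false
}'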

{
"index" : "metricbeat-7.0.1-2019.09.03",
"shard" : 0,
"primary" : false,
"current_state" : "unassigned",
"unassigned_info" : {
"reason" : "ALLOCATION_FAILED",
"at" : "2019-09-03T00:02:10.320Z",
"failed_allocation_attempts" : 5,
"details" : "failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed to perform indices:data/write/bulk[s] on replica [metricbeat-7.0.1-2019.09.03][0], node[vgsBYEfqSF6WOumwnVXFyQ], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=6DLJqabRT02SxipKi25cFA], unassigned_info[[reason=ALLOCATION_FAILED], at[2019-09-03T00:01:29.171Z], failed_attempts[4], delayed=false, details[failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed recovery, failure RecoveryFailedException[[metricbeat-7.0.1-2019.09.03][0]: Recovery failed from {xh-fr-elastic-1}{nTGxqS2hTNe2O_zh9Wx1tQ}{Jn2CA99oTqKHokKMcwdzmw}{135.238.239.48}{135.238.239.48:9300}{dim}{ml.machine_memory=16654970880, ml.max_open_jobs=20, xpack.installed=true} into {xh-it-elastic-2}{vgsBYEfqSF6WOumwnVXFyQ}{yc1Cy7AsS-yY_0Vv8ETeFg}{151.98.17.34}{151.98.17.34:9300}{dim}{ml.machine_memory=8186552320, xpack.installed=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[xh-fr-elastic-1][135.238.239.48:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][internal:index/shard/recovery/prepare_translog]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8234652702/7.6gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8234652232/7.6gb], new bytes reserved: [470/470b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=1814918/1.7mb, accounting=37715236/35.9mb]]; ], allocation_status[no_attempt]], failure RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8151435664/7.5gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8151430896/7.5gb], new bytes reserved: [4768/4.6kb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=13384/13kb, accounting=39491168/37.6mb]]; ",
"last_allocation_status" : "no_attempt"
},
"can_allocate" : "no",
"allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
"node_allocation_decisions" : [
{
"node_id" : "GE4A2v29Qlybs6FyKbyCMw",
"node_name" : "xh-fr-elastic-2",
"transport_address" : "135.238.239.132:9300",
"node_attributes" : {
"ml.machine_memory" : "269930708992",
"ml.max_open_jobs" : "20",
"xpack.installed" : "true"
},
"node_decision" : "no",
"deciders" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2019-09-03T00:02:10.320Z], failed_attempts[5], delayed=false, details[failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed to perform indices:data/write/bulk[s] on replica [metricbeat-7.0.1-2019.09.03][0], node[vgsBYEfqSF6WOumwnVXFyQ], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=6DLJqabRT02SxipKi25cFA], unassigned_info[[reason=ALLOCATION_FAILED], at[2019-09-03T00:01:29.171Z], failed_attempts[4], delayed=false, details[failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed recovery, failure RecoveryFailedException[[metricbeat-7.0.1-2019.09.03][0]: Recovery failed from {xh-fr-elastic-1}{nTGxqS2hTNe2O_zh9Wx1tQ}{Jn2CA99oTqKHokKMcwdzmw}{135.238.239.48}{135.238.239.48:9300}{dim}{ml.machine_memory=16654970880, ml.max_open_jobs=20, xpack.installed=true} into {xh-it-elastic-2}{vgsBYEfqSF6WOumwnVXFyQ}{yc1Cy7AsS-yY_0Vv8ETeFg}{151.98.17.34}{151.98.17.34:9300}{dim}{ml.machine_memory=8186552320, xpack.installed=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[xh-fr-elastic-1][135.238.239.48:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][internal:index/shard/recovery/prepare_translog]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8234652702/7.6gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8234652232/7.6gb], new bytes reserved: [470/470b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=1814918/1.7mb, accounting=37715236/35.9mb]]; ], allocation_status[no_attempt]], failure RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8151435664/7.5gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8151430896/7.5gb], new bytes reserved: [4768/4.6kb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=13384/13kb, accounting=39491168/37.6mb]]; ], allocation_status[no_attempt]]]"
},
{
"decider" : "disk_threshold",
"decision" : "NO",
"explanation" : "the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], using more disk space than the maximum allowed [85.0%], actual free: [14.686823037524427%]"
},
{
"decider" : "throttling",
"decision" : "THROTTLE",
"explanation" : "reached the limit of outgoing shard recoveries [2] on the node [nTGxqS2hTNe2O_zh9Wx1tQ] which holds the primary, cluster setting [cluster.routing.allocation.node_concurrent_outgoing_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
}
]
},

{
"node_id" : "TXQBZI3yRf69Q7CCJ2PdFQ",
"node_name" : "xh-it-elastic-1",
"transport_address" : "151.98.17.60:9300",
"node_attributes" : {
"ml.machine_memory" : "34359738368",
"ml.max_open_jobs" : "20",
"xpack.installed" : "true"
},
"node_decision" : "no",
"deciders" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2019-09-03T00:02:10.320Z], failed_attempts[5], delayed=false, details[failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed to perform indices:data/write/bulk[s] on replica [metricbeat-7.0.1-2019.09.03][0], node[vgsBYEfqSF6WOumwnVXFyQ], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=6DLJqabRT02SxipKi25cFA], unassigned_info[[reason=ALLOCATION_FAILED], at[2019-09-03T00:01:29.171Z], failed_attempts[4], delayed=false, details[failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed recovery, failure RecoveryFailedException[[metricbeat-7.0.1-2019.09.03][0]: Recovery failed from {xh-fr-elastic-1}{nTGxqS2hTNe2O_zh9Wx1tQ}{Jn2CA99oTqKHokKMcwdzmw}{135.238.239.48}{135.238.239.48:9300}{dim}{ml.machine_memory=16654970880, ml.max_open_jobs=20, xpack.installed=true} into {xh-it-elastic-2}{vgsBYEfqSF6WOumwnVXFyQ}{yc1Cy7AsS-yY_0Vv8ETeFg}{151.98.17.34}{151.98.17.34:9300}{dim}{ml.machine_memory=8186552320, xpack.installed=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[xh-fr-elastic-1][135.238.239.48:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][internal:index/shard/recovery/prepare_translog]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8234652702/7.6gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8234652232/7.6gb], new bytes reserved: [470/470b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=1814918/1.7mb, accounting=37715236/35.9mb]]; ], allocation_status[no_attempt]], failure RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8151435664/7.5gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8151430896/7.5gb], new bytes reserved: [4768/4.6kb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=13384/13kb, accounting=39491168/37.6mb]]; ], allocation_status[no_attempt]]]"
},
{
"decider" : "throttling",
"decision" : "THROTTLE",
"explanation" : "reached the limit of outgoing shard recoveries [2] on the node [nTGxqS2hTNe2O_zh9Wx1tQ] which holds the primary, cluster setting [cluster.routing.allocation.node_concurrent_outgoing_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
}
]
},

{
"node_id" : "V-WUk1ZeQ7yROHALYmndkQ",
"node_name" : "xh-gr-elastic-2",
"transport_address" : "10.159.166.9:9300",
"node_attributes" : {
"ml.machine_memory" : "269930721280",
"ml.max_open_jobs" : "20",
"xpack.installed" : "true"
},
"node_decision" : "no",
"deciders" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2019-09-03T00:02:10.320Z], failed_attempts[5], delayed=false, details[failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed to perform indices:data/write/bulk[s] on replica [metricbeat-7.0.1-2019.09.03][0], node[vgsBYEfqSF6WOumwnVXFyQ], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=6DLJqabRT02SxipKi25cFA], unassigned_info[[reason=ALLOCATION_FAILED], at[2019-09-03T00:01:29.171Z], failed_attempts[4], delayed=false, details[failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed recovery, failure RecoveryFailedException[[metricbeat-7.0.1-2019.09.03][0]: Recovery failed from {xh-fr-elastic-1}{nTGxqS2hTNe2O_zh9Wx1tQ}{Jn2CA99oTqKHokKMcwdzmw}{135.238.239.48}{135.238.239.48:9300}{dim}{ml.machine_memory=16654970880, ml.max_open_jobs=20, xpack.installed=true} into {xh-it-elastic-2}{vgsBYEfqSF6WOumwnVXFyQ}{yc1Cy7AsS-yY_0Vv8ETeFg}{151.98.17.34}{151.98.17.34:9300}{dim}{ml.machine_memory=8186552320, xpack.installed=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[xh-fr-elastic-1][135.238.239.48:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][internal:index/shard/recovery/prepare_translog]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8234652702/7.6gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8234652232/7.6gb], new bytes reserved: [470/470b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=1814918/1.7mb, accounting=37715236/35.9mb]]; ], allocation_status[no_attempt]], failure RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8151435664/7.5gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8151430896/7.5gb], new bytes reserved: [4768/4.6kb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=13384/13kb, accounting=39491168/37.6mb]]; ], allocation_status[no_attempt]]]"
},
{
"decider" : "throttling",
"decision" : "THROTTLE",
"explanation" : "reached the limit of incoming shard recoveries [2], cluster setting [cluster.routing.allocation.node_concurrent_incoming_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
}
]
},

{
"node_id" : "kZLf-LYfThiXOORoiECoaw",
"node_name" : "xh-gr-elastic-3",
"transport_address" : "10.158.67.107:9300",
"node_attributes" : {
"ml.machine_memory" : "17179332608",
"ml.max_open_jobs" : "20",
"xpack.installed" : "true"
},
"node_decision" : "no",
"deciders" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2019-09-03T00:02:10.320Z], failed_attempts[5], delayed=false, details[failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed to perform indices:data/write/bulk[s] on replica [metricbeat-7.0.1-2019.09.03][0], node[vgsBYEfqSF6WOumwnVXFyQ], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=6DLJqabRT02SxipKi25cFA], unassigned_info[[reason=ALLOCATION_FAILED], at[2019-09-03T00:01:29.171Z], failed_attempts[4], delayed=false, details[failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed recovery, failure RecoveryFailedException[[metricbeat-7.0.1-2019.09.03][0]: Recovery failed from {xh-fr-elastic-1}{nTGxqS2hTNe2O_zh9Wx1tQ}{Jn2CA99oTqKHokKMcwdzmw}{135.238.239.48}{135.238.239.48:9300}{dim}{ml.machine_memory=16654970880, ml.max_open_jobs=20, xpack.installed=true} into {xh-it-elastic-2}{vgsBYEfqSF6WOumwnVXFyQ}{yc1Cy7AsS-yY_0Vv8ETeFg}{151.98.17.34}{151.98.17.34:9300}{dim}{ml.machine_memory=8186552320, xpack.installed=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[xh-fr-elastic-1][135.238.239.48:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][internal:index/shard/recovery/prepare_translog]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8234652702/7.6gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8234652232/7.6gb], new bytes reserved: [470/470b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=1814918/1.7mb, accounting=37715236/35.9mb]]; ], allocation_status[no_attempt]], failure RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8151435664/7.5gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8151430896/7.5gb], new bytes reserved: [4768/4.6kb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=13384/13kb, accounting=39491168/37.6mb]]; ], allocation_status[no_attempt]]]"
},
{
"decider" : "throttling",
"decision" : "THROTTLE",
"explanation" : "reached the limit of outgoing shard recoveries [2] on the node [nTGxqS2hTNe2O_zh9Wx1tQ] which holds the primary, cluster setting [cluster.routing.allocation.node_concurrent_outgoing_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
}
]
},

{
"node_id" : "nTGxqS2hTNe2O_zh9Wx1tQ",
"node_name" : "xh-fr-elastic-1",
"transport_address" : "135.238.239.48:9300",
"node_attributes" : {
"ml.machine_memory" : "16654970880",
"ml.max_open_jobs" : "20",
"xpack.installed" : "true"
},
"node_decision" : "no",
"deciders" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2019-09-03T00:02:10.320Z], failed_attempts[5], delayed=false, details[failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed to perform indices:data/write/bulk[s] on replica [metricbeat-7.0.1-2019.09.03][0], node[vgsBYEfqSF6WOumwnVXFyQ], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=6DLJqabRT02SxipKi25cFA], unassigned_info[[reason=ALLOCATION_FAILED], at[2019-09-03T00:01:29.171Z], failed_attempts[4], delayed=false, details[failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed recovery, failure RecoveryFailedException[[metricbeat-7.0.1-2019.09.03][0]: Recovery failed from {xh-fr-elastic-1}{nTGxqS2hTNe2O_zh9Wx1tQ}{Jn2CA99oTqKHokKMcwdzmw}{135.238.239.48}{135.238.239.48:9300}{dim}{ml.machine_memory=16654970880, ml.max_open_jobs=20, xpack.installed=true} into {xh-it-elastic-2}{vgsBYEfqSF6WOumwnVXFyQ}{yc1Cy7AsS-yY_0Vv8ETeFg}{151.98.17.34}{151.98.17.34:9300}{dim}{ml.machine_memory=8186552320, xpack.installed=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[xh-fr-elastic-1][135.238.239.48:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][internal:index/shard/recovery/prepare_translog]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8234652702/7.6gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8234652232/7.6gb], new bytes reserved: [470/470b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=1814918/1.7mb, accounting=37715236/35.9mb]]; ], allocation_status[no_attempt]], failure RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8151435664/7.5gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8151430896/7.5gb], new bytes reserved: [4768/4.6kb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=13384/13kb, accounting=39491168/37.6mb]]; ], allocation_status[no_attempt]]]"
},
{
"decider" : "same_shard",
"decision" : "NO",
"explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[metricbeat-7.0.1-2019.09.03][0], node[nTGxqS2hTNe2O_zh9Wx1tQ], [P], s[STARTED], a[id=H2Km3EpqTJSU8efmVwFQhw]]"
},
{
"decider" : "throttling",
"decision" : "THROTTLE",
"explanation" : "reached the limit of outgoing shard recoveries [2] on the node [nTGxqS2hTNe2O_zh9Wx1tQ] which holds the primary, cluster setting [cluster.routing.allocation.node_concurrent_outgoing_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
}
]
},

{
"node_id" : "rX7K8eJfQF6pRwt6i_oUpA",
"node_name" : "xh-gr-elastic-1",
"transport_address" : "10.158.67.175:9300",
"node_attributes" : {
"ml.machine_memory" : "16654884864",
"ml.max_open_jobs" : "20",
"xpack.installed" : "true"
},
"node_decision" : "no",
"deciders" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2019-09-03T00:02:10.320Z], failed_attempts[5], delayed=false, details[failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed to perform indices:data/write/bulk[s] on replica [metricbeat-7.0.1-2019.09.03][0], node[vgsBYEfqSF6WOumwnVXFyQ], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=6DLJqabRT02SxipKi25cFA], unassigned_info[[reason=ALLOCATION_FAILED], at[2019-09-03T00:01:29.171Z], failed_attempts[4], delayed=false, details[failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed recovery, failure RecoveryFailedException[[metricbeat-7.0.1-2019.09.03][0]: Recovery failed from {xh-fr-elastic-1}{nTGxqS2hTNe2O_zh9Wx1tQ}{Jn2CA99oTqKHokKMcwdzmw}{135.238.239.48}{135.238.239.48:9300}{dim}{ml.machine_memory=16654970880, ml.max_open_jobs=20, xpack.installed=true} into {xh-it-elastic-2}{vgsBYEfqSF6WOumwnVXFyQ}{yc1Cy7AsS-yY_0Vv8ETeFg}{151.98.17.34}{151.98.17.34:9300}{dim}{ml.machine_memory=8186552320, xpack.installed=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[xh-fr-elastic-1][135.238.239.48:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][internal:index/shard/recovery/prepare_translog]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8234652702/7.6gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8234652232/7.6gb], new bytes reserved: [470/470b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=1814918/1.7mb, accounting=37715236/35.9mb]]; ], allocation_status[no_attempt]], failure RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8151435664/7.5gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8151430896/7.5gb], new bytes reserved: [4768/4.6kb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=13384/13kb, accounting=39491168/37.6mb]]; ], allocation_status[no_attempt]]]"
},
{
"decider" : "throttling",
"decision" : "THROTTLE",
"explanation" : "reached the limit of outgoing shard recoveries [2] on the node [nTGxqS2hTNe2O_zh9Wx1tQ] which holds the primary, cluster setting [cluster.routing.allocation.node_concurrent_outgoing_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
}
]
},

{
"node_id" : "vgsBYEfqSF6WOumwnVXFyQ",
"node_name" : "xh-it-elastic-2",
"transport_address" : "151.98.17.34:9300",
"node_attributes" : {
"ml.machine_memory" : "8186552320",
"ml.max_open_jobs" : "20",
"xpack.installed" : "true"
},
"node_decision" : "no",
"deciders" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2019-09-03T00:02:10.320Z], failed_attempts[5], delayed=false, details[failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed to perform indices:data/write/bulk[s] on replica [metricbeat-7.0.1-2019.09.03][0], node[vgsBYEfqSF6WOumwnVXFyQ], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=6DLJqabRT02SxipKi25cFA], unassigned_info[[reason=ALLOCATION_FAILED], at[2019-09-03T00:01:29.171Z], failed_attempts[4], delayed=false, details[failed shard on node [vgsBYEfqSF6WOumwnVXFyQ]: failed recovery, failure RecoveryFailedException[[metricbeat-7.0.1-2019.09.03][0]: Recovery failed from {xh-fr-elastic-1}{nTGxqS2hTNe2O_zh9Wx1tQ}{Jn2CA99oTqKHokKMcwdzmw}{135.238.239.48}{135.238.239.48:9300}{dim}{ml.machine_memory=16654970880, ml.max_open_jobs=20, xpack.installed=true} into {xh-it-elastic-2}{vgsBYEfqSF6WOumwnVXFyQ}{yc1Cy7AsS-yY_0Vv8ETeFg}{151.98.17.34}{151.98.17.34:9300}{dim}{ml.machine_memory=8186552320, xpack.installed=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[xh-fr-elastic-1][135.238.239.48:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][internal:index/shard/recovery/prepare_translog]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8234652702/7.6gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8234652232/7.6gb], new bytes reserved: [470/470b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=1814918/1.7mb, accounting=37715236/35.9mb]]; ], allocation_status[no_attempt]], failure RemoteTransportException[[xh-it-elastic-2][151.98.17.34:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [8151435664/7.5gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8151430896/7.5gb], new bytes reserved: [4768/4.6kb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=13384/13kb, accounting=39491168/37.6mb]]; ], allocation_status[no_attempt]]]"
},
{
"decider" : "throttling",
"decision" : "THROTTLE",
"explanation" : "reached the limit of incoming shard recoveries [2], cluster setting [cluster.routing.allocation.node_concurrent_incoming_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
}
]
}
]
}

I am getting all of these.

You're low on disk space; see this reason from the disk_threshold decider in your output:

"the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], using more disk space than the maximum allowed [85.0%], actual free: [14.686823037524427%]"

You'll need to remove data from Elasticsearch to get below 85% utilization, and then the shards will be assigned.
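
A rough sequence might look like the one below (the index pattern and credentials are illustrative, adjust them to your environment, and be sure you no longer need the data before deleting anything). The last call is the same reroute the max_retry decider points to in your output:

# Check disk usage and shard counts per node
curl -u jacknik:password -XGET "https://localhost:9200/_cat/allocation?v&pretty"

# Free space by deleting indices you no longer need (illustrative pattern; wildcard deletes must be allowed on your cluster)
curl -u jacknik:password -XDELETE "https://localhost:9200/metricbeat-7.0.1-2019.08.*"

# Once below the 85% watermark, retry the allocations that exhausted their retries
curl -u jacknik:password -XPOST "https://localhost:9200/_cluster/reroute?retry_failed=true&pretty"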
