Hello,
I have an Elasticsearch cluster with the following configuration:
3 master nodes (4 GB JVM heap, 1 CPU, 1 GB disk)
10 data/ingest nodes (8 GB JVM heap, 2 CPU, 500 GB disk)
~200 indexes
Each index has 2 replicas
~600 shards
Each data node holds ~60 shards
For the past month I have constantly been observing several (1-5) unassigned shards, which keeps the cluster in yellow status.
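To see which shards are affected, I list them like this (just a sketch of what I run; the localhost:9200 address is a placeholder for my cluster endpoint):

curl -s "http://localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason,node" | grep UNASSIGNED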
The _cluster/allocation/explain endpoint shows the following for one of them:
{
"index" : "prod-2021.06.09",
"shard" : 0,
"primary" : false,
"current_state" : "unassigned",
"unassigned_info" : {
"reason" : "ALLOCATION_FAILED",
"at" : "2021-06-09T11:13:10.137Z",
"failed_allocation_attempts" : 5,
"details" : "failed shard on node [6_nnUSedQiaOKpN72xsxiQ]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[prod-2021.06.09][0]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]]; ",
"last_allocation_status" : "no_attempt"
},
"can_allocate" : "no",
"allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
"node_allocation_decisions" : [
{
"node_id" : "3vFljpB3S6qRMivGz2wg1g",
"node_name" : "-prod-es-data-6",
"transport_address" : ":9300",
"node_attributes" : {
"ml.machine_memory" : "12884901888",
"ml.max_open_jobs" : "20",
"xpack.installed" : "true"
},
"node_decision" : "no",
"deciders" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2021-06-09T11:13:10.137Z], failed_attempts[5], delayed=false, details[failed shard on node [6_nnUSedQiaOKpN72xsxiQ]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[prod-2021.06.09][0]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]]; ], allocation_status[no_attempt]]]"
}
]
},
{
"node_id" : "4dRmDv9NQDCXucdrOrH9mw",
"node_name" : "-prod-es-data-8",
"transport_address" : ":9300",
"node_attributes" : {
"ml.machine_memory" : "12884901888",
"ml.max_open_jobs" : "20",
"xpack.installed" : "true"
},
"node_decision" : "no",
"deciders" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2021-06-09T11:13:10.137Z], failed_attempts[5], delayed=false, details[failed shard on node [6_nnUSedQiaOKpN72xsxiQ]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[prod-2021.06.09][0]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]]; ], allocation_status[no_attempt]]]"
}
]
},
{
"node_id" : "6_nnUSedQiaOKpN72xsxiQ",
"node_name" : "-prod-es-data-1",
"transport_address" : ":9300",
"node_attributes" : {
"ml.machine_memory" : "12884901888",
"ml.max_open_jobs" : "20",
"xpack.installed" : "true"
},
"node_decision" : "no",
"deciders" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2021-06-09T11:13:10.137Z], failed_attempts[5], delayed=false, details[failed shard on node [6_nnUSedQiaOKpN72xsxiQ]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[prod-2021.06.09][0]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]]; ], allocation_status[no_attempt]]]"
},
{
"decider" : "same_shard",
"decision" : "NO",
"explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[prod-2021.06.09][0], node[6_nnUSedQiaOKpN72xsxiQ], [R], s[STARTED], a[id=B_6iBKySSF2U1PO-HESdNg]]"
}
]
},
{
"node_id" : "C2PBubtXQayDfNbV8SO3dA",
"node_name" : "-prod-es-data-3",
"transport_address" : ":9300",
"node_attributes" : {
"ml.machine_memory" : "12884901888",
"ml.max_open_jobs" : "20",
"xpack.installed" : "true"
},
"node_decision" : "no",
"deciders" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2021-06-09T11:13:10.137Z], failed_attempts[5], delayed=false, details[failed shard on node [6_nnUSedQiaOKpN72xsxiQ]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[prod-2021.06.09][0]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]]; ], allocation_status[no_attempt]]]"
}
]
},
{
"node_id" : "KNapgBKETTOpKIiaKcs18Q",
"node_name" : "-prod-es-data-9",
"transport_address" : ":9300",
"node_attributes" : {
"ml.machine_memory" : "12884901888",
"ml.max_open_jobs" : "20",
"xpack.installed" : "true"
},
"node_decision" : "no",
"deciders" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2021-06-09T11:13:10.137Z], failed_attempts[5], delayed=false, details[failed shard on node [6_nnUSedQiaOKpN72xsxiQ]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[prod-2021.06.09][0]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]]; ], allocation_status[no_attempt]]]"
},
{
"decider" : "same_shard",
"decision" : "NO",
"explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[prod-2021.06.09][0], node[KNapgBKETTOpKIiaKcs18Q], [P], s[STARTED], a[id=FBCUV0sRSo6KIzxBv1nqAw]]"
}
]
},
{
"node_id" : "NewSPDMKShyAr_Aa8iPs2Q",
"node_name" : "-prod-es-data-0",
"transport_address" : ":9300",
"node_attributes" : {
"ml.machine_memory" : "12884901888",
"ml.max_open_jobs" : "20",
"xpack.installed" : "true"
},
"node_decision" : "no",
"deciders" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2021-06-09T11:13:10.137Z], failed_attempts[5], delayed=false, details[failed shard on node [6_nnUSedQiaOKpN72xsxiQ]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[prod-2021.06.09][0]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]]; ], allocation_status[no_attempt]]]"
}
]
},
{
"node_id" : "Om3F0aVgTaSL32ye7z477A",
"node_name" : "-prod-es-data-5",
"transport_address" : ":9300",
"node_attributes" : {
"ml.machine_memory" : "12884901888",
"ml.max_open_jobs" : "20",
"xpack.installed" : "true"
},
"node_decision" : "no",
"deciders" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2021-06-09T11:13:10.137Z], failed_attempts[5], delayed=false, details[failed shard on node [6_nnUSedQiaOKpN72xsxiQ]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[prod-2021.06.09][0]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]]; ], allocation_status[no_attempt]]]"
}
]
},
{
"node_id" : "cT_alrWFTEWXpZVlclCeRA",
"node_name" : "-prod-es-data-7",
"transport_address" : ":9300",
"node_attributes" : {
"ml.machine_memory" : "12884901888",
"ml.max_open_jobs" : "20",
"xpack.installed" : "true"
},
"node_decision" : "no",
"deciders" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2021-06-09T11:13:10.137Z], failed_attempts[5], delayed=false, details[failed shard on node [6_nnUSedQiaOKpN72xsxiQ]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[prod-2021.06.09][0]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]]; ], allocation_status[no_attempt]]]"
}
]
}
]
}
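Since the max_retry decider explicitly says to call /_cluster/reroute?retry_failed=true, I assume the immediate workaround is something like the following (sketch; again assuming the cluster is reachable on localhost:9200), which should re-trigger allocation of the failed shards:

curl -X POST "http://localhost:9200/_cluster/reroute?retry_failed=true&pretty"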
During the day, JVM heap usage on both the master and data nodes spikes to ~90-100% once or several times per hour, which also leads to unstable cluster behaviour.
Exceptions like the following also appear in Kibana monitoring:
[parent] Data too large, data for [<http_request>] would be [8207642762/7.6gb], which is larger than the limit of [8143876915/7.5gb], real usage: [8207642456/7.6gb], new bytes reserved: [306/306b], usages [request=72/72b, fielddata=19223/18.7kb, in_flight_requests=70836/69.1kb, accounting=70341646/67mb]: [circuit_breaking_exception] [parent] Data too large, data for [<http_request>] would be [8207642762/7.6gb], which is larger than the limit of [8143876915/7.5gb], real usage: [8207642456/7.6gb], new bytes reserved: [306/306b], usages [request=72/72b, fielddata=19223/18.7kb, in_flight_requests=70836/69.1kb, accounting=70341646/67mb], with { bytes_wanted=8207642762 & bytes_limit=8143876915 & durability="PERMANENT" }: Check the Elasticsearch Monitoring cluster network connection or the load level of the nodes.
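To correlate these circuit-breaker trips with the heap spikes, I can pull per-node JVM and breaker statistics, e.g. (sketch; the endpoint address is a placeholder):

curl -s "http://localhost:9200/_nodes/stats/jvm,breaker?pretty"
curl -s "http://localhost:9200/_cat/nodes?v&h=name,node.role,heap.percent,heap.max,ram.percent"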
Could you please suggest:
- What could cause such constant shard unassignments?
- How could the number of indexes, shards, and replicas, the number of master and data nodes, and the RAM size influence this behaviour?
- What could cause the constant spikes in JVM heap usage during the night, considering that the load is much lower and more even than during the day?
- What additional information can I provide?
Thank you, Sasha