Allocation Error

Hi,

I have a 2-node cluster with 1 shard per node and no replication. The cluster is used for logging and uses daily indices. Normally everything runs fine, but an index that was created a few days ago turned red halfway through the day and stopped accepting data.

If I call /_cluster/allocation/explain, I get the following explanation:

cannot allocate because allocation is not permitted to any of the nodes that hold an in-sync shard copy

Both nodes are otherwise running fine and accepting data on all other indices. Has anybody got an idea as to what is causing this and what I can do to fix it?

What version are you on? Can you show us the complete output from the allocation call?

That's not an ideal deployment, for many reasons.
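For anyone following along, here is a minimal Python sketch of how the full explain output for one specific shard could be requested. It assumes the cluster is reachable over plain HTTP at localhost:9200 without authentication; the function names are made up for illustration, and the index and shard values are placeholders to replace with your own.

```python
import json
import urllib.request

# Assumption: the cluster listens on plain HTTP at this address, no auth.
ES_URL = "http://localhost:9200"


def build_explain_request(index, shard, primary=True, base_url=ES_URL):
    """Build a request asking why one specific shard is unassigned."""
    body = json.dumps(
        {"index": index, "shard": shard, "primary": primary}
    ).encode("utf-8")
    return urllib.request.Request(
        base_url + "/_cluster/allocation/explain",
        data=body,
        headers={"Content-Type": "application/json"},
        method="GET",  # the explain API accepts a JSON body on GET
    )


def fetch_explanation(index, shard, primary=True, base_url=ES_URL):
    """Send the request and return the decoded JSON explanation."""
    request = build_explain_request(index, shard, primary, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (needs a running cluster, so it is left commented out):
# print(json.dumps(fetch_explanation("my-index", 0), indent=2))
```

Without a request body, the API explains the first unassigned shard it finds, which is enough when only one shard is red.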

We are currently on 5.3.0. This is the full error message:

{
  "index": "logstash-2017.05.02",
  "shard": 1,
  "primary": true,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "ALLOCATION_FAILED",
    "at": "2017-05-08T19:14:45.122Z",
    "failed_allocation_attempts": 5,
    "details": "failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[logstash-2017.05.02][1]: obtaining shard lock timed out after 5000ms]; ",
    "last_allocation_status": "no"
  },
  "can_allocate": "no",
  "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes that hold an in-sync shard copy",
  "node_allocation_decisions": [
    {
      "node_id": "P06M7-VqSfuHCnz77thsjA",
      "node_name": "es2-node-2",
      "transport_address": "192.168.20.245:9300",
      "node_decision": "no",
      "store": {
        "found": false
      }
    },
    {
      "node_id": "s4vzlt_zSFiGBV0VoyOzuA",
      "node_name": "es2-node-1",
      "transport_address": "192.168.23.233:9300",
      "node_decision": "no",
      "store": {
        "in_sync": true,
        "allocation_id": "jibPirrbQcSgmjLmFqLIzQ"
      },
      "deciders": [
        {
          "decider": "max_retry",
          "decision": "NO",
          "explanation": "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2017-05-08T19:14:45.122Z], failed_attempts[5], delayed=false, details[failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[logstash-2017.05.02][1]: obtaining shard lock timed out after 5000ms]; ], allocation_status[deciders_no]]]"
        }
      ]
    }
  ]
}
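The output above is easier to read once it is reduced to the parts that matter: which node holds the in-sync copy and which decider blocks it. The following is a rough Python sketch of that reduction. The function names are invented for illustration, and SAMPLE is a trimmed copy of the response posted above, not a complete one.

```python
def summarize_allocation(explain):
    """Reduce an allocation explain response to the failure details
    plus, per node, whether it holds an in-sync copy and which
    deciders answered NO."""
    nodes = []
    for node in explain.get("node_allocation_decisions", []):
        store = node.get("store", {})
        blocking = [
            d["decider"]
            for d in node.get("deciders", [])
            if d.get("decision") == "NO"
        ]
        nodes.append({
            "node": node.get("node_name"),
            "has_in_sync_copy": store.get("in_sync") is True,
            "blocking_deciders": blocking,
        })
    return {
        "details": explain.get("unassigned_info", {}).get("details", ""),
        "nodes": nodes,
    }


def needs_manual_retry(summary):
    """True if a node with an in-sync copy is blocked by max_retry."""
    return any(
        n["has_in_sync_copy"] and "max_retry" in n["blocking_deciders"]
        for n in summary["nodes"]
    )


# Trimmed copy of the response from this thread (long strings shortened).
SAMPLE = {
    "unassigned_info": {
        "reason": "ALLOCATION_FAILED",
        "details": "failed to create shard, failure IOException"
                   "[failed to obtain in-memory shard lock]",
    },
    "node_allocation_decisions": [
        {"node_name": "es2-node-2", "store": {"found": False}},
        {
            "node_name": "es2-node-1",
            "store": {"in_sync": True},
            "deciders": [{"decider": "max_retry", "decision": "NO"}],
        },
    ],
}

summary = summarize_allocation(SAMPLE)
print(needs_manual_retry(summary))  # prints True
```

Read this way, es2-node-2 has no copy of the shard at all, and es2-node-1 holds the only in-sync copy but is blocked by the max_retry decider after five failed attempts to obtain the shard lock.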

I have a similar problem. My servers suffered a hardware crash, and the ES cluster now has many unassigned shards.

GET /_cluster/allocation/explain
"details": "failed to create shard, failure FileSystemException[/data/elasticsearch/data/nodes/0/indices/35ByVBpBTdiFQ_TLgq4L2w/5/_state/state-1.st.tmp: Read-only file system]"

I used the reroute API to fix those unassigned shards, because the explanation said the following:

"explanation": "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry

So, POST /_cluster/reroute?retry_failed=true may help you.
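In case it helps, this is a minimal Python sketch of that call. It assumes the cluster is reachable over plain HTTP at localhost:9200 without authentication, and the function names are made up for illustration. Note that the retry will most likely fail again if the underlying cause (a stuck shard lock, a read-only file system) is still present, so fix that first.

```python
import json
import urllib.request

# Assumption: the cluster listens on plain HTTP at this address, no auth.
ES_URL = "http://localhost:9200"


def build_retry_request(base_url=ES_URL):
    """Build the POST that asks the cluster to retry shards whose
    allocation has failed too many times."""
    return urllib.request.Request(
        base_url + "/_cluster/reroute?retry_failed=true",
        method="POST",  # no request body is needed for a plain retry
    )


def retry_failed_allocations(base_url=ES_URL):
    """Send the retry request and return the decoded JSON response."""
    with urllib.request.urlopen(build_retry_request(base_url)) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (needs a running cluster, so it is left commented out):
# retry_failed_allocations()
```

After the call, /_cluster/allocation/explain or the cluster health can be checked again to see whether the shard was assigned.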
