Allocation Error


(Lars Riis Olsen) #1

Hi,

I have a 2-node cluster with 1 shard per node and no replication. The cluster is used for logging and uses daily indices. Normally everything runs fine, but the index that was created a few days ago turned red halfway through the day and stopped accepting data.

If I do a /_cluster/allocation/explain I get the following error explanation:

cannot allocate because allocation is not permitted to any of the nodes that hold an in-sync shard copy

Both nodes are otherwise running fine and accepting data on all other indices. Has anybody got an idea as to what is causing this and what I can do to fix it?
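In case it helps, this is roughly how I am running the explain call (just a sketch; it assumes the default HTTP port 9200 on localhost and asks specifically about the red primary):

# Ask the cluster why this particular primary shard is unassigned.
curl -XGET 'http://localhost:9200/_cluster/allocation/explain?pretty' \
  -H 'Content-Type: application/json' -d '
{
  "index": "logstash-2017.05.02",
  "shard": 1,
  "primary": true
}'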


(Mark Walkom) #2

What version are you on? Can you show us the complete output from the allocation call?

That's not an ideal deployment, for many reasons; with no replicas, any single-node problem takes an index down.


(Lars Riis Olsen) #3

We are currently on 5.3.0. This is the full error message:

{
  "index": "logstash-2017.05.02",
  "shard": 1,
  "primary": true,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "ALLOCATION_FAILED",
    "at": "2017-05-08T19:14:45.122Z",
    "failed_allocation_attempts": 5,
    "details": "failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[logstash-2017.05.02][1]: obtaining shard lock timed out after 5000ms]; ",
    "last_allocation_status": "no"
  },
  "can_allocate": "no",
  "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes that hold an in-sync shard copy",
  "node_allocation_decisions": [
    {
      "node_id": "P06M7-VqSfuHCnz77thsjA",
      "node_name": "es2-node-2",
      "transport_address": "192.168.20.245:9300",
      "node_decision": "no",
      "store": {
        "found": false
      }
    },
    {
      "node_id": "s4vzlt_zSFiGBV0VoyOzuA",
      "node_name": "es2-node-1",
      "transport_address": "192.168.23.233:9300",
      "node_decision": "no",
      "store": {
        "in_sync": true,
        "allocation_id": "jibPirrbQcSgmjLmFqLIzQ"
      },
      "deciders": [
        {
          "decider": "max_retry",
          "decision": "NO",
          "explanation": "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2017-05-08T19:14:45.122Z], failed_attempts[5], delayed=false, details[failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[logstash-2017.05.02][1]: obtaining shard lock timed out after 5000ms]; ], allocation_status[deciders_no]]]"
        }
      ]
    }
  ]
}


(Mos Yang) #4

I have a similar problem, too. My servers encountered a hardware crash and my ES cluster has many unassigned shards.

GET /_cluster/allocation/explain
"details": "failed to create shard, failure FileSystemException[/data/elasticsearch/data/nodes/0/indices/35ByVBpBTdiFQ_TLgq4L2w/5/_state/state-1.st.tmp: Read-only file system]"

I used the reroute API to fix those unassigned shards because I saw the explanation below:

"explanation": "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry

So, POST /_cluster/reroute?retry_failed=true may help you.
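In curl form that is roughly (a sketch; again assuming the default HTTP port 9200 on localhost):

# Ask the master to retry allocating shards that hit the max_retry limit.
curl -XPOST 'http://localhost:9200/_cluster/reroute?retry_failed=true&pretty'

# Then watch the shards come back and the cluster leave red.
curl -XGET 'http://localhost:9200/_cluster/health?pretty'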


(system) #5

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.