Hi,
We have a 23-node cluster with 5 master nodes, 3 coordinator nodes, and 15 data nodes. Our index has 30 primary shards with 3 replicas, and its total size is around 800 GB. Earlier this week, after we rebooted one of the nodes, we found one unassigned shard that repeatedly failed to be allocated. Here is the response from the cluster allocation explain API:
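(For reference, the explain request that produces this kind of output looks roughly like the curl call below; the index, shard, and primary parameters are taken from the response that follows, and localhost:9200 is assumed for the endpoint.)

curl -X GET "localhost:9200/_cluster/allocation/explain" -H 'Content-Type: application/json' -d'
{
  "index": "index_name",
  "shard": 11,
  "primary": false
}
'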
{
  "index" : "index_name",
  "shard" : 11,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2021-06-22T06:56:10.775Z",
    "failed_allocation_attempts" : 5,
    "details" : "failed shard on node [YQU4hZwQQVifqzeCJ4G0Dw]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[index_name][11]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]]; ",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "2eM6R8OPQDek7BVJ7w72XA",
      "node_name" : "esd01",
      "transport_address" : "x.x.x.235:9300",
      "node_attributes" : {
        "xpack.installed" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "NO",
          "explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2021-06-22T06:56:10.775Z], failed_attempts[5], failed_nodes[[YQU4hZwQQVifqzeCJ4G0Dw]], delayed=false, details[failed shard on node [YQU4hZwQQVifqzeCJ4G0Dw]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[index_name][11]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]]; ], allocation_status[no_attempt]]]"
        }
      ]
    }
    ...
    {
      "node_id" : "ySGiZ52BQwG8tGWJ4pcayA",
      "node_name" : "esd14",
      "transport_address" : "x.x.x.248:9300",
      "node_attributes" : {
        "xpack.installed" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "NO",
          "explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2021-06-22T06:56:10.775Z], failed_attempts[5], failed_nodes[[YQU4hZwQQVifqzeCJ4G0Dw]], delayed=false, details[failed shard on node [YQU4hZwQQVifqzeCJ4G0Dw]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[index_name][11]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]]; ], allocation_status[no_attempt]]]"
        },
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[index_name][11], node[ySGiZ52BQwG8tGWJ4pcayA], [R], s[STARTED], a[id=SmMfRLraRiS6Pfvy2IdxLA]]"
        }
      ]
    }
  ]
}
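The max_retry decider above points at retrying the failed allocation via the reroute API; assuming a node reachable on localhost:9200, that call would be:

curl -X POST "localhost:9200/_cluster/reroute?retry_failed=true"

That said, before relying on the retry we would like to understand why the allocation failed in the first place.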
We have also found lots of exceptions like the following on our Elasticsearch data nodes:
[2021-06-22T07:53:33,766][WARN ][o.e.c.a.s.ShardStateAction] [esd12] unexpected failure while sending request [internal:cluster/shard/failure] to [{esm01}{0NjBrIyQRc65BUfwzfjGow}{gs7hdoLFSY-eUQktHbIdBQ}{x.x.x.212}{x.x.x.212:9300}{m}{xpack.installed=true}] for shard entry [shard id [[index_name][4]], allocation id [dsxBh1jPSiaTvzZPRf24zA], primary term [1], message [failed to perform indices:data/write/bulk[s] on replica [index_name][4], node[OZb-Z6SQQnuEf6Djk4j-5w], [R], s[STARTED], a[id=dsxBh1jPSiaTvzZPRf24zA]], failure [RemoteTransportException[[esd09][x.x.x.243:9300][indices:data/write/bulk[s][r]]]; nested: IllegalStateException[[index_name][4] operation primary term [1] is too old (current [2])]; ], markAsStale [true]]
org.elasticsearch.transport.RemoteTransportException: [esm01][x.x.x.212:9300][internal:cluster/shard/failure]
Caused by: org.elasticsearch.cluster.action.shard.ShardStateAction$NoLongerPrimaryShardException: primary term [1] did not match current primary term [2]
at org.elasticsearch.cluster.action.shard.ShardStateAction$ShardFailedClusterStateTaskExecutor.execute(ShardStateAction.java:365) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:702) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:324) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:219) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.cluster.service.MasterService.access$000(MasterService.java:73) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:151) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) ~[elasticsearch-7.5.0.jar:7.5.0]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) ~[elasticsearch-7.5.0.jar:7.5.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:830) [?:?]
Any input on possible causes would be much appreciated.