Elasticsearch Version
Docker image 8.1.3
Installed Plugins
No response
Java Version
OpenJDK 8
OS Version
Linux
Problem Description
index                               shard prirep state      node unassigned.reason
.ds-ilm-history-5-2022.09.11-000003 0     r      UNASSIGNED      ALLOCATION_FAILED
.apm-custom-link                    0     r      UNASSIGNED      ALLOCATION_FAILED
.geoip_databases                    0     r      UNASSIGNED      ALLOCATION_FAILED
.ds-ilm-history-5-2022.07.13-000001 0     r      UNASSIGNED      ALLOCATION_FAILED
.kibana_task_manager_8.1.3_001      0     r      UNASSIGNED      ALLOCATION_FAILED
.kibana-event-log-8.1.3-000003      0     r      UNASSIGNED      ALLOCATION_FAILED
.apm-agent-configuration            0     r      UNASSIGNED      ALLOCATION_FAILED
I have many unassigned shards, so I called the reroute API. The primary shards were rerouted successfully, but allocating the replica shards listed above always fails with an exception.
POST http://192.168.158.151:32600/_cluster/reroute?retry_failed=true
{
  "commands": [
    {
      "allocate_replica": {
        "index": ".ds-ilm-history-5-2022.09.11-000003",
        "shard": 0,
        "node": "es-cluster-2"
      }
    }
  ]
}
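For anyone who wants to repeat the same call for every failed replica, here is a rough Python sketch. It only builds the request and does not send it. The helper names (unassigned_replicas, build_reroute_request) are made up for illustration, the two sample rows are copied from the shard listing in this report, and the target node "es-cluster-2" is just the one used in the request above, not a recommendation.

```python
import json
from urllib.request import Request

# Two rows copied from the shard listing in this report (the node column is empty).
CAT_SHARDS = """\
index shard prirep state node unassigned.reason
.ds-ilm-history-5-2022.09.11-000003 0 r UNASSIGNED ALLOCATION_FAILED
.apm-custom-link 0 r UNASSIGNED ALLOCATION_FAILED
"""


def unassigned_replicas(cat_output):
    """Return (index, shard) pairs for rows that are unassigned replicas."""
    pairs = []
    for line in cat_output.splitlines():
        fields = line.split()
        # Expected layout: index, shard, prirep, state, [node], reason.
        # The header row is skipped because its third field is "prirep", not "r".
        if len(fields) >= 4 and fields[2] == "r" and fields[3] == "UNASSIGNED":
            pairs.append((fields[0], int(fields[1])))
    return pairs


def build_reroute_request(base_url, pairs, node):
    """Build (but do not send) a reroute request with one allocate_replica per shard."""
    commands = [
        {"allocate_replica": {"index": index, "shard": shard, "node": node}}
        for index, shard in pairs
    ]
    body = json.dumps({"commands": commands}).encode("utf-8")
    return Request(
        base_url + "/_cluster/reroute?retry_failed=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_reroute_request(
    "http://192.168.158.151:32600",
    unassigned_replicas(CAT_SHARDS),
    "es-cluster-2",  # assumption: same target node as in the request above
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would actually send it. As long as the lock error in the logs below persists, I would expect the allocation to fail in the same way.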
Steps to Reproduce
I don't know how to reproduce this, so I will describe the background of the issue as best I can.
We have a 3-node Elasticsearch cluster running in a k8s cluster, and the k8s PVs are backed by NFS. Our customer wanted to test cluster HA, so they cut off one NFS network connection. Afterwards the cluster status was red and many primary and replica shards were unassigned.
Logs (if relevant)
failed shard on node [qoF9MIAyQWGOChqbRqZLgg]: failed recovery, failure org.elasticsearch.indices.recovery.RecoveryFailedException:
[.ds-ilm-history-5-2022.09.11-000003][0]: Recovery failed from {es-cluster-1}{7ZbvmpPsRvWJk7z8fjbRMg}{KCeptjRCTdKUcZ8OUZpFMA}{10.131.192.189}
{10.131.192.189:9300}{cdfhilmrstw}{ml.machine_memory=2147483648, xpack.installed=true, ml.max_jvm_size=536870912} into {es-cluster-0}
{qoF9MIAyQWGOChqbRqZLgg}{qqRq39jhSzGgTFAtxJ_cVw}{10.131.249.74}{10.131.249.74:9300}{cdfhilmrstw}{xpack.installed=true, ml.machine_memory=2147483648,
ml.max_jvm_size=536870912}\n\tat org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryResponseHandler.handleException
(PeerRecoveryTargetService.java:816)\n\tat org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException
(TransportService.java:1349)\n\tat org.elasticsearch.transport.InboundHandler.lambda$handleException$3(InboundHandler.java:397)\n\t
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:717)\n\t
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)\n\t
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)\n\t
at java.lang.Thread.run(Thread.java:833)\nCaused by: org.elasticsearch.transport.RemoteTransportException:
[es-cluster-1][10.131.192.189:9300][internal:index/shard/recovery/start_recovery]\nCaused by:
org.elasticsearch.transport.RemoteTransportException: [es-cluster-0][10.131.249.74:9300][internal:index/shard/recovery/clean_files]\nCaused by:
org.elasticsearch.common.util.concurrent.UncategorizedExecutionException: Failed execution\n\t
at org.elasticsearch.common.util.concurrent.FutureUtils.rethrowExecutionException(FutureUtils.java:80)\n\t
at org.elasticsearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:72)\n\t
at org.elasticsearch.common.util.concurrent.ListenableFuture.notifyListenerDirectly(ListenableFuture.java:112)\n\t
at org.elasticsearch.common.util.concurrent.ListenableFuture.done(ListenableFuture.java:100)\n\t
at org.elasticsearch.common.util.concurrent.BaseFuture.setException(BaseFuture.java:149)\n\t
at org.elasticsearch.common.util.concurrent.ListenableFuture.onFailure(ListenableFuture.java:147)\n\t
at org.elasticsearch.action.ActionListener$Delegating.onFailure(ActionListener.java:66)\n\t
at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:439)\n\t
at org.elasticsearch.indices.recovery.RecoveryTarget.cleanFiles(RecoveryTarget.java:480)\n\t
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:533)\n\t
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:522)\n\t
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:67)\n\t
at org.elasticsearch.transport.InboundHandler$1.doRun(InboundHandler.java:287)\n\t
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:776)\n\t
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)\n\t
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)\n\t
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)\n\t
at java.lang.Thread.run(Thread.java:833)\nCaused by: org.elasticsearch.common.io.stream.NotSerializableExceptionWrapper:
execution_exception: org.apache.lucene.store.LockObtainFailedException: Lock held by another program:
/usr/share/elasticsearch/data/es-cluster-0/indices/Y49YVNy1T-Cvs784GyR_zA/0/index/write.lock\n\t
at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.getValue(BaseFuture.java:257)\n\t
at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:231)\n\t
at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:53)\n\t
at org.elasticsearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:65)\n\t
... 16 more\nCaused by: org.apache.lucene.store.LockObtainFailedException:
Lock held by another program: /usr/share/elasticsearch/data/es-cluster-0/indices/Y49YVNy1T-Cvs784GyR_zA/0/index/write.lock\n\t
at org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:117)\n\t
at org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:43)\n\t
at org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:44)\n\t
at org.apache.lucene.store.FilterDirectory.obtainLock(FilterDirectory.java:106)\n\t
at org.apache.lucene.store.FilterDirectory.obtainLock(FilterDirectory.java:106)\n\t
at org.elasticsearch.index.store.Store.renameTempFilesSafe(Store.java:301)\n\t
at org.elasticsearch.indices.recovery.MultiFileWriter.renameAllTempFiles(MultiFileWriter.java:236)\n\t
at org.elasticsearch.indices.recovery.RecoveryTarget.lambda$cleanFiles$6(RecoveryTarget.java:485)\n\t
at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:436)\n\t... 10 more\n",
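Since the trace above is long and still contains escaped \n and \t sequences, here is a small Python sketch for pulling out the innermost "Caused by" entry. The function name root_cause is made up for illustration, and SAMPLE is a shortened excerpt of the log above joined into a single escaped string, not the full message.

```python
# Shortened excerpt of the trace above, kept as one escaped string
# (raw strings, so "\n" and "\t" stay as literal backslash sequences).
SAMPLE = (
    r"at java.lang.Thread.run(Thread.java:833)\n"
    r"Caused by: org.elasticsearch.common.io.stream.NotSerializableExceptionWrapper: "
    r"execution_exception: org.apache.lucene.store.LockObtainFailedException: "
    r"Lock held by another program: "
    r"/usr/share/elasticsearch/data/es-cluster-0/indices/Y49YVNy1T-Cvs784GyR_zA/0/index/write.lock\n\t"
    r"at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.getValue(BaseFuture.java:257)\n\t"
    r"... 16 more\n"
    r"Caused by: org.apache.lucene.store.LockObtainFailedException: "
    r"Lock held by another program: "
    r"/usr/share/elasticsearch/data/es-cluster-0/indices/Y49YVNy1T-Cvs784GyR_zA/0/index/write.lock\n\t"
    r"at org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:117)\n"
)

PREFIX = "Caused by: "


def root_cause(trace):
    """Return the text of the last 'Caused by:' entry, or None if there is none."""
    # Turn the escaped sequences into real line breaks and tabs first.
    text = trace.replace("\\n", "\n").replace("\\t", "\t")
    causes = [
        line[len(PREFIX):].strip()
        for line in text.splitlines()
        if line.startswith(PREFIX)
    ]
    return causes[-1] if causes else None


print(root_cause(SAMPLE))
```

On this excerpt the innermost cause is the LockObtainFailedException on write.lock in the es-cluster-0 data directory, which is the part of the trace that seems most relevant to the failed replica recovery.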