Failed reroute of unassigned replica shards

Elasticsearch Version

Docker image 8.1.3

Installed Plugins

No response

Java Version

OpenJDK 8

OS Version

Linux

Problem Description

index                               shard prirep state      node         unassigned.reason
.ds-ilm-history-5-2022.09.11-000003 0     r      UNASSIGNED              ALLOCATION_FAILED
.apm-custom-link                    0     r      UNASSIGNED              ALLOCATION_FAILED
.geoip_databases                    0     r      UNASSIGNED              ALLOCATION_FAILED
.ds-ilm-history-5-2022.07.13-000001 0     r      UNASSIGNED              ALLOCATION_FAILED
.kibana_task_manager_8.1.3_001      0     r      UNASSIGNED              ALLOCATION_FAILED
.kibana-event-log-8.1.3-000003      0     r      UNASSIGNED              ALLOCATION_FAILED
.apm-agent-configuration            0     r      UNASSIGNED              ALLOCATION_FAILED

I have many unassigned shards, so I called the reroute API. The primary shards rerouted successfully, but the replica shards listed above always fail with an exception.

POST http://192.168.158.151:32600/_cluster/reroute?retry_failed=true
{
  "commands": [
    {
      "allocate_replica": {
        "index": ".ds-ilm-history-5-2022.09.11-000003",
        "shard": 0,
        "node": "es-cluster-2"
      }
    }
  ]
}
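Before forcing an allocation, the cluster allocation explain API can report exactly why a replica stays unassigned (including the recorded allocation failure). A sketch against the same endpoint and index from the report above; the request body fields are the standard ones for this API:

GET http://192.168.158.151:32600/_cluster/allocation/explain
{
  "index": ".ds-ilm-history-5-2022.09.11-000003",
  "shard": 0,
  "primary": false
}

The response's "unassigned_info" and per-node "deciders" sections usually show the underlying recovery failure without needing to dig through the node logs.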

Steps to Reproduce

I don't know how to reproduce it, so I will explain the background of this issue; it may not be entirely clear.
We have a 3-node Elasticsearch cluster running in a Kubernetes cluster, and the Kubernetes PVs are backed by NFS. We wanted to test the cluster's HA, so we cut the network to one NFS server. Afterwards the cluster status was red, and many primary and replica shards were unassigned.

Logs (if relevant)

failed shard on node [qoF9MIAyQWGOChqbRqZLgg]: failed recovery, failure org.elasticsearch.indices.recovery.RecoveryFailedException: [.ds-ilm-history-5-2022.09.11-000003][0]: Recovery failed from {es-cluster-1}{7ZbvmpPsRvWJk7z8fjbRMg}{KCeptjRCTdKUcZ8OUZpFMA}{10.131.192.189}{10.131.192.189:9300}{cdfhilmrstw}{ml.machine_memory=2147483648, xpack.installed=true, ml.max_jvm_size=536870912} into {es-cluster-0}{qoF9MIAyQWGOChqbRqZLgg}{qqRq39jhSzGgTFAtxJ_cVw}{10.131.249.74}{10.131.249.74:9300}{cdfhilmrstw}{xpack.installed=true, ml.machine_memory=2147483648, ml.max_jvm_size=536870912}
	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryResponseHandler.handleException(PeerRecoveryTargetService.java:816)
	at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1349)
	at org.elasticsearch.transport.InboundHandler.lambda$handleException$3(InboundHandler.java:397)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:717)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.lang.Thread.run(Thread.java:833)
Caused by: org.elasticsearch.transport.RemoteTransportException: [es-cluster-1][10.131.192.189:9300][internal:index/shard/recovery/start_recovery]
Caused by: org.elasticsearch.transport.RemoteTransportException: [es-cluster-0][10.131.249.74:9300][internal:index/shard/recovery/clean_files]
Caused by: org.elasticsearch.common.util.concurrent.UncategorizedExecutionException: Failed execution
	at org.elasticsearch.common.util.concurrent.FutureUtils.rethrowExecutionException(FutureUtils.java:80)
	at org.elasticsearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:72)
	at org.elasticsearch.common.util.concurrent.ListenableFuture.notifyListenerDirectly(ListenableFuture.java:112)
	at org.elasticsearch.common.util.concurrent.ListenableFuture.done(ListenableFuture.java:100)
	at org.elasticsearch.common.util.concurrent.BaseFuture.setException(BaseFuture.java:149)
	at org.elasticsearch.common.util.concurrent.ListenableFuture.onFailure(ListenableFuture.java:147)
	at org.elasticsearch.action.ActionListener$Delegating.onFailure(ActionListener.java:66)
	at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:439)
	at org.elasticsearch.indices.recovery.RecoveryTarget.cleanFiles(RecoveryTarget.java:480)
	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:533)
	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:522)
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:67)
	at org.elasticsearch.transport.InboundHandler$1.doRun(InboundHandler.java:287)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:776)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.lang.Thread.run(Thread.java:833)
Caused by: org.elasticsearch.common.io.stream.NotSerializableExceptionWrapper: execution_exception: org.apache.lucene.store.LockObtainFailedException: Lock held by another program: /usr/share/elasticsearch/data/es-cluster-0/indices/Y49YVNy1T-Cvs784GyR_zA/0/index/write.lock
	at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.getValue(BaseFuture.java:257)
	at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:231)
	at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:53)
	at org.elasticsearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:65)
	... 16 more
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock held by another program: /usr/share/elasticsearch/data/es-cluster-0/indices/Y49YVNy1T-Cvs784GyR_zA/0/index/write.lock
	at org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:117)
	at org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:43)
	at org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:44)
	at org.apache.lucene.store.FilterDirectory.obtainLock(FilterDirectory.java:106)
	at org.apache.lucene.store.FilterDirectory.obtainLock(FilterDirectory.java:106)
	at org.elasticsearch.index.store.Store.renameTempFilesSafe(Store.java:301)
	at org.elasticsearch.indices.recovery.MultiFileWriter.renameAllTempFiles(MultiFileWriter.java:236)
	at org.elasticsearch.indices.recovery.RecoveryTarget.lambda$cleanFiles$6(RecoveryTarget.java:485)
	at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:436)
	... 10 more
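The root cause in this trace is a Lucene write.lock on the NFS-backed data path that is still held (or appears held) by another process, which is a known hazard of NFS storage after a network cut. To see which nodes currently have on-disk copies of the failing shard and whether those copies are usable, one could query the standard shard stores API; the host and index name here are taken from the report above:

GET http://192.168.158.151:32600/.ds-ilm-history-5-2022.09.11-000003/_shard_stores?status=all

The response lists, per shard, the nodes holding a store copy along with any store exception, which helps identify the node whose stale lock is blocking recovery.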

What is the configuration of your nodes? Are all nodes master eligible as well as data nodes (default configuration)?

We are new to Elasticsearch, so we have not set any configuration for the cluster; it is running with the defaults.
GET /_cluster/settings?pretty

{
    "persistent": {
        "ingest": {
            "geoip": {
                "downloader": {
                    "enabled": "false"
                }
            }
        }
    },
    "transient": {}
}
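To answer the question about node roles directly, the cat nodes API can list each node's roles and which node is the elected master. A sketch against the same endpoint as the requests above; node.role and master are standard column names for this API:

GET http://192.168.158.151:32600/_cat/nodes?v&h=name,node.role,master

With default configuration, every node should show a role string containing both "m" (master-eligible) and "d" (data), with "*" in the master column for the elected master.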
