When excluding certain nodes with the cluster.routing.allocation.exclude._ip setting, most of the shards are moved off those nodes.
However, some of the excluded nodes still hold a few shards. In the allocation overview below (a sketch of the command follows the table), the first two nodes have 1 shard each even though they are excluded from allocation:
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
1 13gb 78.1gb 3.3tb 3.4tb 2 10.0.54.39 10.0.54.39 es-data-i-06a6ccfe35e55a373
1 13.1gb 74.1gb 3.3tb 3.4tb 2 10.0.40.130 10.0.40.130 es-data-i-05964a0d46869f1a0
123 1.4tb 1.5tb 1.9tb 3.4tb 44 10.0.53.110 10.0.53.110 es-data-i-03cf7c9c7ef35d91b
123 1.3tb 1.3tb 2tb 3.4tb 39 10.0.41.37 10.0.41.37 es-data-i-0cf78468318fbd107
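The table above comes from the cat allocation API, roughly like this (sorting by shard count is only for readability):
GET _cat/allocation?v&s=shards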
An allocation explain API call (request sketch after the excerpts below) confirms that those shards indeed should not be on those nodes:
"can_remain_on_current_node": "no",
"can_remain_decisions": [
{
"decider": "filter",
"decision": "NO",
"explanation": """node matches cluster setting [cluster.routing.allocation.exclude] filters [_ip:"10.0.35.3 OR 10.0.43.177 OR 10.0.40.130 OR 10.0.45.193 OR 10.0.43.124 OR 10.0.42.231 OR 10.0.42.179 OR 10.0.46.56 OR 10.0.52.223 OR 10.0.51.26 OR 10.0.50.74 OR 10.0.55.224 OR 10.0.52.197 OR 10.0.54.39 OR 10.0.53.177 OR 10.0.44.189 OR 10.0.32.136 OR 10.0.38.232 OR 10.0.32.108 OR 10.0.37.223 OR 10.0.34.143 OR 10.0.34.197 OR 10.0.33.133 OR 10.0.36.22 OR 10.0.48.18"]"""
}
]
However, those shards can't be moved either, because the maximum number of relocation retries has been reached:
"node_decision": "no",
"weight_ranking": 3,
"deciders": [
{
"decider": "max_retry",
"decision": "NO",
"explanation": "shard has exceeded the maximum number of retries [10] on failed relocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [failed_attempts[10]]"
}
]
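Both excerpts above come from an allocation explain request along these lines (a sketch; the index name and shard number are taken from the exception in the logs below and differ per stuck shard, and "primary" has to be set to true or false depending on which copy is being explained):
GET _cluster/allocation/explain
{
  "index": "usersearch_v23_54_production_users",
  "shard": 18,
  "primary": false
}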
After forcing relocation with POST /_cluster/reroute?retry_failed=true, Elasticsearch does attempt to move the shards, but fails again after 10 attempts, logging the following:
Caused by: org.elasticsearch.env.ShardLockObtainFailedException: [usersearch_v23_54_production_users][18]: obtaining shard lock for [starting shard] timed out after [5000ms], lock already held for [closing shard] with age [6068348ms]
at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:987) ~[elasticsearch-8.6.1.jar:?]
at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:887) ~[elasticsearch-8.6.1.jar:?]
at org.elasticsearch.index.IndexService.createShard(IndexService.java:429) ~[elasticsearch-8.6.1.jar:?]
... 17 more
java.io.IOException: failed to obtain in-memory shard lock
at org.elasticsearch.index.IndexService.createShard(IndexService.java:527) ~[elasticsearch-8.6.1.jar:?]
at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:851) ~[elasticsearch-8.6.1.jar:?]
at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:175) ~[elasticsearch-8.6.1.jar:?]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.createShard(IndicesClusterStateService.java:569) ~[elasticsearch-8.6.1.jar:?]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.createOrUpdateShard(IndicesClusterStateService.java:508) ~[elasticsearch-8.6.1.jar:?]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.createIndicesAndUpdateShards(IndicesClusterStateService.java:463) ~[elasticsearch-8.6.1.jar:?]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:226) ~[elasticsearch-8.6.1.jar:?]
at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:538) ~[elasticsearch-8.6.1.jar:?]
at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:524) ~[elasticsearch-8.6.1.jar:?]
at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:497) ~[elasticsearch-8.6.1.jar:?]
at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:428) ~[elasticsearch-8.6.1.jar:?]
at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:154) ~[elasticsearch-8.6.1.jar:?]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:850) ~[elasticsearch-8.6.1.jar:?]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:257) ~[elasticsearch-8.6.1.jar:?]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:223) ~[elasticsearch-8.6.1.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
at java.lang.Thread.run(Thread.java:1589) ~[?:?]
Hence, those shards are effectively stuck on the excluded nodes: Elasticsearch can't obtain the shard lock because it is still held by a [closing shard] operation.
Please advise on what to do in a case like this. We can reproduce it pretty consistently when excluding ~20 nodes from allocation on multiple clusters.
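For reference, the exclusion itself is applied in the usual way, along these lines (a sketch; only two of the excluded IPs are shown here, the full list is visible in the filter string in the allocation explain output above):
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._ip": "10.0.54.39,10.0.40.130,..."
  }
}
Once most shards have drained and the remaining ones hit the retry limit, POST /_cluster/reroute?retry_failed=true triggers the failing relocation attempts described above.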
We are running the following Elasticsearch version on bare AWS EC2 instances:
"version": {
"number": "8.6.1",
"build_flavor": "default",
"build_type": "rpm",
"build_hash": "180c9830da956993e59e2cd70eb32b5e383ea42c",
"build_date": "2023-01-24T21:35:11.506992272Z",
"build_snapshot": false,
"lucene_version": "9.4.2",
"minimum_wire_compatibility_version": "7.17.0",
"minimum_index_compatibility_version": "7.0.0"
}