Have you tried /_cluster/reroute ?
Yes, I tried it, with the retry_failed=true parameter.
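For reference, the retry was just the bare reroute call with the retry flag, i.e. something like:

POST /_cluster/reroute?retry_failed=true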
- Could you show the output of GET /_nodes/stats/fs ?
Here is the result:
{
"_nodes": {
"total": 3,
"successful": 3,
"failed": 0
},
"cluster_name": "elastic-tender",
"nodes": {
"TpMv7UjiRjSgME0lzeBZqQ": {
"timestamp": 1540911905542,
"name": "WIN-2",
"transport_address": "192.168.0.2:9300",
"host": "192.168.0.2",
"ip": "192.168.0.2:9300",
"roles": [
"master",
"data",
"ingest"
],
"fs": {
"timestamp": 1540911905543,
"total": {
"total_in_bytes": 3840657055744,
"free_in_bytes": 2163930603520,
"available_in_bytes": 2163930603520
},
"data": [
{
"path": "C:\\ProgramData\\Elastic\\Elasticsearch\\data\\nodes\\0",
"mount": "Windows (C:)",
"type": "NTFS",
"total_in_bytes": 1920276099072,
"free_in_bytes": 807233945600,
"available_in_bytes": 807233945600
},
{
"path": "D:\\data\\nodes\\0",
"mount": "Новый том (D:)",
"type": "NTFS",
"total_in_bytes": 1920380956672,
"free_in_bytes": 1356696657920,
"available_in_bytes": 1356696657920
}
]
}
},
"-7AEfximRNieGIMlvIH19A": {
"timestamp": 1540911927008,
"name": "WIN-1",
"transport_address": "192.168.0.1:9300",
"host": "192.168.0.1",
"ip": "192.168.0.1:9300",
"roles": [
"master",
"data",
"ingest"
],
"fs": {
"timestamp": 1540911927008,
"total": {
"total_in_bytes": 3840657055744,
"free_in_bytes": 1647241125888,
"available_in_bytes": 1647241125888
},
"data": [
{
"path": "C:\\ProgramData\\Elastic\\Elasticsearch\\data\\nodes\\0",
"mount": "Windows (C:)",
"type": "NTFS",
"total_in_bytes": 1920276099072,
"free_in_bytes": 188253884416,
"available_in_bytes": 188253884416
},
{
"path": "D:\\data\\nodes\\0",
"mount": "Новый том (D:)",
"type": "NTFS",
"total_in_bytes": 1920380956672,
"free_in_bytes": 1458987241472,
"available_in_bytes": 1458987241472
}
]
}
},
"lY6bhl0FSyWiRUjKifeAyw": {
"timestamp": 1540911926272,
"name": "WIN-3",
"transport_address": "192.168.0.3:9300",
"host": "192.168.0.3",
"ip": "192.168.0.3:9300",
"roles": [
"master"
],
"fs": {
"timestamp": 1540911926273,
"total": {
"total_in_bytes": 8000968323072,
"free_in_bytes": 2248689004544,
"available_in_bytes": 2248689004544
},
"data": [
{
"path": "C:\\ProgramData\\Elastic\\Elasticsearch\\data\\nodes\\0",
"mount": "Локальный диск (C:)",
"type": "NTFS",
"total_in_bytes": 8000968323072,
"free_in_bytes": 2248689004544,
"available_in_bytes": 2248689004544
}
]
}
}
}
}
I deleted one of the replicas hoping to free up space for reallocation, but now it looks like the cluster is just shuffling shards from one node to another to no effect.
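(A sketch of how that replica removal is typically done, assuming the index from the error below and a target of zero replicas; the actual command used may have differed:

PUT /tenders_index_2018/_settings
{
  "index": {
    "number_of_replicas": 0
  }
}
)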
At this point a single unassigned shard remains; I wanted to place it on the second node, but the result was an error.
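The command, reconstructed from the [allocate_replica] entry in the error response below (the exact request body is my reconstruction, not a verbatim copy), was along these lines:

POST /_cluster/reroute
{
  "commands": [
    {
      "allocate_replica": {
        "index": "tenders_index_2018",
        "shard": 0,
        "node": "WIN-2"
      }
    }
  ]
}

The response: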
{
"error": {
"root_cause": [
{
"type": "remote_transport_exception",
"reason": "[WIN-1][192.168.0.1:9300][cluster:admin/reroute]"
}
],
"type": "illegal_argument_exception",
"reason": "[allocate_replica] allocation of [tenders_index_2018][0] on node {WIN-2}{TpMv7UjiRjSgME0lzeBZqQ}{L1_KGKd0SiejE46cZFHvUA}{192.168.0.2}{192.168.0.2:9300} is not allowed, reason: [YES(shard has exceeded the maximum number of retries [5] on failed allocation attempts - retrying once due to a manual reroute command, [unassigned_info[[reason=ALLOCATION_FAILED], at[2018-10-30T08:53:24.311Z], failed_attempts[5], delayed=false, details[failed recovery, failure RecoveryFailedException[[tenders_index_2018][0]: Recovery failed from {WIN-1}{-7AEfximRNieGIMlvIH19A}{g84RvI_fQZqAXG94RG3SKg}{192.168.0.1}{192.168.0.1:9300} into {WIN-2}{TpMv7UjiRjSgME0lzeBZqQ}{L1_KGKd0SiejE46cZFHvUA}{192.168.0.2}{192.168.0.2:9300}]; nested: RemoteTransportException[[WIN-1][192.168.0.1:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [216] files with total size of [127.3gb]]; nested: RemoteTransportException[[WIN-2][192.168.0.2:9300][internal:index/shard/recovery/file_chunk]]; nested: IOException[Недостаточно места на диске]; ], allocation_status[no_attempt]]])][YES(primary shard for this replica is already active)][YES(explicitly ignoring any disabling of allocation due to manual allocation commands via the reroute API)][YES(target node version [5.6.4] is the same or newer than source node version [5.6.4])][YES(the shard is not being snapshotted)][YES(node passes include/exclude/require filters)][NO(the shard cannot be allocated to the same node on which a copy of the shard already exists [[tenders_index_2018][0], node[TpMv7UjiRjSgME0lzeBZqQ], [P], s[STARTED], a[id=15L9dPwyRSaS3BvWOn07Xg]])][YES(enough disk for shard on node, free: [1.2tb], shard size: [0b], free after allocating shard: [1.2tb])][THROTTLE(reached the limit of incoming shard recoveries [3], cluster setting [cluster.routing.allocation.node_concurrent_incoming_recoveries=3] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries]))][YES(total shard limits are disabled: [index: -1, cluster: -1] <= 0)][YES(allocation awareness is not enabled, set cluster setting [cluster.routing.allocation.awareness.attributes] to enable it)]"
},
"status": 400
}
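If I read the deciders correctly, the blocking one is the NO: the primary of [tenders_index_2018][0] is already STARTED on WIN-2 (node TpMv7UjiRjSgME0lzeBZqQ), so its replica cannot be allocated to that same node. The five earlier failed attempts died with IOException[Недостаточно места на диске] ("Not enough disk space") while recovering from WIN-1, and incoming recoveries on the node are additionally throttled (cluster.routing.allocation.node_concurrent_incoming_recoveries=3). To see the full per-node reasoning for this shard, the allocation explain API (available in 5.x) can be queried, e.g.:

GET /_cluster/allocation/explain
{
  "index": "tenders_index_2018",
  "shard": 0,
  "primary": false
}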