Rebalancing has been running for more than a week without stopping, and the cluster still has not reached balance

Version: 7.10.2
My cluster has 30 machines in total, with two data nodes per machine, so there are 60 data nodes.
I added a data node to the cluster, and then the rebalance tasks started...
A week has passed, but rebalancing is still going on, and not only for the new data node.
Could running two data nodes on one machine trigger some bug in the cluster, or is something else going on?
I also see the following situation in the output of the GET _cat/recovery?v API:

source_node target_node
nodeA nodeb
nodeA nodec
noded nodeA
nodee nodeA

I tried reducing cluster_concurrent_rebalance, and I set cluster.routing.rebalance.enable to none for a while, but when I re-enabled rebalancing, the rebalance tasks just continued...
Hope someone can help me, thank you very much!
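For reference, this is roughly how I toggled those settings (a sketch of the cluster settings calls I made, reconstructed from memory rather than copied from my request history):

```
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.rebalance.enable": "none"
  }
}

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.rebalance.enable": "all"
  }
}
```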

Is there anyone from the Elastic team who can help me? :sob: This looks like a bug to me, but I haven't found a specific version that fixes it.

What's the output from _cat/allocation?v?
Do you have any allocation awareness settings applied?
What do your master node logs show?



Please don't post pictures of text or code. They are difficult to read, impossible to search and replicate (if it's code), and some people may not be even able to see them :slight_smile:

The following is the master log:

[2021-09-14T01:51:00,001][INFO ][o.e.x.m.MlDailyMaintenanceService] [master-xxx.xxx.xxx.xxx-9210] triggering scheduled [ML] maintenance tasks
[2021-09-14T01:51:00,023][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [master-xxx.xxx.xxx.xxx-9210] Deleting expired data
[2021-09-14T01:51:00,127][INFO ][o.e.x.m.j.r.UnusedStatsRemover] [master-xxx.xxx.xxx.xxx-9210] Successfully deleted [0] unused stats documents
[2021-09-14T01:51:00,128][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [master-xxx.xxx.xxx.xxx-9210] Completed deletion of expired ML data
[2021-09-14T01:51:00,128][INFO ][o.e.x.m.MlDailyMaintenanceService] [master-xxx.xxx.xxx.xxx-9210] Successfully completed [ML] maintenance task: triggerDeleteExpiredDataTask
[2021-09-14T06:05:01,138][INFO ][o.e.x.i.IndexLifecycleTransition] [master-xxx.xxx.xxx.xxx-9210] moving index [metricbeat-7.10.2-2021.09.12-000040] from [{"phase":"hot","action":"rollover","name":"check-rollover-ready"}] to [{"phase":"hot","action":"rollover","name":"attempt-rollover"}] in policy [metricbeat]
[2021-09-14T06:05:01,443][INFO ][o.e.c.m.MetadataCreateIndexService] [master-xxx.xxx.xxx.xxx-9210] [metricbeat-7.10.2-2021.09.13-000041] creating index, cause [rollover_index], templates [metricbeat-7.10.2], shards [1]/[1]
[2021-09-14T06:05:03,538][INFO ][o.e.x.i.IndexLifecycleTransition] [master-xxx.xxx.xxx.xxx-9210] moving index [metricbeat-7.10.2-2021.09.13-000041] from [null] to [{"phase":"new","action":"complete","name":"complete"}] in policy [metricbeat]
[2021-09-14T06:05:03,757][INFO ][o.e.c.m.MetadataMappingService] [master-xxx.xxx.xxx.xxx-9210] [metricbeat-7.10.2-2021.09.13-000041/ph9UaWOuSv6dr1LqaLbQlA] update_mapping [_doc]
[2021-09-14T06:05:05,053][INFO ][o.e.x.i.IndexLifecycleTransition] [master-xxx.xxx.xxx.xxx-9210] moving index [metricbeat-7.10.2-2021.09.12-000040] from [{"phase":"hot","action":"rollover","name":"attempt-rollover"}] to [{"phase":"hot","action":"rollover","name":"wait-for-active-shards"}] in policy [metricbeat]
[2021-09-14T06:05:05,271][INFO ][o.e.c.m.MetadataMappingService] [master-xxx.xxx.xxx.xxx-9210] [metricbeat-7.10.2-2021.09.13-000041/ph9UaWOuSv6dr1LqaLbQlA] update_mapping [_doc]
[2021-09-14T06:05:06,831][INFO ][o.e.c.m.MetadataMappingService] [master-xxx.xxx.xxx.xxx-9210] [metricbeat-7.10.2-2021.09.13-000041/ph9UaWOuSv6dr1LqaLbQlA] update_mapping [_doc]
[2021-09-14T06:05:07,195][INFO ][o.e.x.i.IndexLifecycleTransition] [master-xxx.xxx.xxx.xxx-9210] moving index [metricbeat-7.10.2-2021.09.13-000041] from [{"phase":"new","action":"complete","name":"complete"}] to [{"phase":"hot","action":"unfollow","name":"wait-for-indexing-complete"}] in policy [metricbeat]
[2021-09-14T06:05:07,326][INFO ][o.e.x.i.IndexLifecycleTransition] [master-xxx.xxx.xxx.xxx-9210] moving index [metricbeat-7.10.2-2021.09.12-000040] from [{"phase":"hot","action":"rollover","name":"wait-for-active-shards"}] to [{"phase":"hot","action":"rollover","name":"update-rollover-lifecycle-date"}] in policy [metricbeat]
[2021-09-14T06:05:07,344][INFO ][o.e.x.i.IndexLifecycleTransition] [master-xxx.xxx.xxx.xxx-9210] moving index [metricbeat-7.10.2-2021.09.12-000040] from [{"phase":"hot","action":"rollover","name":"update-rollover-lifecycle-date"}] to [{"phase":"hot","action":"rollover","name":"set-indexing-complete"}] in policy [metricbeat]
[2021-09-14T06:05:08,339][INFO ][o.e.x.i.IndexLifecycleTransition] [master-xxx.xxx.xxx.xxx-9210] moving index [metricbeat-7.10.2-2021.09.13-000041] from [{"phase":"hot","action":"unfollow","name":"wait-for-indexing-complete"}] to [{"phase":"hot","action":"unfollow","name":"wait-for-follow-shard-tasks"}] in policy [metricbeat]
[2021-09-14T06:05:08,456][INFO ][o.e.x.i.IndexLifecycleTransition] [master-xxx.xxx.xxx.xxx-9210] moving index [metricbeat-7.10.2-2021.09.12-000040] from [{"phase":"hot","action":"rollover","name":"set-indexing-complete"}] to [{"phase":"hot","action":"complete","name":"complete"}] in policy [metricbeat]
[2021-09-14T06:09:21,953][INFO ][o.e.c.r.a.AllocationService] [master-xxx.xxx.xxx.xxx-9210] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[metricbeat-7.10.2-2021.09.13-000041][0]]]).

Regarding the allocation configuration, I only modified the following three settings:

cluster.routing.allocation.balance.index: 0.45 (default: 0.55)
cluster.routing.allocation.balance.shard: 0.55 (default: 0.45)
cluster.routing.allocation.cluster_concurrent_rebalance: 40 (default: 2)
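I applied them through the cluster settings API, roughly like this (a sketch; I'm assuming persistent settings here, which is how I'd normally set them):

```
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.balance.index": 0.45,
    "cluster.routing.allocation.balance.shard": 0.55,
    "cluster.routing.allocation.cluster_concurrent_rebalance": 40
  }
}
```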

There's nothing in your logs that shows reallocation other than what would be expected from an ILM policy, so it's hard to see what's happening.

The output from _cat/allocation?v is as follows (I have hidden the ip and host columns):

shards disk.indices disk.used disk.avail disk.total disk.percent host          ip            node
   273        3.2tb     3.3tb      2.5tb      5.8tb           56 
   273        3.1tb     3.1tb      2.6tb      5.8tb           54 
   262        2.9tb     2.9tb      2.8tb      5.8tb           50 
   273        3.3tb     3.3tb      2.4tb      5.8tb           57 
   273        3.3tb     3.3tb      2.4tb      5.8tb           57 
   273          3tb     3.1tb      2.6tb      5.8tb           53
   274        3.2tb     3.2tb      2.5tb      5.8tb           55 
   274        3.3tb     3.3tb      2.4tb      5.8tb           57 
   273        3.1tb     3.1tb      2.7tb      5.8tb           53 
   214        1.7tb     1.7tb        4tb      5.8tb           29 
   274          3tb       3tb      2.7tb      5.8tb           52 
   273        3.2tb     3.2tb      2.5tb      5.8tb           56
   273          3tb       3tb      2.7tb      5.8tb           52 
   272        3.1tb     3.1tb      2.6tb      5.8tb           54 
   271        3.3tb     3.3tb      2.4tb      5.8tb           57 
   273        3.3tb     3.3tb      2.4tb      5.8tb           58 
   274        3.4tb     3.4tb      2.3tb      5.8tb           58
   273        3.3tb     3.3tb      2.4tb      5.8tb           57 
   273        3.4tb     3.4tb      2.3tb      5.8tb           59
   272        3.2tb     3.2tb      2.5tb      5.8tb           56
   272        3.2tb     3.2tb      2.5tb      5.8tb           56 
   273        3.3tb     3.3tb      2.4tb      5.8tb           57 
   274        3.2tb     3.2tb      2.5tb      5.8tb           56 
   273        3.3tb     3.3tb      2.4tb      5.8tb           57 
   272        3.1tb     3.2tb      2.6tb      5.8tb           55 
   273        3.2tb     3.2tb      2.5tb      5.8tb           56 
   273        3.2tb     3.2tb      2.5tb      5.8tb           55
   274        3.1tb     3.1tb      2.6tb      5.8tb           54 
   273        3.1tb     3.1tb      2.6tb      5.8tb           54
   264        3.4tb     3.4tb      2.3tb      5.8tb           59
   272        3.4tb     3.4tb      2.4tb      5.8tb           58 
   270          3tb       3tb      2.7tb      5.8tb           52 
   273        3.1tb     3.1tb      2.6tb      5.8tb           54
   273        3.2tb     3.3tb      2.4tb      5.8tb           57
   273        3.2tb     3.2tb      2.5tb      5.8tb           55
   273        3.3tb     3.3tb      2.4tb      5.8tb           57 
   273        3.1tb     3.1tb      2.6tb      5.8tb           54
   273        3.2tb     3.2tb      2.5tb      5.8tb           55 
   273        3.2tb     3.2tb      2.5tb      5.8tb           56
   272        3.3tb     3.3tb      2.4tb      5.8tb           57 
   253        2.6tb     2.7tb        3tb      5.8tb           46
   273        3.1tb     3.1tb      2.6tb      5.8tb           54 
   213        2.2tb     2.2tb      3.6tb      5.8tb           38
   275        3.1tb     3.1tb      2.7tb      5.8tb           53 
   274        3.2tb     3.2tb      2.5tb      5.8tb           56 
   273        3.4tb     3.4tb      2.3tb      5.8tb           59 
   273        3.3tb     3.3tb      2.4tb      5.8tb           57 
   273        3.3tb     3.3tb      2.4tb      5.8tb           57
   275        3.3tb     3.3tb      2.4tb      5.8tb           57
   273        3.3tb     3.3tb      2.5tb      5.8tb           56 
   271        2.8tb     2.8tb      2.9tb      5.8tb           49 
   273        3.3tb     3.3tb      2.4tb      5.8tb           58
   273        3.2tb     3.2tb      2.5tb      5.8tb           56
   257        3.2tb     3.2tb      2.5tb      5.8tb           55
   273        3.2tb     3.2tb      2.5tb      5.8tb           55 
   265        3.3tb     3.3tb      2.4tb      5.8tb           58
   274        3.3tb     3.3tb      2.4tb      5.8tb           57
   274          3tb       3tb      2.7tb      5.8tb           53 
   273        3.2tb     3.2tb      2.5tb      5.8tb           56 
   265        2.9tb     2.9tb      2.8tb      5.8tb           50 
   273        3.2tb     3.2tb      2.5tb      5.8tb           55 
   273        3.3tb     3.3tb      2.5tb      5.8tb           56

What is the full output of the cluster stats API?

That is indeed all of the logs; the newly generated entries basically look like the following:
[2021-09-14T11:25:05,106][INFO ][o.e.x.i.IndexLifecycleTransition] [master-xxx.xxx.xxx.xxx-9210] moving index [.kibana-event-log-7.10.2-000007] from [{"phase":"hot","action":"unfollow","name":"wait-for-yellow-step"}] to [{"phase":"hot","action":"rollover","name":"check-rollover-ready"}] in policy [kibana-event-log-policy]

That message indicates that the index is moving to a different ILM state, not a different node. You can monitor shard movements using the cat recovery API. What does that show?
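For example, something like this (the active_only flag filters out already-completed recoveries, and h picks a few useful columns):

```
GET _cat/recovery?v&active_only=true&h=index,shard,time,type,source_node,target_node,bytes_percent
```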

Here is the full output of the cluster stats:

{
    "_nodes": {
        "total": 70,
        "successful": 70,
        "failed": 0
    },
    "cluster_name": "xxxxx",
    "cluster_uuid": "xxxxx",
    "timestamp": 1631595738148,
    "status": "green",
    "indices": {
        "count": 1391,
        "shards": {
            "total": 16735,
            "primaries": 8379,
            "replication": 0.9972550423678244,
            "index": {
                "shards": {
                    "min": 1,
                    "max": 400,
                    "avg": 12.030913012221424
                },
                "primaries": {
                    "min": 1,
                    "max": 200,
                    "avg": 6.02372393961179
                },
                "replication": {
                    "min": 0,
                    "max": 1,
                    "avg": 0.9971243709561467
                }
            }
        },
        "docs": {
            "count": 73134894750,
            "deleted": 592805409
        },
        "store": {
            "size_in_bytes": 217492731193541,
            "reserved_in_bytes": 0
        },
        "fielddata": {
            "memory_size_in_bytes": 4419829176,
            "evictions": 0
        },
        "query_cache": {
            "memory_size_in_bytes": 15333618230,
            "total_count": 66970368294,
            "hit_count": 427855687,
            "miss_count": 66542512607,
            "cache_size": 153131,
            "cache_count": 1190658,
            "evictions": 1037527
        },
        "completion": {
            "size_in_bytes": 0
        },
        "segments": {
            "count": 83237,
            "memory_in_bytes": 2224512332,
            "terms_memory_in_bytes": 1287050168,
            "stored_fields_memory_in_bytes": 71620168,
            "term_vectors_memory_in_bytes": 488,
            "norms_memory_in_bytes": 35990080,
            "points_memory_in_bytes": 0,
            "doc_values_memory_in_bytes": 829851428,
            "index_writer_memory_in_bytes": 5768524576,
            "version_map_memory_in_bytes": 10630400,
            "fixed_bit_set_memory_in_bytes": 7579813112,
            "max_unsafe_auto_id_timestamp": 1631589312224,
            "file_sizes": {}
        },
        "mappings": {
            "field_types": [
                {
                    "name": "alias",
                    "count": 123,
                    "index_count": 41
                },
                {
                    "name": "binary",
                    "count": 15,
                    "index_count": 4
                },
                {
                    "name": "boolean",
                    "count": 9888,
                    "index_count": 1272
                },
                {
                    "name": "byte",
                    "count": 42,
                    "index_count": 42
                },
                {
                    "name": "date",
                    "count": 8904,
                    "index_count": 1382
                },
                {
                    "name": "date_nanos",
                    "count": 1,
                    "index_count": 1
                },
                {
                    "name": "date_range",
                    "count": 1,
                    "index_count": 1
                },
                {
                    "name": "dense_vector",
                    "count": 1,
                    "index_count": 1
                },
                {
                    "name": "double",
                    "count": 7678,
                    "index_count": 1204
                },
                {
                    "name": "double_range",
                    "count": 3,
                    "index_count": 2
                },
                {
                    "name": "flattened",
                    "count": 9,
                    "index_count": 1
                },
                {
                    "name": "float",
                    "count": 8519,
                    "index_count": 117
                },
                {
                    "name": "float_range",
                    "count": 1,
                    "index_count": 1
                },
                {
                    "name": "geo_point",
                    "count": 292,
                    "index_count": 45
                },
                {
                    "name": "geo_shape",
                    "count": 1,
                    "index_count": 1
                },
                {
                    "name": "half_float",
                    "count": 69,
                    "index_count": 16
                },
                {
                    "name": "integer",
                    "count": 191,
                    "index_count": 14
                },
                {
                    "name": "integer_range",
                    "count": 1,
                    "index_count": 1
                },
                {
                    "name": "ip",
                    "count": 821,
                    "index_count": 42
                },
                {
                    "name": "ip_range",
                    "count": 1,
                    "index_count": 1
                },
                {
                    "name": "keyword",
                    "count": 122970,
                    "index_count": 1387
                },
                {
                    "name": "long",
                    "count": 121073,
                    "index_count": 1373
                },
                {
                    "name": "long_range",
                    "count": 1,
                    "index_count": 1
                },
                {
                    "name": "nested",
                    "count": 998,
                    "index_count": 697
                },
                {
                    "name": "object",
                    "count": 99352,
                    "index_count": 450
                },
                {
                    "name": "scaled_float",
                    "count": 4961,
                    "index_count": 41
                },
                {
                    "name": "shape",
                    "count": 1,
                    "index_count": 1
                },
                {
                    "name": "short",
                    "count": 1,
                    "index_count": 1
                },
                {
                    "name": "text",
                    "count": 14365,
                    "index_count": 1371
                }
            ]
        },
        "analysis": {
            "char_filter_types": [],
            "tokenizer_types": [],
            "filter_types": [
                {
                    "name": "pattern_capture",
                    "count": 1,
                    "index_count": 1
                }
            ],
            "analyzer_types": [
                {
                    "name": "custom",
                    "count": 1,
                    "index_count": 1
                }
            ],
            "built_in_char_filters": [],
            "built_in_tokenizers": [
                {
                    "name": "uax_url_email",
                    "count": 1,
                    "index_count": 1
                }
            ],
            "built_in_filters": [
                {
                    "name": "lowercase",
                    "count": 1,
                    "index_count": 1
                },
                {
                    "name": "unique",
                    "count": 1,
                    "index_count": 1
                }
            ],
            "built_in_analyzers": [
                {
                    "name": "english",
                    "count": 1,
                    "index_count": 1
                },
                {
                    "name": "ik_max_word",
                    "count": 10162,
                    "index_count": 1238
                }
            ]
        }
    },
    "nodes": {
        "count": {
            "total": 70,
            "coordinating_only": 4,
            "data": 0,
            "data_cold": 0,
            "data_content": 62,
            "data_hot": 0,
            "data_warm": 62,
            "ingest": 1,
            "master": 3,
            "ml": 0,
            "remote_cluster_client": 0,
            "transform": 0,
            "voting_only": 0
        },
        "versions": [
            "7.10.2"
        ],
        "os": {
            "available_processors": 5216,
            "allocated_processors": 5216,
            "names": [
                {
                    "name": "Linux",
                    "count": 70
                }
            ],
            "pretty_names": [
                {
                    "pretty_name": "CentOS Linux 7 (Core)",
                    "count": 70
                }
            ],
            "mem": {
                "total_in_bytes": 17807501201408,
                "free_in_bytes": 897589530624,
                "used_in_bytes": 16909911670784,
                "free_percent": 5,
                "used_percent": 95
            }
        },
        "process": {
            "cpu": {
                "percent": 221
            },
            "open_file_descriptors": {
                "min": 1887,
                "max": 3758,
                "avg": 3469
            }
        },
        "jvm": {
            "max_uptime_in_millis": 1611117676,
            "versions": [
                {
                    "version": "15.0.1",
                    "vm_name": "OpenJDK 64-Bit Server VM",
                    "vm_version": "15.0.1+9",
                    "vm_vendor": "AdoptOpenJDK",
                    "bundled_jdk": true,
                    "using_bundled_jdk": true,
                    "count": 70
                }
            ],
            "mem": {
                "heap_used_in_bytes": 1028837157904,
                "heap_max_in_bytes": 2195751698432
            },
            "threads": 24368
        },
        "fs": {
            "total_in_bytes": 201363506065408,
            "free_in_bytes": 91212150325248,
            "available_in_bytes": 91212150325248
        },
        "plugins": [
            {
                "name": "analysis-ik",
                "version": "7.10.2",
                "elasticsearch_version": "7.10.2",
                "java_version": "1.8",
                "description": "IK Analyzer for Elasticsearch",
                "classname": "org.elasticsearch.plugin.analysis.ik.AnalysisIkPlugin",
                "extended_plugins": [],
                "has_native_controller": false
            }
        ],
        "network_types": {
            "transport_types": {
                "security4": 70
            },
            "http_types": {
                "security4": 70
            }
        },
        "discovery_types": {
            "zen": 70
        },
        "packaging_types": [
            {
                "flavor": "default",
                "type": "tar",
                "count": 70
            }
        ],
        "ingest": {
            "number_of_pipelines": 18,
            "processor_stats": {
                "conditional": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time_in_millis": 0
                },
                "geoip": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time_in_millis": 0
                },
                "grok": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time_in_millis": 0
                },
                "gsub": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time_in_millis": 0
                },
                "remove": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time_in_millis": 0
                },
                "rename": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time_in_millis": 0
                },
                "script": {
                    "count": 14,
                    "failed": 4,
                    "current": 0,
                    "time_in_millis": 21
                },
                "set": {
                    "count": 10,
                    "failed": 0,
                    "current": 0,
                    "time_in_millis": 18
                }
            }
        }
    }
}

This is what puzzles me too, so I can't find the cause :sob:

Here it is: :smile:

sl_weixin-comment_202011_v1         0     1.3m  peer           index    ip21 data-ip21-9201 ip15 data-ip15-9201 n/a        n/a      27    26              96.3%         27          4584744775   3243567409      70.7%         4584744775   1            0                      0.0%
sl_weixin_201812_v1                 7     4.8m  peer           index    ip27 data-ip27-9202 ip21 data-ip21-9202 n/a        n/a      18    16              88.9%         18          28098184802  12458809480     44.3%         28098184802  0            0                      100.0%
sl_video_202105_v1                  0     2.5m  peer           index    ip24 data-ip24-9201 ip13 data-ip13-9201 n/a        n/a      45    43              95.6%         45          22960400828  12584962168     54.8%         22960400828  0            0                      100.0%
sl_video_202105_v1                  3     3m    peer           index    ip4 data-ip4-9201 ip31  data-ip31-9202  n/a        n/a      51    48              94.1%         51          22967151236  5357819127      23.3%         22967151236  0            0                      100.0%
sl_video_202105_v1                  6     13.1m peer           index    ip8 data-ip8-9202 ip31  data-ip31-9202  n/a        n/a      42    41              97.6%         42          22964004446  21265061227     92.6%         22964004446  0            0                      100.0%
sl_video_202105_v1                  8     18.4s peer           index    ip27 data-ip27-9202 ip16 data-ip16-9201 n/a        n/a      54    49              90.7%         54          22963282082  685035437       3.0%          22963282082  0            0                      100.0%
sl_video_202105_v1                  9     4.4m  peer           index    ip22 data-ip22-9202 ip30 data-ip30-9201 n/a        n/a      51    50              98.0%         51          22959658140  18463867294     80.4%         22959658140  0            0                      100.0%
sl_video_202105_v1                  10    4.9m  peer           index    ip12 data-ip12-9202 ip5 data-ip5-9202 n/a        n/a      36    35              97.2%         36          22957972007  15950452035     69.5%         22957972007  0            0                      100.0%
sl_video_202105_v1                  11    4.4m  peer           index    ip2 data-ip2-9201 ip13 data-ip13-9202 n/a        n/a      30    29              96.7%         30          22964789531  15874673342     69.1%         22964789531  0            0                      100.0%
sl_video_202105_v1                  13    1.1m  peer           index    ip3 data-ip3-9202 ip15 data-ip15-9201 n/a        n/a      36    32              88.9%         36          22970007002  2324170178      10.1%         22970007002  0            0                      100.0%
sl_video_202105_v1                  19    2.6m  peer           index    ip3 data-ip3-9201 ip9 data-ip9-9201 n/a        n/a      51    49              96.1%         51          22960550766  8407490986      36.6%         22960550766  0            0                      100.0%
sl_news-app_202002_v1               5     9.9m  peer           index    ip15 data-ip15-9201 ip23 data-ip23-9201 n/a        n/a      18    17              94.4%         18          25792120984  21180013805     82.1%         25792120984  0            0                      100.0%
sl_news_202108_v1                   8     2.9m  peer           index    ip11 data-ip11-9201 ip3 data-ip3-9201 n/a        n/a      30    28              93.3%         30          25186986014  10513747439     41.7%         25186986014  397          0                      0.0%
sl_news_202108_v1                   16    2.1m  peer           index    ip4 data-ip4-9201 ip31  data-ip31-9201  n/a        n/a      48    45              93.8%         48          25160602090  3617682820      14.4%         25160602090  902          0                      0.0%
user_douyin_v1                      1     56.6s peer           index    ip23 data-ip23-9201 ip25 data-ip25-9201 n/a        n/a      209   200             95.7%         209         25587485452  2727847757      10.7%         25587485452  336570       0                      0.0%
sl_weibo_201906_v1                  8     34.3s peer           index    ip11 data-ip11-9202 ip9 data-ip9-9201 n/a        n/a      51    47              92.2%         51          23465130975  2414661724      10.3%         23465130975  255          0                      0.0%
sl_weibo_201906_v1                  16    6.8m  peer           index    ip14 data-ip14-9202 ip16 data-ip16-9202 n/a        n/a      39    38              97.4%         39          23478757865  19985333192     85.1%         23478757865  1294         0                      0.0%
sl_video_201906_v1                  0     1.9m  peer           index    ip6 data-ip6-9201 ip5 data-ip5-9202 n/a        n/a      18    15              83.3%         18          15343534060  4666299323      30.4%         15343534060  0            0                      100.0%
sl_weibo_202108_v1                  17    5.5m  peer           index    ip16 data-ip16-9202 ip22 data-ip22-9202 n/a        n/a      183   182             99.5%         183         22703078331  15183121998     66.9%         22703078331  1397         0                      0.0%
sl_weibo_202108_v1                  21    7.5m  peer           index    ip16 data-ip16-9201 ip23 data-ip23-9201 n/a        n/a      174   173             99.4%         174         23268438750  15040598115     64.6%         23268438750  1829         0                      0.0%
sl_weibo_202108_v1                  23    5.7m  peer           index    ip14 data-ip14-9202 ip5 data-ip5-9201 n/a        n/a      177   176             99.4%         177         22618664665  17908983247     79.2%         22618664665  38323        0                      0.0%
sl_weibo_202108_v1                  41    6.6m  peer           index    ip2 data-ip2-9201 ip18 data-ip18-9201 n/a        n/a      126   125             99.2%         126         22725646516  17332080221     76.3%         22725646516  2040         0                      0.0%
sl_weibo_202108_v1                  45    4m    peer           index    ip3 data-ip3-9201 ip31  data-ip31-9201  n/a        n/a      180   176             97.8%         180         22652925301  4435401149      19.6%         22652925301  15416        0                      0.0%
sl_ec_201908_v1                     3     9.2m  peer           index    ip16 data-ip16-9202 ip3 data-ip3-9202 n/a        n/a      39    38              97.4%         39          20680485901  20458800078     98.9%         20680485901  0            0                      100.0%
sl_ec_201908_v1                     4     1.5m  peer           index    ip13 data-ip13-9201 ip7 data-ip7-9202 n/a        n/a      33    30              90.9%         33          20679950361  5864422857      28.4%         20679950361  0            0                      100.0%
sl_weixin_201908_v1                 9     7.4m  peer           index    ip28 data-ip28-9202 ip21 data-ip21-9202 n/a        n/a      18    17              94.4%         18          25511425056  24435084882     95.8%         25511425056  0            0                      100.0%
sl_ec_202104_v1                     1     2.8m  peer           index    ip5 data-ip5-9202 ip8 data-ip8-9201 n/a        n/a      60    57              95.0%         60          23983263781  7085794663      29.5%         23983263781  8            0                      0.0%
sl_video_201912_v1                  0     2.8m  peer           index    ip8 data-ip8-9201 ip28 data-ip28-9201 n/a        n/a      39    36              92.3%         39          34673379128  10229684234     29.5%         34673379128  0            0                      100.0%
sl_news_202007_v1                   8     3.9m  peer           index    ip23 data-ip23-9202 ip30 data-ip30-9202 n/a        n/a      18    16              88.9%         18          26953828769  11607792126     43.1%         26953828769  0            0                      100.0%
sl_weibo_202102_v1                  17    1.4m  peer           index    ip30 data-ip30-9201 ip23 data-ip23-9202 n/a        n/a      69    65              94.2%         69          25007976295  4883245308      19.5%         25007976295  5122         0                      0.0%
sl_news_201910_v1                   2     3.3m  peer           index    ip1 data-ip1-9201 ip26 data-ip26-9202 n/a        n/a      18    17              94.4%         18          21022294710  13656217903     65.0%         21022294710  0            0                      100.0%
sl_weibo-comment_201812_v1          0     1.8m  peer           index    ip20 data-ip20-9201 ip14 data-ip14-9202 n/a        n/a      18    15              83.3%         18          23291417512  4988002368      21.4%         23291417512  0            0                      100.0%
kol_user_v1                         0     24.4s peer           index    ip5 data-ip5-9202 ip17 data-ip17-9202 n/a        n/a      125   115             92.0%         125         3546976610   742052567       20.9%         3546976610   0            0                      100.0%
sl_news-app_201908_v1               0     7.3m  peer           index    ip30 data-ip30-9202 ip25 data-ip25-9201 n/a        n/a      18    17              94.4%         18          25462553741  18340864940     72.0%         25462553741  0            0                      100.0%
sl_news_202012_v1                   19    4.6m  peer           index    ip29 data-ip29-9202 ip18 data-ip18-9201 n/a        n/a      18    16              88.9%         18          24987389909  13312075006     53.3%         24987389909  0            0                      100.0%
sl_weibo_202107_v1                  48    1.8m  peer           index    ip15 data-ip15-9202 ip2 data-ip2-9202 n/a        n/a      81    79              97.5%         81          27962643024  11847502839     42.4%         27962643024  4804         0                      0.0%
sl_ec_201909_v1                     5     3.8m  peer           index    ip18 data-ip18-9201 ip26 data-ip26-9202 n/a        n/a      66    64              97.0%         66          21500142782  10109975061     47.0%         21500142782  0            0                      100.0%
sl_news-app_202004_v1               13    11.1m peer           index    ip16 data-ip16-9201 ip28 data-ip28-9201 n/a        n/a      18    17              94.4%         18          25903099528  22495888780     86.8%         25903099528  0            0                      100.0%

This is almost certainly a bad idea. This setting controls how far ahead the balancer looks, but the lookahead isn't perfect so if you set it very high (i.e. 40) it's probably overshooting and then having to back-track. Set it back to 2.
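Something like this should put it back (a sketch; use transient instead of persistent if that's how you set it originally):

```
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.cluster_concurrent_rebalance": 2
  }
}
```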

2 Likes

Thank you very much.
I will set it back to 2 and see if that helps.

I restarted the cluster's current master node yesterday, and now the cluster is balanced.
It feels quite inexplicable.
Thank you very much for answering my questions and helping me!