Rebalance tasks keep running and never stop

Version: 7.10.2
I added a data node to the cluster, and rebalancing started as expected.
But a week has passed and rebalancing is still going on, and it is no longer limited to the new data node.
The GET _cat/recovery?v API shows the following pattern:

source_node target_node
nodeA nodeb
nodeA nodec
noded nodeA
nodee nodeA
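
To make the pattern above easier to spot on a large cluster, here is a minimal sketch that tallies outgoing and incoming recoveries per node from the text output. It assumes the output is whitespace-separated with a header row containing source_node and target_node (for example, when requested with h=source_node,target_node); the function name is hypothetical.

```python
from collections import Counter

def tally_recoveries(cat_output):
    """Count outgoing and incoming recoveries per node.

    Assumes whitespace-separated text with a header row that
    contains the columns source_node and target_node.
    """
    lines = [ln.split() for ln in cat_output.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    src = header.index("source_node")
    dst = header.index("target_node")
    outgoing, incoming = Counter(), Counter()
    for row in rows:
        outgoing[row[src]] += 1
        incoming[row[dst]] += 1
    return outgoing, incoming

# The rows reported in this post.
sample = """source_node target_node
nodeA nodeb
nodeA nodec
noded nodeA
nodee nodeA"""

outgoing, incoming = tally_recoveries(sample)
# Nodes that send and receive shards at the same time.
both_directions = sorted(set(outgoing) & set(incoming))
print(both_directions)  # ['nodeA']
```

A node that appears in both directions at once, like nodeA here, is the kind of back-and-forth movement described in this post.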

I tried lowering cluster_concurrent_rebalance and setting rebalance.enable to none for a while, but as soon as I re-enabled rebalancing, the rebalance tasks resumed.
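
For reference, the two settings mentioned here are cluster.routing.allocation.cluster_concurrent_rebalance and cluster.routing.rebalance.enable. The sketch below only builds the request body for PUT _cluster/settings and does not contact a cluster; the helper name and the choice of transient scope are assumptions for illustration.

```python
import json

ALLOWED_ENABLE = {"all", "primaries", "replicas", "none"}

def rebalance_settings_body(enable="all", concurrent=2, persistent=False):
    """Build a body for PUT _cluster/settings (hypothetical helper)."""
    if enable not in ALLOWED_ENABLE:
        raise ValueError(f"unsupported value for rebalance.enable: {enable!r}")
    scope = "persistent" if persistent else "transient"
    return {
        scope: {
            "cluster.routing.rebalance.enable": enable,
            "cluster.routing.allocation.cluster_concurrent_rebalance": concurrent,
        }
    }

# Pause rebalancing and lower concurrency, roughly as described above.
print("PUT _cluster/settings")
print(json.dumps(rebalance_settings_body("none", 1), indent=2))
```

Switching enable back to "all" corresponds to re-opening rebalancing, which is the point where the tasks resumed.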

What is the full output of the cluster stats API and the nodes stats API?

Can you tell me what you are looking for? By the way, another cluster (6.7) also had a data node added but does not have this problem. I found that there are some changes in the BalancedShardsAllocator class.

Does the other older cluster have similar data volume, shard count and system resources? I am not sure what I am looking for, which is why I asked for the full output of those APIs as it would give me a better idea of the cluster status.

The two clusters have identical data. By the way, in the old cluster each node holds between 209 and 210 shards, so I think rebalancing works there. In the new cluster the count ranges from 190 to 230 shards per node, so I think the rebalancing there is doing useless work.
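
The per-node spread quoted above can be computed from _cat/allocation?format=json, which returns one row per node with node and shards fields. This is a rough sketch; the function name and the two-node sample rows are made up for illustration, using only the minimum and maximum counts from this post.

```python
def shard_spread(allocation_rows):
    """Return (min, max, max - min) shard counts across nodes.

    Rows are expected in the shape of _cat/allocation?format=json,
    where shard counts arrive as strings. The UNASSIGNED row is skipped.
    """
    counts = [
        int(row["shards"])
        for row in allocation_rows
        if row.get("node") != "UNASSIGNED"
    ]
    lo, hi = min(counts), max(counts)
    return lo, hi, hi - lo

# Illustrative rows only: node names are placeholders.
old_cluster = [{"node": "n1", "shards": "209"}, {"node": "n2", "shards": "210"}]
new_cluster = [{"node": "n1", "shards": "190"}, {"node": "n2", "shards": "230"}]

print(shard_spread(old_cluster))  # (209, 210, 1)
print(shard_spread(new_cluster))  # (190, 230, 40)
```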

How large a portion of the storage are you using? Are you close to any of the watermarks?

disk.used_percent ranges from 40% to 62.4% according to the _cat/nodes API.
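
As a quick check of those numbers against the disk watermarks, here is a small sketch. It assumes the default thresholds (low 85%, high 90%, flood stage 95%) have not been overridden on this cluster, and the boundary handling is a simplification.

```python
# Default disk watermarks, checked from highest to lowest.
# Assumption: the cluster has not overridden these values.
DEFAULT_WATERMARKS = (("flood_stage", 95.0), ("high", 90.0), ("low", 85.0))

def watermark_status(used_percent, watermarks=DEFAULT_WATERMARKS):
    """Return the highest watermark reached, or 'below_low'."""
    for name, threshold in watermarks:
        if used_percent >= threshold:
            return name
    return "below_low"

# The minimum and maximum disk.used_percent reported above.
for used in (40.0, 62.4):
    print(used, watermark_status(used))
```

With default thresholds, both ends of the reported range stay under the low watermark.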

Here is the _cluster/stats response:

{
  "_nodes" : {
    "total" : 70,
    "successful" : 70,
    "failed" : 0
  },
  "cluster_name" : "XX",
  "cluster_uuid" : "XXXXXXXXX",
  "timestamp" : 1630046430820,
  "status" : "green",
  "indices" : {
    "count" : 1338,
    "shards" : {
      "total" : 16209,
      "primaries" : 8115,
      "replication" : 0.9974121996303142,
      "index" : {
        "shards" : {
          "min" : 1,
          "max" : 400,
          "avg" : 12.114349775784753
        },
        "primaries" : {
          "min" : 1,
          "max" : 200,
          "avg" : 6.065022421524664
        },
        "replication" : {
          "min" : 0.0,
          "max" : 1.0,
          "avg" : 0.9985052316890882
        }
      }
    },
    "docs" : {
      "count" : 71579201920,
      "deleted" : 1149946221
    },
    "store" : {
      "size_in_bytes" : 214689040782951,
      "reserved_in_bytes" : 0
    },
    "fielddata" : {
      "memory_size_in_bytes" : 7526390264,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size_in_bytes" : 4074640181,
      "total_count" : 6473581019,
      "hit_count" : 19484134,
      "miss_count" : 6454096885,
      "cache_size" : 104588,
      "cache_count" : 111438,
      "evictions" : 6850
    },
    "completion" : {
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 93539,
      "memory_in_bytes" : 2629684794,
      "terms_memory_in_bytes" : 1500418768,
      "stored_fields_memory_in_bytes" : 76308744,
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory_in_bytes" : 42576192,
      "points_memory_in_bytes" : 0,
      "doc_values_memory_in_bytes" : 1010381090,
      "index_writer_memory_in_bytes" : 8997386532,
      "version_map_memory_in_bytes" : 58182369,
      "fixed_bit_set_memory_in_bytes" : 2926083504,
      "max_unsafe_auto_id_timestamp" : 1630030232646,
      "file_sizes" : { }
    },
    "mappings" : {
      "field_types" : [
        {
          "name" : "alias",
          "count" : 69,
          "index_count" : 23
        },
        {
          "name" : "binary",
          "count" : 15,
          "index_count" : 4
        },
        {
          "name" : "boolean",
          "count" : 8647,
          "index_count" : 1222
        },
        {
          "name" : "byte",
          "count" : 24,
          "index_count" : 24
        },
        {
          "name" : "date",
          "count" : 7771,
          "index_count" : 1330
        },
        {
          "name" : "date_nanos",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "date_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "double",
          "count" : 5732,
          "index_count" : 1160
        },
        {
          "name" : "double_range",
          "count" : 3,
          "index_count" : 2
        },
        {
          "name" : "flattened",
          "count" : 9,
          "index_count" : 1
        },
        {
          "name" : "float",
          "count" : 5979,
          "index_count" : 94
        },
        {
          "name" : "float_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "geo_point",
          "count" : 166,
          "index_count" : 27
        },
        {
          "name" : "geo_shape",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "half_float",
          "count" : 69,
          "index_count" : 16
        },
        {
          "name" : "integer",
          "count" : 191,
          "index_count" : 14
        },
        {
          "name" : "integer_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "ip",
          "count" : 461,
          "index_count" : 24
        },
        {
          "name" : "ip_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "keyword",
          "count" : 104359,
          "index_count" : 1335
        },
        {
          "name" : "long",
          "count" : 79219,
          "index_count" : 1323
        },
        {
          "name" : "long_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "nested",
          "count" : 967,
          "index_count" : 683
        },
        {
          "name" : "object",
          "count" : 56386,
          "index_count" : 422
        },
        {
          "name" : "scaled_float",
          "count" : 2783,
          "index_count" : 23
        },
        {
          "name" : "shape",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "short",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "text",
          "count" : 13000,
          "index_count" : 1320
        }
      ]
    },
    "analysis" : {
      "char_filter_types" : [ ],
      "tokenizer_types" : [ ],
      "filter_types" : [
        {
          "name" : "pattern_capture",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "analyzer_types" : [
        {
          "name" : "custom",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "built_in_char_filters" : [ ],
      "built_in_tokenizers" : [
        {
          "name" : "uax_url_email",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "built_in_filters" : [
        {
          "name" : "lowercase",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "unique",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "built_in_analyzers" : [
        {
          "name" : "english",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "ik_max_word",
          "count" : 9938,
          "index_count" : 1210
        }
      ]
    }
  },
  "nodes" : {
    "count" : {
      "total" : 70,
      "coordinating_only" : 4,
      "data" : 0,
      "data_cold" : 0,
      "data_content" : 62,
      "data_hot" : 0,
      "data_warm" : 62,
      "ingest" : 1,
      "master" : 3,
      "ml" : 0,
      "remote_cluster_client" : 0,
      "transform" : 0,
      "voting_only" : 0
    },
    "versions" : [
      "7.10.2"
    ],
    "os" : {
      "available_processors" : 5216,
      "allocated_processors" : 5216,
      "names" : [
        {
          "name" : "Linux",
          "count" : 70
        }
      ],
      "pretty_names" : [
        {
          "pretty_name" : "CentOS Linux 7 (Core)",
          "count" : 70
        }
      ],
      "mem" : {
        "total_in_bytes" : 17807501201408,
        "free_in_bytes" : 1405022416896,
        "used_in_bytes" : 16402478784512,
        "free_percent" : 8,
        "used_percent" : 92
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 74
      },
      "open_file_descriptors" : {
        "min" : 1864,
        "max" : 3845,
        "avg" : 3450
      }
    },
    "jvm" : {
      "max_uptime_in_millis" : 61810370,
      "versions" : [
        {
          "version" : "15.0.1",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "15.0.1+9",
          "vm_vendor" : "AdoptOpenJDK",
          "bundled_jdk" : true,
          "using_bundled_jdk" : true,
          "count" : 70
        }
      ],
      "mem" : {
        "heap_used_in_bytes" : 939136819928,
        "heap_max_in_bytes" : 2195751698432
      },
      "threads" : 22183
    },
    "fs" : {
      "total_in_bytes" : 201363506065408,
      "free_in_bytes" : 94269441966080,
      "available_in_bytes" : 94269441966080
    },
    "plugins" : [
      {
        "name" : "analysis-ik",
        "version" : "7.10.2",
        "elasticsearch_version" : "7.10.2",
        "java_version" : "1.8",
        "description" : "IK Analyzer for Elasticsearch",
        "classname" : "org.elasticsearch.plugin.analysis.ik.AnalysisIkPlugin",
        "extended_plugins" : [ ],
        "has_native_controller" : false
      }
    ],
    "network_types" : {
      "transport_types" : {
        "security4" : 70
      },
      "http_types" : {
        "security4" : 70
      }
    },
    "discovery_types" : {
      "zen" : 70
    },
    "packaging_types" : [
      {
        "flavor" : "default",
        "type" : "tar",
        "count" : 70
      }
    ],
    "ingest" : {
      "number_of_pipelines" : 15,
      "processor_stats" : {
        "conditional" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        },
        "geoip" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        },
        "grok" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        },
        "gsub" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        },
        "remove" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        },
        "rename" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        },
        "script" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        },
        "set" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        }
      }
    }
  }
}
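
A few ratios can be derived directly from the response above. The sketch below copies only the relevant values out of the JSON and computes cluster-wide disk usage, heap usage, and average store size per shard; the function and variable names are my own.

```python
# Values copied from the _cluster/stats response above.
stats = {
    "fs_total": 201363506065408,      # nodes.fs.total_in_bytes
    "fs_free": 94269441966080,        # nodes.fs.free_in_bytes
    "heap_used": 939136819928,        # nodes.jvm.mem.heap_used_in_bytes
    "heap_max": 2195751698432,        # nodes.jvm.mem.heap_max_in_bytes
    "store_size": 214689040782951,    # indices.store.size_in_bytes
    "shards_total": 16209,            # indices.shards.total
}

def derived_ratios(s):
    """Compute a few summary ratios from cluster stats values."""
    return {
        "fs_used_percent": 100.0 * (s["fs_total"] - s["fs_free"]) / s["fs_total"],
        "heap_used_percent": 100.0 * s["heap_used"] / s["heap_max"],
        "avg_shard_gib": s["store_size"] / s["shards_total"] / 2**30,
    }

for key, value in derived_ratios(stats).items():
    print(f"{key}: {value:.1f}")
```

These are cluster-wide averages, so they say nothing about how usage is distributed across individual nodes.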

From the _cat/recovery API I can see "type" : "PEER".
But the documentation for the recovery response lists only four types: STORE, SNAPSHOT, REPLICA, and RELOCATING.
