I have 50 nodes in an Elasticsearch 7.5 cluster and added five new nodes. Shards are balanced across the existing nodes, but for some reason the five new nodes do not take part in the balancing: their disks are nearly empty and they hold only a few shards.
There is no recovery activity in the cluster.
What could be the problem?
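A few read-only _cat calls are enough to confirm whether anything is actually moving and how empty the new nodes are (a minimal sketch using standard APIs available in 7.5):

# shards and disk usage per node
GET _cat/allocation?v&s=node
# any shard recoveries currently running
GET _cat/recovery?v&active_only=true
# cluster-level counters for relocating/initializing/unassigned shards
GET _cluster/health?filter_path=*_shards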
What is the full output of the cluster stats API?
I cannot access it. Please post it here or upload it as a gist.
{
  "_nodes" : {
    "total" : 54,
    "successful" : 54,
    "failed" : 0
  },
  "cluster_name" : "elsgen_cls",
  "cluster_uuid" : "nl-aqpHnSIuUYAplZlAjxQ",
  "timestamp" : 1731955879460,
  "status" : "green",
  "indices" : {
    "count" : 7286,
    "shards" : {
      "total" : 20678,
      "primaries" : 10339,
      "replication" : 1.0,
      "index" : {
        "shards" : {
          "min" : 2,
          "max" : 20,
          "avg" : 2.838045566840516
        },
        "primaries" : {
          "min" : 1,
          "max" : 10,
          "avg" : 1.419022783420258
        },
        "replication" : {
          "min" : 1.0,
          "max" : 1.0,
          "avg" : 1.0
        }
      }
    },
    "docs" : {
      "count" : 55789118611,
      "deleted" : 1074058799
    },
    "store" : {
      "size_in_bytes" : 131433338519174
    },
    "fielddata" : {
      "memory_size_in_bytes" : 32942497780,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size_in_bytes" : 139011644017,
      "total_count" : 37071265229,
      "hit_count" : 4708036564,
      "miss_count" : 32363228665,
      "cache_size" : 1623403,
      "cache_count" : 142226890,
      "evictions" : 140603487
    },
    "completion" : {
      "size_in_bytes" : 9804
    },
    "segments" : {
      "count" : 414197,
      "memory_in_bytes" : 82211562177,
      "terms_memory_in_bytes" : 23123875929,
      "stored_fields_memory_in_bytes" : 46169186608,
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory_in_bytes" : 741823936,
      "points_memory_in_bytes" : 10210961612,
      "doc_values_memory_in_bytes" : 1965714092,
      "index_writer_memory_in_bytes" : 4537245308,
      "version_map_memory_in_bytes" : 34433767,
      "fixed_bit_set_memory_in_bytes" : 8691405256,
      "max_unsafe_auto_id_timestamp" : 1731954710286,
      "file_sizes" : { }
    }
  },
  "nodes" : {
    "count" : {
      "total" : 54,
      "coordinating_only" : 3,
      "data" : 48,
      "ingest" : 48,
      "master" : 3,
      "ml" : 0,
      "voting_only" : 0
    },
    "versions" : [
      "7.5.1"
    ],
    "os" : {
      "available_processors" : 444,
      "allocated_processors" : 444,
      "names" : [
        {
          "name" : "Linux",
          "count" : 54
        }
      ],
      "pretty_names" : [
        {
          "pretty_name" : "CentOS Linux 7 (Core)",
          "count" : 54
        }
      ],
      "mem" : {
        "total_in_bytes" : 2758616629248,
        "free_in_bytes" : 171232272384,
        "used_in_bytes" : 2587384356864,
        "free_percent" : 6,
        "used_percent" : 94
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 251
      },
      "open_file_descriptors" : {
        "min" : 1575,
        "max" : 20321,
        "avg" : 9382
      }
    },
    "jvm" : {
      "max_uptime_in_millis" : 90227690742,
      "versions" : [
        {
          "version" : "13.0.1",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "13.0.1+9",
          "vm_vendor" : "AdoptOpenJDK",
          "bundled_jdk" : true,
          "using_bundled_jdk" : true,
          "count" : 54
        }
      ],
      "mem" : {
        "heap_used_in_bytes" : 816650669464,
        "heap_max_in_bytes" : 1719563845632
      },
      "threads" : 9004
    },
    "fs" : {
      "total_in_bytes" : 210109265416192,
      "free_in_bytes" : 78639043887104,
      "available_in_bytes" : 78639043887104
    },
    "plugins" : [
      {
        "name" : "readonlyrest",
        "version" : "1.27.0",
        "elasticsearch_version" : "7
Is the hardware specification the same for all data nodes in the cluster? How many shards do the new nodes hold? Is data currently being relocated to the new nodes?
That is exactly the question. Without doing anything, the nodes should balance on their own; that is how it worked the previous times we added nodes. There are currently only a few shards on the newly added nodes.
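One common cause of exactly this symptom is a leftover allocation filter, for example an exclude added during an earlier decommission that happens to match the new nodes. Checking the routing-related cluster settings is cheap (a sketch; nothing here is confirmed from the thread):

# persistent/transient routing settings (allocation filters, rebalance settings)
GET _cluster/settings?filter_path=*.cluster.routing*
# include built-in defaults to also see the disk watermarks in effect
GET _cluster/settings?include_defaults=true&filter_path=*.cluster.routing.allocation.disk*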
You are running a very old version that has been EOL for a very long time, so I would recommend you upgrade to the latest version.
Is the hardware specification exactly the same across the board?
Are there any current recoveries in progress? If so, do any of these involve the new nodes?
Is there anything in your master nodes' logs that indicates issues persisting or propagating the cluster state?
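For example, something like this on the elected master (the log path is an assumption based on a default package install, where the file is named after the cluster, here elsgen_cls):

# scan the elected master's log for errors (assumed default path; adjust to your install)
grep -iE "exception|failed|pending" /var/log/elasticsearch/elsgen_cls.log | tail -n 50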
There are many unassigned shards.
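Given unassigned shards, the allocation explain API is the most direct way to see why a shard is unassigned, or why it is not being moved onto the new nodes (a minimal sketch; the index name and shard number are placeholders):

# with no body, explains the first unassigned shard it finds
GET _cluster/allocation/explain

# or ask about a specific shard ("my-index" and 0 are placeholders)
GET _cluster/allocation/explain
{
  "index" : "my-index",
  "shard" : 0,
  "primary" : true
}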
What roles do those new nodes have? If they have a data role, which one(s): data_content, hot, etc.? Are those roles exactly the same as on the other nodes?
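On 7.5 the roles show up as letter flags in the nodes cat API (d = data, i = ingest, m = master-eligible, - = coordinating only), for example:

GET _cat/nodes?v&h=name,node.role,master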
@Christian_Dahlqvist is it sufficient to have only 3 master / 3 coordinating nodes for a total of 55 nodes (as I understand it, there were 50 and 5 more were added)?
3 dedicated master nodes is generally recommended for any large cluster. Only one of them acts as master at any point in time, and the others are only there for redundancy. Adding more dedicated master nodes therefore just adds redundancy and does not affect performance.
It is important, though, that the master nodes have sufficient resources, which is why I asked you to check the logs on the elected master node for any sign of issues.
Coordinating-only nodes are generally not required at all, but can help for some use cases.
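For reference, on 7.5 a coordinating-only node is simply one with all other roles disabled in elasticsearch.yml (a sketch for this version; 7.9+ uses node.roles instead):

# elasticsearch.yml: coordinating-only node on 7.x
node.master: false    # not master-eligible
node.data: false      # holds no shards
node.ingest: false    # runs no ingest pipelines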