I have 50 nodes in an Elasticsearch 7.5 cluster and added five new nodes. Shards are balanced across the existing nodes, but for some reason the five new nodes do not take part in the balancing: their disks are nearly empty and they hold only a few shards.
There is no recovery activity in the cluster.
What could be the problem?
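A few read-only _cat calls are enough to confirm whether anything is actually moving and how empty the new nodes are (a minimal sketch using standard APIs available in 7.5):

# shards and disk usage per node
GET _cat/allocation?v&s=node
# any shard recoveries currently running
GET _cat/recovery?v&active_only=true
# cluster-level counters for relocating/initializing/unassigned shards
GET _cluster/health?filter_path=*_shards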
What is the full output of the cluster stats API?
I cannot access it. Please post it here or upload it as a gist.
{
  "_nodes" : {
    "total" : 54,
    "successful" : 54,
    "failed" : 0
  },
  "cluster_name" : "elsgen_cls",
  "cluster_uuid" : "nl-aqpHnSIuUYAplZlAjxQ",
  "timestamp" : 1731955879460,
  "status" : "green",
  "indices" : {
    "count" : 7286,
    "shards" : {
      "total" : 20678,
      "primaries" : 10339,
      "replication" : 1.0,
      "index" : {
        "shards" : {
          "min" : 2,
          "max" : 20,
          "avg" : 2.838045566840516
        },
        "primaries" : {
          "min" : 1,
          "max" : 10,
          "avg" : 1.419022783420258
        },
        "replication" : {
          "min" : 1.0,
          "max" : 1.0,
          "avg" : 1.0
        }
      }
    },
    "docs" : {
      "count" : 55789118611,
      "deleted" : 1074058799
    },
    "store" : {
      "size_in_bytes" : 131433338519174
    },
    "fielddata" : {
      "memory_size_in_bytes" : 32942497780,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size_in_bytes" : 139011644017,
      "total_count" : 37071265229,
      "hit_count" : 4708036564,
      "miss_count" : 32363228665,
      "cache_size" : 1623403,
      "cache_count" : 142226890,
      "evictions" : 140603487
    },
    "completion" : {
      "size_in_bytes" : 9804
    },
    "segments" : {
      "count" : 414197,
      "memory_in_bytes" : 82211562177,
      "terms_memory_in_bytes" : 23123875929,
      "stored_fields_memory_in_bytes" : 46169186608,
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory_in_bytes" : 741823936,
      "points_memory_in_bytes" : 10210961612,
      "doc_values_memory_in_bytes" : 1965714092,
      "index_writer_memory_in_bytes" : 4537245308,
      "version_map_memory_in_bytes" : 34433767,
      "fixed_bit_set_memory_in_bytes" : 8691405256,
      "max_unsafe_auto_id_timestamp" : 1731954710286,
      "file_sizes" : { }
    }
  },
  "nodes" : {
    "count" : {
      "total" : 54,
      "coordinating_only" : 3,
      "data" : 48,
      "ingest" : 48,
      "master" : 3,
      "ml" : 0,
      "voting_only" : 0
    },
    "versions" : [
      "7.5.1"
    ],
    "os" : {
      "available_processors" : 444,
      "allocated_processors" : 444,
      "names" : [
        {
          "name" : "Linux",
          "count" : 54
        }
      ],
      "pretty_names" : [
        {
          "pretty_name" : "CentOS Linux 7 (Core)",
          "count" : 54
        }
      ],
      "mem" : {
        "total_in_bytes" : 2758616629248,
        "free_in_bytes" : 171232272384,
        "used_in_bytes" : 2587384356864,
        "free_percent" : 6,
        "used_percent" : 94
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 251
      },
      "open_file_descriptors" : {
        "min" : 1575,
        "max" : 20321,
        "avg" : 9382
      }
    },
    "jvm" : {
      "max_uptime_in_millis" : 90227690742,
      "versions" : [
        {
          "version" : "13.0.1",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "13.0.1+9",
          "vm_vendor" : "AdoptOpenJDK",
          "bundled_jdk" : true,
          "using_bundled_jdk" : true,
          "count" : 54
        }
      ],
      "mem" : {
        "heap_used_in_bytes" : 816650669464,
        "heap_max_in_bytes" : 1719563845632
      },
      "threads" : 9004
    },
    "fs" : {
      "total_in_bytes" : 210109265416192,
      "free_in_bytes" : 78639043887104,
      "available_in_bytes" : 78639043887104
    },
    "plugins" : [
      {
        "name" : "readonlyrest",
        "version" : "1.27.0",
        "elasticsearch_version" : "7
Is the hardware specification the same for all data nodes in the cluster? How many shards do the new nodes hold? Is data currently being relocated to the new nodes?
That is exactly the question. Without doing anything, the nodes should balance on their own; that is how it worked the previous times we added nodes. There are currently only a few shards on the newly added nodes.
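One common cause of exactly this symptom is a leftover allocation filter, for example an exclude added during an earlier decommission that happens to match the new nodes. Checking the routing-related cluster settings is cheap (a sketch; nothing here is confirmed from the thread):

# persistent/transient routing settings (allocation filters, rebalance settings)
GET _cluster/settings?filter_path=*.cluster.routing*
# include built-in defaults to also see the disk watermarks in effect
GET _cluster/settings?include_defaults=true&filter_path=*.cluster.routing.allocation.disk*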
You are running a very old version that has been EOL for a very long time, so I would recommend you upgrade to the latest version.
Is the hardware specification exactly the same across the board?
Are there any current recoveries in progress? If so, do any of these involve the new nodes?
Is there anything in your master nodes' logs that indicates issues persisting or propagating the cluster state?
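For example, something like this on the elected master (the log path is an assumption based on a default package install, where the file is named after the cluster, here elsgen_cls):

# scan the elected master's log for errors (assumed default path; adjust to your install)
grep -iE "exception|failed|pending" /var/log/elasticsearch/elsgen_cls.log | tail -n 50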
There are many unassigned shards.
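Given unassigned shards, the allocation explain API is the most direct way to see why a shard is unassigned, or why it is not being moved onto the new nodes (a minimal sketch; the index name and shard number are placeholders):

# with no body, explains the first unassigned shard it finds
GET _cluster/allocation/explain

# or ask about a specific shard ("my-index" and 0 are placeholders)
GET _cluster/allocation/explain
{
  "index" : "my-index",
  "shard" : 0,
  "primary" : true
}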
What roles do those new nodes have? If they have a data role, which one(s): data_content, hot, etc.? Are those roles exactly the same as on the other nodes?
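On 7.5 the roles show up as letter flags in the nodes cat API (d = data, i = ingest, m = master-eligible, - = coordinating only), for example:

GET _cat/nodes?v&h=name,node.role,master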
@Christian_Dahlqvist is it sufficient to have only 3 master / 3 coordinating nodes for a total of 55 nodes (as I understand it, there were 50 and 5 more were added)?
3 dedicated master nodes is generally recommended for any large cluster. Only one of them acts as master at any point in time, and the others are only there for redundancy. Adding more dedicated master nodes therefore just adds redundancy and does not affect performance.
It is important, though, that the master nodes have sufficient resources, which is why I asked you to check the logs on the elected master node for any sign of issues.
Coordinating-only nodes are generally not required at all, but can help for some use cases.
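For reference, on 7.5 a coordinating-only node is simply one with all other roles disabled in elasticsearch.yml (a sketch for this version; 7.9+ uses node.roles instead):

# elasticsearch.yml: coordinating-only node on 7.x
node.master: false    # not master-eligible
node.data: false      # holds no shards
node.ingest: false    # runs no ingest pipelines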