Elastic Cluster Balancing

ELK stack 8.11

How do I get my cluster to balance by available disk space? One node keeps hitting the watermark while the other 3 nodes have 2 TB available.

Here are the cluster settings:

{
  "persistent": {
    "cluster": {
      "routing": {
        "rebalance": {
          "enable": "all"
        },
        "allocation": {
          "allow_rebalance": "indices_all_active",
          "cluster_concurrent_rebalance": "2",
          "node_concurrent_recoveries": "2",
          "disk": {
            "threshold_enabled": "true",
            "watermark": {
              "low": "200gb",
              "flood_stage": "10gb",
              "high": "100gb"
            }
          },
          "balance": {
            "index": "0.55f",
            "shard": "0.45f"
          }
        }
      }
    },
Name          Alerts  Status  Roles  Shards  CPU Usage  Load Average  JVM Heap  Disk Free Space
Coordinating  Clear   Online  N/A    0       21%        0.79          50%       180.3 GB
Node1         Clear   Online  N/A    328     24%        1.53          54%       1.9 TB
Node2         Clear   Online  N/A    353     22%        1.94          35%       2.8 TB
Node3         Clear   Online  N/A    340     23%        1.74          59%       1.9 TB
Node4         Clear   Online  N/A    292     19%        1.67          29%       1.2 TB

Elasticsearch will try to balance the shards by the number of shards. It takes the watermark levels and the shard size into consideration, but it is not possible to balance based on free disk space.

What is your average shard size? Do you have many small shards?

Also, which node is hitting the watermark? All your nodes have more than 1 TB of free space and your low watermark is set to 200 GB.
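To make that comparison concrete, here is a minimal Python sketch. It assumes the absolute watermark values in the posted settings are read as minimum free space, and it takes the free-space figures from the node table above, converting 1 TB as 1024 GB (an assumption about how the monitoring UI rounds). The function name `watermark_status` is hypothetical, not part of any Elasticsearch client.

```python
# Absolute watermarks from the posted settings, read as minimum free space in GB.
WATERMARKS_GB = {"low": 200, "high": 100, "flood_stage": 10}

# Free disk space per data node from the table above (1 TB taken as 1024 GB).
FREE_GB = {
    "Node1": 1.9 * 1024,
    "Node2": 2.8 * 1024,
    "Node3": 1.9 * 1024,
    "Node4": 1.2 * 1024,
}


def watermark_status(free_gb: float) -> str:
    """Return the most severe watermark a node has crossed, or 'ok'."""
    if free_gb < WATERMARKS_GB["flood_stage"]:
        return "flood_stage"
    if free_gb < WATERMARKS_GB["high"]:
        return "high"
    if free_gb < WATERMARKS_GB["low"]:
        return "low"
    return "ok"


for node, free in FREE_GB.items():
    print(f"{node}: {free:.0f} GB free -> {watermark_status(free)}")
```

With the numbers as posted, every data node is still well above the 200 GB low watermark, which is why it is worth asking which node is actually affected.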

Can you provide a little more context?

The average shard size is 50 GB, and yes, we also have small shards. Node 4 is hitting the watermark. I recently changed the watermark from 85% (which is 1 TB) to 200 GB to give myself breathing room, and we also added more disk space. But as you can see, Node 4 has less free space than the rest, and on this trend it will do what it did before all the modifications: because Node 4 has the fewest shards, Elasticsearch will naturally put more shards on it to even out the shard count, and it will hit the watermark again.

Hi there,

We have a 12-node cluster. To spread the data as evenly as possible across all nodes, we use ILM to give us an optimum shard size, and each index has 12 primary shards. This means that each data node holds an equal amount of data.

In the past we have had indexes with fewer primaries. This meant that some of the data nodes naturally had more data on them, because certain indexes had fewer primaries than there were nodes. Some nodes would hit the 85% threshold whilst others wouldn't. When we moved to ILM, all indexes were set to have 12 primaries. Now all nodes use an almost identical amount of disk space.
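As a rough illustration of why the primary count matters, here is a small Python sketch. It is a simplification under stated assumptions: it looks at a single index, treats all shards as equally sized, ignores replicas, and assumes the primaries are spread as evenly as possible over the nodes. The function name `primaries_per_node` is hypothetical.

```python
def primaries_per_node(primaries: int, nodes: int) -> list[int]:
    """Spread one index's primaries as evenly as possible over the nodes."""
    base, extra = divmod(primaries, nodes)
    # The first `extra` nodes each hold one more shard than the rest.
    return [base + 1 if i < extra else base for i in range(nodes)]


# 12 primaries on 12 nodes: every node holds exactly one shard of the index.
print(primaries_per_node(12, 12))

# 5 primaries on 12 nodes: five nodes hold a shard, seven hold none.
print(primaries_per_node(5, 12))
```

When the primary count is a multiple of the node count, each node carries the same share of that index. Otherwise some nodes carry more of it than others, and with large shards the difference shows up as uneven disk usage.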

You might want to run GET _cluster/allocation/explain. This may not return anything, but it could.
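Called without a body, the allocation explain API looks for an unassigned shard to explain, so on a green cluster it typically has nothing to report. It can instead be pointed at a specific shard. The Python sketch below only builds such a request and does not send it. The index name `my-index` is a placeholder and the helper name `allocation_explain_request` is hypothetical.

```python
import json


def allocation_explain_request(index: str, shard: int = 0, primary: bool = True):
    """Build (method, path, body) for a targeted allocation-explain call."""
    body = {"index": index, "shard": shard, "primary": primary}
    return "GET", "/_cluster/allocation/explain", json.dumps(body)


# "my-index" is a placeholder; substitute an index that lives on the busy node.
method, path, body = allocation_explain_request("my-index")
print(method, path)
print(body)
```

The printed method, path, and body can be pasted into Kibana Dev Tools or sent with any HTTP client to see why that shard sits on its current node.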

It could be that the data is perfectly balanced in the eyes of Elasticsearch, especially if your cluster is green.

I hope that helps.
