Cluster re-balancing issue

arunpmohan · December 22, 2021, 1:12pm

We have a cluster with zoning enabled.

Zone-A has 15 hot nodes + 10 warm nodes.
Zone-B has 14 hot nodes + 10 warm nodes (one node was taken out due to repeated issues).

The cluster was doing well until last week, when the Log4j library issue occurred.
Following the recommended remediation, we started restarting nodes.
So far we have restarted the zone-A nodes.
Shortly afterwards, the cluster started allocating new shards oddly: most of the new hot shards are being allocated to the zone-B nodes. As a result, the zone-B nodes hold many primaries, which slows down indexing.
So now we manually re-allocate the primaries from zone-B to zone-A every day.
How can we get the cluster to rebalance normally again?
Any help is really appreciated.
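
The thread does not say how the daily manual re-allocation is done. One plausible route is the cluster reroute API (POST _cluster/reroute) with `move` commands. The sketch below only builds the request body; the node names are hypothetical, and the index name is the one that appears later in the thread.

```python
import json


def build_move_commands(moves):
    """Turn (index, shard, from_node, to_node) tuples into a reroute body."""
    return {
        "commands": [
            {
                "move": {
                    "index": index,
                    "shard": shard,
                    "from_node": from_node,
                    "to_node": to_node,
                }
            }
            for index, shard, from_node, to_node in moves
        ]
    }


# Hypothetical node names, used for illustration only.
moves = [
    ("time-data-2012.12.13", 0, "hot-b-01", "hot-a-01"),
    ("time-data-2012.12.13", 1, "hot-b-02", "hot-a-02"),
]

body = build_move_commands(moves)
# This JSON would be sent as the body of POST _cluster/reroute.
print(json.dumps(body, indent=2))
```

Each move is still checked against the allocation deciders, so a command that conflicts with awareness or filtering rules can be rejected.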

leandrojmp · December 22, 2021, 1:28pm

Can you explain a little more about your infrastructure? What is Zone-A and Zone-B? Are they the same? Should one Zone have priority over another?

Do you have an Elasticsearch cluster split across different cloud regions or something like that?

Are you using index allocation filtering and shard allocation awareness?

If you have a cluster split across two different cloud regions, or something similar, and the only attribute used to filter the shards is the node role, like data_hot and data_warm, then this can happen and is expected.

arunpmohan · December 22, 2021, 5:11pm

Zone-A and Zone-B are two different datacenters.
Node attributes (output of GET _cat/nodeattrs?v&s=node) look like this:

Yes, we are using allocation awareness.
elasticsearch.yml has the following parameters, depending on the zone and type of node (hot/warm):

node.attr.mode: data_node
node.attr.zone: "zone-A"
node.attr.temp: "hot"

The current cluster settings are like this:

{
  "persistent" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "awareness" : {
            "attributes" : "zone",
            "force" : {
              "zone" : {
                "values" : [
                  "zone-A",
                  "zone-B"
                ]
              }
            }
          },
          "disk" : {
            "watermark" : {
              "low" : "1200gb",
              "flood_stage" : "150gb",
              "high" : "1200gb"
            }
          }
        }
      },
      "info" : {
        "update" : {
          "interval" : "60s"
        }
      }
    },
    "indices" : {
      "recovery" : {
        "max_bytes_per_sec" : "400mb"
      }
    }
  },
  "transient" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "node_concurrent_incoming_recoveries" : "1",
          "cluster_concurrent_rebalance" : "2",
          "node_concurrent_recoveries" : "1"
        }
      }
    },
    "indices" : {
      "recovery" : {
        "max_bytes_per_sec" : "40mb"
      }
    }
  }
}
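
One detail in the settings above is easy to miss: `indices.recovery.max_bytes_per_sec` is set both persistently (400mb) and transiently (40mb). Transient cluster settings take precedence over persistent ones, so the lower value is the one in effect until the transient setting is cleared or the whole cluster restarts. A minimal sketch of that precedence, using only an excerpt of the settings shown above:

```python
def flatten(settings, prefix=""):
    """Flatten nested settings into dotted keys, e.g. 'indices.recovery...'."""
    flat = {}
    for key, value in settings.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def effective(persistent, transient):
    """Merge settings so that transient values override persistent ones."""
    merged = flatten(persistent)
    merged.update(flatten(transient))
    return merged


# Excerpt of the cluster settings posted in the thread.
persistent = {
    "cluster": {"info": {"update": {"interval": "60s"}}},
    "indices": {"recovery": {"max_bytes_per_sec": "400mb"}},
}
transient = {
    "cluster": {"routing": {"allocation": {"cluster_concurrent_rebalance": "2"}}},
    "indices": {"recovery": {"max_bytes_per_sec": "40mb"}},
}

settings = effective(persistent, transient)
print(settings["indices.recovery.max_bytes_per_sec"])  # 40mb
```

With recovery capped at 40mb per node and `cluster_concurrent_rebalance` at 2, any automatic rebalancing after the restarts would be expected to proceed slowly.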

Also, the daily indices have the following settings, which create new indices only on hot nodes. After 90 days we move them to warm nodes.

{
  "time-data-2012.12.13" : {
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "require" : {
              "temp" : "hot"
            },
            "total_shards_per_node" : "3"
          }
        },
        "mapping" : {
          "nested_fields" : {
            "limit" : "1000"
          },
          "total_fields" : {
            "limit" : "10000"
          }
        },
        "refresh_interval" : "55s",
        "number_of_shards" : "24",
        "translog" : {
          "sync_interval" : "25s",
          "durability" : "async"
        },
        "provided_name" : "time-data-2012.12.13",
        "merge" : {
          "scheduler" : {
            "max_thread_count" : "3"
          }
        },
        "unassigned" : {
          "node_left" : {
            "delayed_timeout" : "30m"
          }
        },
        "number_of_replicas" : "2"
      }
    }
  }
}
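
To see how much room these index settings leave the allocator, the arithmetic can be worked through with the node counts from the first post. This is a rough sketch, not a model of the real allocator. The per-zone cap under forced awareness is assumed here to be the number of copies per shard divided by the number of zones, rounded up.

```python
import math

# Values taken from the thread.
number_of_shards = 24
number_of_replicas = 2
total_shards_per_node = 3
hot_nodes = {"zone-A": 15, "zone-B": 14}

copies_per_shard = 1 + number_of_replicas            # primary + replicas
total_copies = number_of_shards * copies_per_shard   # copies per daily index

# Limit imposed by index.routing.allocation.total_shards_per_node.
capacity = {zone: n * total_shards_per_node for zone, n in hot_nodes.items()}
headroom = sum(capacity.values()) - total_copies

# Assumed forced-awareness cap: copies of one shard allowed in a single zone.
per_shard_zone_cap = math.ceil(copies_per_shard / len(hot_nodes))
awareness_cap = number_of_shards * per_shard_zone_cap

# Most copies each zone can take, then the fewest it must take.
zone_limit = {zone: min(capacity[zone], awareness_cap) for zone in hot_nodes}
zone_minimum = {
    zone: total_copies - sum(v for z, v in zone_limit.items() if z != zone)
    for zone in hot_nodes
}

print(total_copies, capacity, headroom)
print(zone_limit, zone_minimum)
```

Under these assumptions, each daily index needs 72 shard copies against a limit of 87 on the hot tier, so only 15 slots are spare. None of these limits say which zone holds the primaries.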

And to give you an idea of how shards are allocated across the nodes, here is the output of the GET _cat/allocation?v&s=node API:

We are now manually re-assigning the hot shards from zone-B to zone-A every day, as zone-B is getting most of the primaries, which causes load and subsequent indexing problems.

How can we make the cluster rebalance itself?
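
The claim that zone-B holds most of the primaries can be checked by combining the JSON output of GET _cat/nodeattrs?format=json and GET _cat/shards?format=json. The sketch below assumes those rows have already been fetched. The sample rows and node names are made up for illustration and are not data from this cluster.

```python
from collections import Counter


def zone_by_node(nodeattr_rows):
    """Map node name -> value of its 'zone' attribute."""
    return {r["node"]: r["value"] for r in nodeattr_rows if r["attr"] == "zone"}


def primaries_per_zone(shard_rows, node_zone):
    """Count started primary shards in each zone."""
    counts = Counter()
    for row in shard_rows:
        if row["prirep"] == "p" and row["state"] == "STARTED":
            counts[node_zone.get(row["node"], "unknown")] += 1
    return dict(counts)


# Hypothetical sample rows shaped like the _cat API JSON output.
nodeattr_rows = [
    {"node": "hot-a-01", "attr": "zone", "value": "zone-A"},
    {"node": "hot-a-01", "attr": "temp", "value": "hot"},
    {"node": "hot-b-01", "attr": "zone", "value": "zone-B"},
    {"node": "hot-b-02", "attr": "zone", "value": "zone-B"},
]
shard_rows = [
    {"index": "time-data-2012.12.13", "shard": "0", "prirep": "p",
     "state": "STARTED", "node": "hot-b-01"},
    {"index": "time-data-2012.12.13", "shard": "0", "prirep": "r",
     "state": "STARTED", "node": "hot-a-01"},
    {"index": "time-data-2012.12.13", "shard": "1", "prirep": "p",
     "state": "STARTED", "node": "hot-b-02"},
    {"index": "time-data-2012.12.13", "shard": "2", "prirep": "p",
     "state": "STARTED", "node": "hot-a-01"},
]

node_zone = zone_by_node(nodeattr_rows)
result = primaries_per_zone(shard_rows, node_zone)
print(result)
```

Running this daily against the real output would show whether the manual moves actually even out the primaries between the zones.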

leandrojmp · December 22, 2021, 5:34pm

So your index has no requirement on the zone, only on whether the node is hot or warm.

      "index" : {
        "routing" : {
          "allocation" : {
            "require" : {
              "temp" : "hot"
            }
...

This way Elasticsearch can allocate the index in both zones; it balances across the hot nodes in both zones according to the number of shards on each node.

Since your hot nodes in Zone-B have fewer shards, newly created indices that require hot nodes will be created on those nodes.
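
The effect described here can be illustrated with a deliberately simplified model in which each new shard goes to the node that currently holds the fewest shards. This is not Elasticsearch's actual balancer, which weighs several factors, and the node names and shard counts are invented for the example.

```python
def place_new_shards(shard_counts, new_shards):
    """Greedy placement: each new shard goes to the least-loaded node."""
    counts = dict(shard_counts)  # work on a copy
    placements = []
    for _ in range(new_shards):
        # Ties are broken by node name to keep the result deterministic.
        target = min(counts, key=lambda node: (counts[node], node))
        counts[target] += 1
        placements.append(target)
    return placements


# Hypothetical shard counts: the zone-B nodes hold fewer shards.
shard_counts = {
    "hot-a-01": 120,
    "hot-a-02": 118,
    "hot-b-01": 90,
    "hot-b-02": 92,
}

placements = place_new_shards(shard_counts, 6)
print(placements)
```

In this toy model every one of the six new shards lands on a zone-B node, because those nodes stay below the zone-A counts throughout.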

If you have a requirement to index first in the Zone-A datacenter, you should also set this in the index settings.
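
The reply does not spell out which setting to use. One possible reading is an extra index-level allocation filter on the `zone` attribute, sketched below as the body of a PUT <index>/_settings request. Allocation filters apply to every copy of a shard, replicas included, so combined with the forced awareness configured on this cluster a filter like this may leave some replicas unassigned. Treat it as an illustration of the mechanism, not a recommendation.

```python
import json


def zone_filter_settings(zone, temp="hot"):
    """Build an index settings body requiring both a temp and a zone attribute."""
    return {
        "index": {
            "routing": {
                "allocation": {
                    "require": {
                        "temp": temp,   # already present in the thread's settings
                        "zone": zone,   # the additional, assumed filter
                    }
                }
            }
        }
    }


body = zone_filter_settings("zone-A")
# This JSON would be sent as the body of PUT <index>/_settings.
print(json.dumps(body, indent=2))
```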

system · January 19, 2022, 5:34pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
