Redistribution of Shards

Hello everyone again,

I have a cluster of 3 Elasticsearch servers, and today I realized that one of them failed over the weekend because I got a message related to the disk watermark.

I SSHed into the failed server and, indeed, Elasticsearch was stopped. I started the service again and it came up correctly.

The problem now is that the other 2 servers are at 93% disk usage. I would appreciate your suggestions on how to rebalance the disk consumption across the cluster; I have no way to increase the capacity of the disks.
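This is how I have been checking the disk usage per node (the column list is just to keep the output readable):

GET _cat/allocation?v&h=node,shards,disk.percent,disk.used,disk.avail,disk.total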

I suppose that for now the first thing I should do is let the shard redistribution finish before I:

  1. Delete indices from previous months to free up space (see the sketch after this list).
  2. Modify the retention policy from 60 days to 3 days.
  3. I don't know whether it is a good idea to use the reroute API (Cluster reroute API | Elasticsearch Guide [8.17] | Elastic) or if it would be a bad idea; I would appreciate your suggestions.
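For points 1 and 2, this is roughly what I had in mind; the index and policy names below are just examples, not my real ones:

# 1. Delete an index from a previous month (example name; wildcard deletes may be
#    blocked by action.destructive_requires_name, so listing names explicitly is safer)
DELETE /logs-2025.02.15

# 2. Reduce retention to 3 days in the ILM policy (example policy name)
PUT _ilm/policy/my-logs-policy
{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "3d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}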

Thanks

The rebalance should happen automatically; it just takes time.

Check the Shard Activity panel on the Overview tab in the same monitoring screen; it should show the shards moving.
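If you prefer to check from the API, something like this shows the recoveries in progress and the overall shard counts (the column list is optional, just to trim the output):

GET _cat/recovery?v&active_only=true&h=index,shard,source_node,target_node,stage,bytes_percent

GET _cluster/health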


@leandrojmp

Thank you very much for your response.

This is what I'm seeing; I'm also attaching the details of the messages for the shards that are still waiting to be allocated.

Can I delete an old index to free up space, as long as it is in “green” status?

GET /_cluster/allocation/explain

{
  "note": "No shard was specified in the explain API request, so this response explains a randomly chosen unassigned shard. There may be other unassigned shards in this cluster which cannot be assigned for different reasons. It may not be possible to assign this shard until one of the other shards is assigned correctly. To explain the allocation of other shards (whether assigned or unassigned) you must specify the target shard in the request to this API. See https://www.elastic.co/guide/en/elasticsearch/reference/8.17/cluster-allocation-explain.html for more information.",
  "index": ".monitoring-es-7-2025.04.07",
  "shard": 0,
  "primary": false,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "REPLICA_ADDED",
    "at": "2025-04-07T00:00:03.833Z",
    "last_allocation_status": "no_attempt"
  },
  "can_allocate": "no",
  "allocate_explanation": "Elasticsearch isn't allowed to allocate this shard to any of the nodes in the cluster. Choose a node to which you expect this shard to be allocated, find this node in the node-by-node explanation, and address the reasons which prevent Elasticsearch from allocating this shard there.",
  "node_allocation_decisions": [
    {
      "node_id": "4QDdYPTZS7alKVtZYEqcyA",
      "node_name": "elastic-3",
      "transport_address": "172.26.6.38:9300",
      "node_attributes": {
        "transform.config_version": "10.0.0",
        "xpack.installed": "true",
        "ml.config_version": "12.0.0",
        "ml.max_jvm_size": "16827547648",
        "ml.allocated_processors_double": "4.0",
        "ml.allocated_processors": "4",
        "ml.machine_memory": "33651851264"
      },
      "roles": [
        "data",
        "data_cold",
        "data_content",
        "data_frozen",
        "data_hot",
        "data_warm",
        "ingest",
        "master",
        "ml",
        "remote_cluster_client",
        "transform"
      ],
      "node_decision": "no",
      "deciders": [
        {
          "decider": "same_shard",
          "decision": "NO",
          "explanation": "a copy of this shard is already allocated to this node [[.monitoring-es-7-2025.04.07][0], node[4QDdYPTZS7alKVtZYEqcyA], [P], s[STARTED], a[id=9j8KyZUWT1O1ikLXgVYScA], failed_attempts[0]]"
        },
        {
          "decider": "throttling",
          "decision": "THROTTLE",
          "explanation": "reached the limit of incoming shard recoveries [2], cluster setting [cluster.routing.allocation.node_concurrent_incoming_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
        }
      ]
    },
    {
      "node_id": "St_nPtbcQnWvb4fZJvY7HA",
      "node_name": "elastic-1",
      "transport_address": "172.26.6.36:9300",
      "node_attributes": {
        "xpack.installed": "true",
        "transform.config_version": "10.0.0",
        "ml.config_version": "12.0.0",
        "ml.max_jvm_size": "16827547648",
        "ml.allocated_processors_double": "4.0",
        "ml.allocated_processors": "4",
        "ml.machine_memory": "33651851264"
      },
      "roles": [
        "data",
        "data_cold",
        "data_content",
        "data_frozen",
        "data_hot",
        "data_warm",
        "ingest",
        "master",
        "ml",
        "remote_cluster_client",
        "transform"
      ],
      "node_decision": "no",
      "deciders": [
        {
          "decider": "disk_threshold",
          "decision": "NO",
          "explanation": "the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], having less than the minimum required [88.4gb] free space, actual free: [63.7gb], actual used: [89.1%]"
        }
      ]
    },
    {
      "node_id": "ite1RopSQxaEXeFsv3thrw",
      "node_name": "elastic-2",
      "transport_address": "172.26.6.37:9300",
      "node_attributes": {
        "xpack.installed": "true",
        "transform.config_version": "10.0.0",
        "ml.config_version": "12.0.0",
        "ml.max_jvm_size": "16827547648",
        "ml.allocated_processors_double": "4.0",
        "ml.allocated_processors": "4",
        "ml.machine_memory": "33651851264"
      },
      "roles": [
        "data",
        "data_cold",
        "data_content",
        "data_frozen",
        "data_hot",
        "data_warm",
        "ingest",
        "master",
        "ml",
        "remote_cluster_client",
        "transform"
      ],
      "node_decision": "no",
      "deciders": [
        {
          "decider": "disk_threshold",
          "decision": "NO",
          "explanation": "the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], having less than the minimum required [88.4gb] free space, actual free: [53gb], actual used: [91%]"
        }
      ]
    }
  ]
}

The message is expected: there are already 2 shards moving to node 3, so all other indices will have to wait.

If you do not need it, then you can delete it.
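For example, you could first confirm the index is green and see how much space it would free, and then delete it (the date pattern and name below are just examples):

GET _cat/indices/.monitoring-es-7-2025.03.*?v&h=index,health,status,store.size&s=index

DELETE /.monitoring-es-7-2025.03.01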


@leandrojmp Solved, thank you very much