Changed a node's port - now it has no shards, but its storage is still used

I've set up a 6-node Elasticsearch 7.17.0 cluster that has to serve a single read-only ~1TB index.

The nodes have 2TB of storage each, so I've set that index to a single primary shard with 5 replicas (so each node holds a full copy of the index).
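
For reference, the replica count was set with something along these lines (not the exact command I ran, but close; using the same redacted index name as below):

$ curl -X PUT "localhost:9200/the_target_index/_settings" -H 'Content-Type: application/json' -d'
{
  "index" : { "number_of_replicas" : 5 }
}
'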

Both the cluster and the index were green.

Today, I changed the port on which one of the nodes (cluster-node-24) listens. Due to some early tests, that node was listening on port 9201, and I've now moved it back to 9200.
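
Concretely, that was just the http.port setting in that node's elasticsearch.yml plus a restart of the node; roughly like this (the config path and restart command depend on the install, so this is only illustrative):

$ grep '^http.port' /path/to/elasticsearch.yml
http.port: 9200
$ # ...then restart the Elasticsearch service on cluster-node-24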

The thing is, the cluster is now yellow. The allocation explain API says it's trying to allocate a shard because its node left, but it can't, since no node has enough free storage:

$ curl 'localhost:9200/_cluster/allocation/explain?pretty'
{
  "note" : "No shard was specified in the explain API request, so this response explains a randomly chosen unassigned shard. There may be other unassigned shards in this cluster which cannot be assigned for different reasons. It may not be possible to assign this shard until one of the other shards is assigned correctly. To explain the allocation of other shards (whether assigned or unassigned) you must specify the target shard in the request to this API.",
  "index" : "the_target_index",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2022-06-16T19:20:42.459Z",
    "details" : "node_left [d1KGHbo6RQaHLpBNOU6GnQ]",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "HQhro8j3ReCHv3LAgUF-Bw",
      "node_name" : "cluster-node-26",
      "transport_address" : "192.168.244.126:9300",
      "node_attributes" : {
        "ml.machine_memory" : "135088414720",
        "ml.max_open_jobs" : "512",
        "xpack.installed" : "true",
        "ml.max_jvm_size" : "33285996544",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "a copy of this shard is already allocated to this node [[the_target_index][0], node[HQhro8j3ReCHv3LAgUF-Bw], [P], s[STARTED], a[id=f5yp09RHSYKNTHa1SQjKSg]]"
        },
        {
          "decider" : "disk_threshold",
          "decision" : "NO",
          "explanation" : "allocating the shard to this node will bring the node above the high watermark cluster setting [cluster.routing.allocation.disk.watermark.high=90%] and cause it to have less than the minimum required [0b] of free space (free: [509.4gb], estimated shard size: [1.1tb])"
        }
      ]
    },
    {
      "node_id" : "d1KGHbo6RQaHLpBNOU6GnQ",
      "node_name" : "cluster-node-24",
      "transport_address" : "192.168.244.124:9300",
      "node_attributes" : {
        "ml.machine_memory" : "135082483712",
        "ml.max_open_jobs" : "512",
        "xpack.installed" : "true",
        "ml.max_jvm_size" : "33285996544",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "disk_threshold",
          "decision" : "NO",
          "explanation" : "allocating the shard to this node will bring the node above the high watermark cluster setting [cluster.routing.allocation.disk.watermark.high=90%] and cause it to have less than the minimum required [0b] of free space (free: [511.2gb], estimated shard size: [1.1tb])"
        }
      ]
    },
    {
      "node_id" : "iDzDo9hfSUW3N2uJo_14Qg",
      "node_name" : "cluster-node-21",
      "transport_address" : "192.168.244.121:9300",
      "node_attributes" : {
        "ml.machine_memory" : "135082483712",
        "ml.max_open_jobs" : "512",
        "xpack.installed" : "true",
        "ml.max_jvm_size" : "33285996544",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "a copy of this shard is already allocated to this node [[the_target_index][0], node[iDzDo9hfSUW3N2uJo_14Qg], [R], s[STARTED], a[id=4g1YYexaSeKXTTibGDyRhQ]]"
        },
        {
          "decider" : "disk_threshold",
          "decision" : "NO",
          "explanation" : "allocating the shard to this node will bring the node above the high watermark cluster setting [cluster.routing.allocation.disk.watermark.high=90%] and cause it to have less than the minimum required [0b] of free space (free: [511.7gb], estimated shard size: [1.1tb])"
        }
      ]
    },
    {
      "node_id" : "lwB2cT2JRfe9CTv2KF6CNw",
      "node_name" : "cluster-node-23",
      "transport_address" : "192.168.244.123:9300",
      "node_attributes" : {
        "ml.machine_memory" : "135090417664",
        "ml.max_open_jobs" : "512",
        "xpack.installed" : "true",
        "ml.max_jvm_size" : "33285996544",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "a copy of this shard is already allocated to this node [[the_target_index][0], node[lwB2cT2JRfe9CTv2KF6CNw], [R], s[STARTED], a[id=lzitYi11SESnWl4IY6awsA]]"
        },
        {
          "decider" : "disk_threshold",
          "decision" : "NO",
          "explanation" : "allocating the shard to this node will bring the node above the high watermark cluster setting [cluster.routing.allocation.disk.watermark.high=90%] and cause it to have less than the minimum required [0b] of free space (free: [507.9gb], estimated shard size: [1.1tb])"
        }
      ]
    },
    {
      "node_id" : "wDCIUfW_QqSJbiz8FwFSOg",
      "node_name" : "cluster-node-22",
      "transport_address" : "192.168.244.122:9300",
      "node_attributes" : {
        "ml.machine_memory" : "135082483712",
        "ml.max_open_jobs" : "512",
        "xpack.installed" : "true",
        "ml.max_jvm_size" : "33285996544",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "a copy of this shard is already allocated to this node [[the_target_index][0], node[wDCIUfW_QqSJbiz8FwFSOg], [R], s[STARTED], a[id=_HbW6DPYRBm9FrHvgQk5vA]]"
        },
        {
          "decider" : "disk_threshold",
          "decision" : "NO",
          "explanation" : "allocating the shard to this node will bring the node above the high watermark cluster setting [cluster.routing.allocation.disk.watermark.high=90%] and cause it to have less than the minimum required [0b] of free space (free: [511.3gb], estimated shard size: [1.1tb])"
        }
      ]
    },
    {
      "node_id" : "wa1qscSTTiCPgUgw-O8TxA",
      "node_name" : "cluster-node-25",
      "transport_address" : "192.168.244.125:9300",
      "node_attributes" : {
        "ml.machine_memory" : "135082483712",
        "ml.max_open_jobs" : "512",
        "xpack.installed" : "true",
        "ml.max_jvm_size" : "33285996544",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "a copy of this shard is already allocated to this node [[the_target_index][0], node[wa1qscSTTiCPgUgw-O8TxA], [R], s[STARTED], a[id=6p2bttpzRl-PHfCifwIdiA]]"
        },
        {
          "decider" : "disk_threshold",
          "decision" : "NO",
          "explanation" : "allocating the shard to this node will bring the node above the high watermark cluster setting [cluster.routing.allocation.disk.watermark.high=90%] and cause it to have less than the minimum required [0b] of free space (free: [513gb], estimated shard size: [1.1tb])"
        }
      ]
    }
  ]
}

This is the list of current allocations:

$ curl -X GET "localhost:9200/_cat/allocation?v=true&h=node,shards,disk.*&pretty"
node           shards disk.indices disk.used disk.avail disk.total disk.percent
cluster-node-22      5        1.1tb     1.2tb    511.3gb      1.7tb           70
cluster-node-25      5        1.1tb     1.2tb      513gb      1.7tb           70
cluster-node-26      5        1.1tb     1.2tb    509.4gb      1.7tb           70
cluster-node-21      5        1.1tb     1.2tb    511.7gb      1.7tb           70
cluster-node-24      0           0b     1.2tb    511.2gb      1.7tb           70
cluster-node-23      5        1.1tb     1.2tb    507.9gb      1.7tb           71
UNASSIGNED          1                                                          

cluster-node-24 has the same ID as before (compare with the NODE_LEFT reason in the explain output above):

$ curl -X GET "localhost:9200/_cat/nodes?v&h=name,id&full_id=true"
name            id
cluster-node-21 iDzDo9hfSUW3N2uJo_14Qg
cluster-node-23 lwB2cT2JRfe9CTv2KF6CNw
cluster-node-25 wa1qscSTTiCPgUgw-O8TxA
cluster-node-26 HQhro8j3ReCHv3LAgUF-Bw
cluster-node-24 d1KGHbo6RQaHLpBNOU6GnQ
cluster-node-22 wDCIUfW_QqSJbiz8FwFSOg

Here's the list of indices I have:

$ curl -X GET "localhost:9200/_cat/indices?v"
health status index                 uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .geoip_databases      wqmlJW5zQ6ePLv5Noc8Zdw   1   1         41           41     77.8mb         38.9mb
yellow open   the_target_index      Mb2ZIRXbQO6RTeeMT4KVeQ   1   5  270722174            0      5.5tb          1.1tb
green  open   other_index           kpzkVtJcSviNxsytfgYaTw   1   1     277223            0     15.8gb          7.9gb
green  open   other_index_2         Lxl58z9WSb2ooHuTM3hdpA   1   1          2            0    195.1kb         97.5kb
green  open   .tasks                Ry1PYtMQS7CK1qyaz4Oeww   1   1         11            0    140.6kb         70.3kb

And I can clearly see on cluster-node-24 that it still has the index data stored on disk:

$ sudo du -h /usr/share/elasticsearch/data/nodes/0/indices/Mb2ZIRXbQO6RTeeMT4KVeQ | tail -n1
1.2T    /usr/share/elasticsearch/data/nodes/0/indices/Mb2ZIRXbQO6RTeeMT4KVeQ

I'd expect the node to keep being treated as "the same" node (it has the same ID, so nothing should need to be reallocated); or for the cluster to notice that the shard it's trying to allocate is the one the node already has on disk (so it could just pick it up and be ready almost instantly); or, at the very least, for the node to notice it's storing a lot of data for shards it isn't assigned (since it thinks it has no shards), free that disk space, and then sync the shard again.

But none of this seems to be happening.
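
(In case it's useful: I believe the shard stores API can show whether the cluster still knows about the copy sitting on cluster-node-24's disk, e.g.:)

$ curl 'localhost:9200/the_target_index/_shard_stores?pretty'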

Am I missing something? How do I fix this?

My current best alternative is to decrease the replica count to 4, so the index goes green again, and then increase it back to 5 so that cluster-node-24 gets a replica again. But this plan kind of assumes the node will eventually notice it has a lot of stored data it isn't using - and I'm not sure that will happen at all.
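
Concretely, that plan would be something like:

$ curl -X PUT "localhost:9200/the_target_index/_settings" -H 'Content-Type: application/json' -d'
{ "index" : { "number_of_replicas" : 4 } }
'
$ # wait for the index to go green, then:
$ curl -X PUT "localhost:9200/the_target_index/_settings" -H 'Content-Type: application/json' -d'
{ "index" : { "number_of_replicas" : 5 } }
'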

Another alternative would be to manually delete that data directory - but I don't know what potential issues that could cause.
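
(If I went that route, I imagine it would be roughly the following, with Elasticsearch stopped on that node first - but again, I don't know whether this is safe:)

$ # on cluster-node-24, with the Elasticsearch service stopped:
$ sudo rm -rf /usr/share/elasticsearch/data/nodes/0/indices/Mb2ZIRXbQO6RTeeMT4KVeQ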

Moving the node back to port 9201 didn't fix the issue - all of these commands still show basically the same output.

(I've redacted the node & index names, but none of the IDs.)
