Yellow Cluster Health - Missing Replica Shards

Hello everyone,

New indices on my cluster keep getting created with a yellow status.

When I run _cluster/allocation/explain, I get this back:

{
  "index" : "vnext-signalrhub-2021.10.05",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "INDEX_CREATED",
    "at" : "2021-10-05T00:25:09.539Z",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "-5bvSCI1SMuAw9FlyM1h2Q",
      "node_name" : "pelitas-data-0",
      "transport_address" : "172.19.20.24:9300",
      "node_attributes" : {
        "ml.machine_memory" : "12884901888",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "weight_ranking" : 1,
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "a copy of this shard is already allocated to this node [[vnext-signalrhub-2021.10.05][0], node[-5bvSCI1SMuAw9FlyM1h2Q], [P], s[STARTED], a[id=0y7LEW6fSV6olnOkgGl7Qg]]"
        },
        {
          "decider" : "disk_threshold",
          "decision" : "NO",
          "explanation" : "the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=90%], using more disk space than the maximum allowed [90.0%], actual free: [6.694358221196546%]"
        }
      ]
    },
    {
      "node_id" : "YezYv3rdR8OoIIiB6MFSRA",
      "node_name" : "pelitas-data-1",
      "transport_address" : "172.19.20.56:9300",
      "node_attributes" : {
        "ml.machine_memory" : "12884901888",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "weight_ranking" : 2,
      "deciders" : [
        {
          "decider" : "disk_threshold",
          "decision" : "NO",
          "explanation" : "the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=90%], using more disk space than the maximum allowed [90.0%], actual free: [9.302611470128882%]"
        }
      ]
    }
  ]
}

I am relatively new to Elasticsearch and not exactly sure what I should do to address this ongoing issue.

Hi @David_Reck, welcome to the community.

Looks like there are a couple of things going on.

By default, Elasticsearch creates each index with 1 primary shard and 1 replica of that shard, unless you have configured otherwise. A replica is never placed on the same node as its primary.

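Looks like you have a 2-node cluster.

In short, you are out of disk space on both nodes: each is more than 90% full, which is above the low watermark, so there is nowhere to put the replica. (pelitas-data-0 is also ruled out because it already holds the primary copy of that shard.)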
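
If you want to confirm both points, the two requests below are a minimal sketch. The first should show number_of_shards and number_of_replicas for the index from your explain output; the second lists the cluster settings including defaults, where you can look for the cluster.routing.allocation.disk.watermark.* keys.

```console
# Shard and replica counts for the index from the explain output
GET /vnext-signalrhub-2021.10.05/_settings

# Cluster settings with defaults; look for the disk.watermark.* keys
GET /_cluster/settings?include_defaults=true&flat_settings=true
```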

You can see the nodes and their disk usage with this command in Kibana Dev Tools:

GET /_cat/nodes/?v&h=name,du,dt,dup,hp,hc,rm,rp,r

You can list your indices by size, largest first, with this command:

GET /_cat/indices/*?v&s=pri.store.size:desc

You're going to need to do one of the following:

  • Clean up some indices
  • Set replicas to 0 and risk data loss
  • Make your nodes larger in terms of disk space

or some combination of the above.
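
As a rough sketch of the first two options in Dev Tools: my-old-index-2021.01.01 is a made-up name, so substitute an index you no longer need. The settings call uses the index from your explain output.

```console
# Option 1: delete an index you no longer need (hypothetical name; this cannot be undone)
DELETE /my-old-index-2021.01.01

# Option 2: drop the replica for one index, leaving only the primary copy
PUT /vnext-signalrhub-2021.10.05/_settings
{
  "index": {
    "number_of_replicas": 0
  }
}
```

Note that the settings change only affects that existing index. Indices created later still get 1 replica unless an index template says otherwise.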

Thanks a bunch for the direction Stephen.

Based on the options, I think I'd like to head down the path of making the nodes larger in disk space. Or should I try to add another data node? I've tried to do that, but I am getting an error.

How do you increase the data size of a node?

Hi @David_Reck, you will need to increase the capacity of the disks / volumes that the Elasticsearch data resides on, which is really an OS admin task. You will need to shut down the node, add space, and restart it. Follow the rolling upgrade procedure, one node at a time (in this case you are upgrading the hardware rather than the software), and be careful not to destroy or overwrite your data.

After the restart, Elasticsearch should recognize and use the additional space.
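
The Elasticsearch side of that procedure, for one node, looks roughly like the sketch below. It is modeled on the documented rolling restart steps, so check the guide for your version before relying on it. Stopping the node and growing the volume happen outside Elasticsearch and are only shown as a comment. Keep in mind that any index without an assigned replica will be unavailable while the node holding its primary is down.

```console
# 1. Before stopping the node, limit allocation to primaries so the
#    cluster does not start rebuilding replicas while the node is down
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}

# 2. Flush so shard recovery after the restart is faster
POST /_flush

# 3. Stop the node, grow the disk / volume at the OS level, start the node,
#    then confirm it has rejoined the cluster
GET /_cat/nodes?v

# 4. Reset allocation to its default (all shards)
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": null
  }
}

# 5. Wait for the cluster to settle before repeating on the next node
GET /_cat/health?v
```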

Adding another node is also a good choice. You could start a separate thread with the error you are getting and perhaps someone can help.