Unassigned shards: what to do about them?

Hello Guys,

First off, our cluster:
Master (Kibana)
- 8 GB RAM
- Apache web server
- Does not filter/query

Worker node #01/#02
- 32 GB RAM each
- No Kibana

So my cluster is running mostly fine, but the health under "Monitoring" is yellow, since around 50% of my shards are unassigned. What do I have to do about it? My predecessor set the shard defaults to:
- 5 primary shards
- 1 replica
If I understand this right, that is way too much and I should go with 2 primaries and 1 replica, but I'm struggling to find where I can change this.

Any other ideas why there are so many unassigned shards?

Greetz
Mo

That should work even with 5 shards and 1 replica.
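That said, if you do want to change the shard count for indices created in the future, in 6.x that is done with an index template. A rough sketch (the template name and index pattern here are just examples; adjust them to your own naming, and note this only affects newly created indices, not existing ones):

```
PUT _template/fewer-shards
{
  "index_patterns": ["filebeat-*"],
  "order": 1,
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 1
  }
}
```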

It could be something like node.attr settings, e.g. rack awareness. Do you see any node.attr options in elasticsearch.yml? If so, what are the settings on both nodes?

Pick one index with unassigned shards and make sure it doesn't have replicas set to 2 or more.
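To check that, and to reduce the replica count if needed, something like this should work in Dev Tools (the index name is just a placeholder, use one of your own indices):

```
GET /my-index/_settings

PUT /my-index/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}
```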

Add custom attributes to the node:

#node.attr.rack: r1

----------------------------------- Paths ------------------------------------

Well, I guess it's left at the default? It's like this on both nodes.

Hi,

Did you read the logs? When a shard can't be assigned, it's possible that something happened to those shards or to the cluster at allocation time.

Also, you can see which shards haven't been assigned with the commands below:

CLI: curl -XGET "<IP>:9200/_cat/shards" | grep "UNASSIGNED"
DEV Tools: GET /_cat/shards

There are many reasons why shards might be unassigned, and it is rare for the logs to contain much useful information. The right way to diagnose the reason for an unassigned shard is to use the allocation explain API.
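For example, in Dev Tools (called with no body, it reports on the first unassigned shard it finds; you can also ask about a specific shard, here using a placeholder index name):

```
GET /_cluster/allocation/explain

GET /_cluster/allocation/explain
{
  "index": "my-index",
  "shard": 0,
  "primary": false
}
```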

@Silen_logs @DavidTurner First of all, thanks both of you for the hints & tips.

After using the allocation explain API, I got back various reasons why the shards cannot be assigned:

the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], using more disk space than the maximum allowed [85.0%], actual free: [13.226582219832771%] --> I've added 250 GB more disk and set the watermark rule to 80%.

the shard cannot be allocated to the same node on which a copy of the shard already exists --> I don't really know what to do about that. Any clues?

cannot allocate because allocation is not permitted to any of the nodes --> I guess I'll find this in some .yml/conf file to change?

Thanks for your help guys, appreciate it!

Greetz
Moritz

Are all your Elasticsearch nodes running exactly the same version?

It's quite hard to help from just the few small messages that you've picked out by hand. Please share the whole output.

That's normal. Elasticsearch won't put more than one copy of a shard on each node.

Ok, add more disk space, delete some data, or adjust the watermark settings. NB cluster.routing.allocation.disk.watermark.low=85% means the limit is when the disk is 85% full, so reducing the watermark to 80% is making the problem worse.
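If you do decide to adjust the watermarks (i.e. raise them, not lower them), it would look roughly like this; note that a transient setting is lost on a full cluster restart, and freeing disk space is the better long-term fix:

```
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "90%"
  }
}
```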

This is a summary of the more detailed information elsewhere in the output.

Ok, here is the whole output..

{
  "index" : "filebeat-6.8.1-2019.07.30",
  "shard" : 1,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2019-07-30T11:37:45.122Z",
    "details" : "node_left [LZyAKlAfS-mNxvdjuaEwUg]",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "LZyAKlAfS-mNxvdjuaEwUg",
      "node_name" : "LZyAKlA",
      "transport_address" : "X.X.X.223:9300",
      "node_attributes" : {
        "ml.machine_memory" : "25111109632",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true",
        "ml.enabled" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[filebeat-6.8.1-2019.07.30][1], node[LZyAKlAfS-mNxvdjuaEwUg], [P], s[STARTED], a[id=bW1xHKZcRPOlDzLLDDEKXA]]"
        },
        {
          "decider" : "disk_threshold",
          "decision" : "NO",
          "explanation" : "the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], using more disk space than the maximum allowed [85.0%], actual free: [14.997178982365764%]"
        }
      ]
    },
    {
      "node_id" : "LdEwqP_YRiWJRWA1_UyIgg",
      "node_name" : "LdEwqP_",
      "transport_address" : "X.X.X.222:9300",
      "node_attributes" : {
        "ml.machine_memory" : "25111093248",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true",
        "ml.enabled" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "disk_threshold",
          "decision" : "NO",
          "explanation" : "the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], using more disk space than the maximum allowed [85.0%], actual free: [13.211407213469084%]"
        }
      ]
    }
  ]
}

Yes, both nodes are on version 6.8.1:
Node#01
Installed Packages
Name : elasticsearch
Arch : noarch
Version : 6.8.1
Release : 1
Size : 227 M
Repo : installed
From repo : elastic-6.x
Node#02
Installed Packages
Name : elasticsearch
Arch : noarch
Version : 6.8.1
Release : 1
Size : 227 M
Repo : installed
From repo : elastic-6.x

Thanks, that helps. I reformatted it for you using the </> button to make it easier to read.

You have two data nodes and their disks are both over 85% full, so no replicas can be allocated in this cluster.
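You can see the per-node disk usage at a glance with this request; the `disk.percent` column shows how full each data node's disk is:

```
GET /_cat/allocation?v
```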

Sorry about that! Will use </> in the future.
Thanks a lot, I'll upgrade the disks and hope this problem clears up.
Will mark your answer properly by tomorrow!
Thanks again.

Little update: I noticed that ES had trouble allocating replica shards because it thought that exactly the same replica was already on that node. Reassigned the replicas (a long process) and right now I'm at 200 unassigned shards, and it's going down with every index checked. I don't know yet if this is the final solution, since I haven't added the disk space yet.
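In case anyone else hits this: Elasticsearch stops retrying a shard after a few failed allocation attempts, so after fixing the underlying cause (disk space, in my case) you may need to ask it to retry explicitly, roughly like this:

```
POST /_cluster/reroute?retry_failed=true
```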