Restored indices in red

Hi,
I just restored an ES snapshot, but all the restored indices come back in red status.
When I call GET _cluster/allocation/explain, I get the following:

{
  "index" : "componentinformation",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "NEW_INDEX_RESTORED",
    "at" : "2022-10-20T09:15:03.302Z",
    "details" : "restore_source[rep_2/monthly-snapshot-2022.07.01]",
    "last_allocation_status" : "no"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "cx52o2FQRnmCJ4ReU8xCUQ",
      "node_name" : "node-ELK3",
      "transport_address" : "10.116.39.197:9300",
      "node_attributes" : {
        "rack" : "r1c",
        "xpack.installed" : "true",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "weight_ranking" : 1,
      "deciders" : [
        {
          "decider" : "restore_in_progress",
          "decision" : "NO",
          "explanation" : "shard has failed to be restored from the snapshot [rep_2:monthly-snapshot-2022.07.01/CbcV4EviSt-Hg3pDILGUHQ] because of [restore_source[rep_2/monthly-snapshot-2022.07.01]] - manually close or delete the index [aoi-componentinformation-20220520] in order to retry to restore the snapshot again or use the reroute API to force the allocation of an empty primary shard"
        },
        {
          "decider" : "disk_threshold",
          "decision" : "NO",
          "explanation" : "the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], using more disk space than the maximum allowed [85.0%], actual free: [9.270049883195354%]"
        }
      ]
    },
    {
      "node_id" : "nQtgMGm8RjGJkrYmOH9lLw",
      "node_name" : "datanode-1",
      "transport_address" : "10.116.37.151:9300",
      "node_attributes" : {
        "rack" : "r1a",
        "xpack.installed" : "true",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "weight_ranking" : 2,
      "deciders" : [
        {
          "decider" : "restore_in_progress",
          "decision" : "NO",
          "explanation" : "shard has failed to be restored from the snapshot [rep_2:monthly-snapshot-2022.07.01/CbcV4EviSt-Hg3pDILGUHQ] because of [restore_source[rep_2/monthly-snapshot-2022.07.01]] - manually close or delete the index [componentinformation] in order to retry to restore the snapshot again or use the reroute API to force the allocation of an empty primary shard"
        },
        {
          "decider" : "disk_threshold",
          "decision" : "NO",
          "explanation" : "the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], using more disk space than the maximum allowed [85.0%], actual free: [3.7204337764371793%]"
        }
      ]
    },
    {
      "node_id" : "AlnFm0gKRtit-rYm2jFfFA",
      "node_name" : "node-ELK2",
      "transport_address" : "10.116.37.201:9300",
      "node_attributes" : {
        "rack" : "r1a",
        "xpack.installed" : "true",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "weight_ranking" : 3,
      "deciders" : [
        {
          "decider" : "restore_in_progress",
          "decision" : "NO",
          "explanation" : "shard has failed to be restored from the snapshot [rep_2:monthly-snapshot-2022.07.01/CbcV4EviSt-Hg3pDILGUHQ] because of [restore_source[rep_2/monthly-snapshot-2022.07.01]] - manually close or delete the index [aoi-componentinformation-20220520] in order to retry to restore the snapshot again or use the reroute API to force the allocation of an empty primary shard"
        },
        {
          "decider" : "disk_threshold",
          "decision" : "NO",
          "explanation" : "the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], using more disk space than the maximum allowed [85.0%], actual free: [14.17961261243759%]"
        }
      ]
    },
    {
      "node_id" : "2qfDaKt9Rgei_3hvscmIrA",
      "node_name" : "datanode-2",
      "transport_address" : "10.116.39.168:9300",
      "node_attributes" : {
        "rack" : "r1c",
        "xpack.installed" : "true",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "weight_ranking" : 4,
      "deciders" : [
        {
          "decider" : "restore_in_progress",
          "decision" : "NO",
          "explanation" : "shard has failed to be restored from the snapshot [rep_2:monthly-snapshot-2022.07.01/CbcV4EviSt-Hg3pDILGUHQ] because of [restore_source[rep_2/monthly-snapshot-2022.07.01g]] - manually close or delete the index [componentinformation] in order o retry to restore the snapshot again or use the reroute API to force the allocation of an empty primary shard"
        },
        {
          "decider" : "disk_threshold",
          "decision" : "NO",
          "explanation" : "the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], using more disk space than the maximum allowed [85.0%], actual free: [7.377034012598388%]"
        }
      ]
    },
    {
      "node_id" : "b4UtklaXQmuycSgwlikE7Q",
      "node_name" : "nodel-ELK4",
      "transport_address" : "10.116.38.189:9300",
      "node_attributes" : {
        "rack" : "r1a",
        "xpack.installed" : "true",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "weight_ranking" : 5,
      "deciders" : [
        {
          "decider" : "restore_in_progress",
          "decision" : "NO",
          "explanation" : "shard has failed to be restored from the snapshot [rep_2:monthly-snapshot-2022.07.01-nxzygk4itqmizhyld6aoeg/CbcV4EviSt-Hg3pDILGUHQ] because of [restore_source[rep_2/monthly-snapshot-2022.07.01]] - manually close or delete the index [componentinformation] in order to retry to restore the snapshot again or use the reroute API to force the allocation of an empty primary shard"
        },
        {
          "decider" : "disk_threshold",
          "decision" : "NO",
          "explanation" : "the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], using more disk space than the maximum allowed [85.0%], actual free: [12.035006081673684%]"
        }
      ]
    }
  ]
}

We are hosting the solution on an EC2 cluster. Do you think adding more memory will resolve the issue?
Or is it something related to the ES/JVM configs?

Check the response of the allocation explain API:

"can_allocate" : "no",
"allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes"

This means that none of your nodes can allocate these shards; from here you need to check the deciders for each node.

You will see that each one of your nodes has this as one of the deciders.

        {
          "decider" : "disk_threshold",
          "decision" : "NO",
          "explanation" : "the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], using more disk space than the maximum allowed [85.0%], actual free: [9.270049883195354%]"
        }

It means that all of your nodes reached the low disk watermark, which happens when more than 85% of the disk space is used. When this happens, Elasticsearch stops allocating shards to the node, so you won't be able to create new shards on those nodes until you free up some space.

Elasticsearch has 3 disk watermarks:

  • low: defaults to 85%; Elasticsearch stops allocating new shards to the node.
  • high: defaults to 90%; Elasticsearch tries to move shards off the node.
  • flood_stage: defaults to 95%; Elasticsearch sets every index that has a shard on the node to read-only.

Sometimes those percentages waste a lot of space. For example, on a 4 TB disk, the 85% low watermark means something near 3.4 TB of data, which would leave around 600 GB of space unused; depending on your use case you can increase those defaults.
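If you do decide to raise the defaults, the watermarks are dynamic cluster settings and can be changed with the cluster settings API. A sketch (the 90%/95%/97% values here are just an example, not a recommendation, pick values that make sense for your disk sizes):

```
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "97%"
  }
}
```

You can also use absolute values like "100gb" instead of percentages, which is often clearer on large disks.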

So, to solve your issue you need to free up some space on your nodes by deleting old data or, depending on the disk size, changing the thresholds that trigger the watermarks.
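Once there is enough free space, the restore_in_progress decider messages in your output tell you how to retry: delete (or close) the failed red index, then run the restore again. A sketch using the repository and snapshot names from your output (adjust the index list to whatever you need to restore):

```
DELETE /componentinformation

POST _snapshot/rep_2/monthly-snapshot-2022.07.01/_restore
{
  "indices": "componentinformation"
}
```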

Hi,
Thank you so much for your reply.

I called the _cat/allocation API and I get the following:

14 17.4gb 26.5gb 2.8gb 29.3gb 90 10.116.37.  10.116.37.  datanode-1
16  9.9gb 45.9gb 3.1gb   49gb 93 10.116.39.  10.116.39.  datanode-2
16  3.3gb 48.4gb 636mb   49gb 98 10.116.39.  10.116.39.  node-ELK3
17 37.8gb 44.7gb 5.2gb 49.9gb 89 10.116.38.  10.116.38.  nodel-ELK4
14 31.6gb 41.7gb 7.3gb   49gb 85 10.116.37.  10.116.37.  node-ELK2

so there is definitely a space issue. However, each of those AWS EC2 instances has a 1100 GiB volume attached, and it looks like that volume is not being used at all. Is there anything in the ES configuration that can be done to use the attached volume?

This is what I get from running lsblk:

NAME          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
nvme0n1       259:1    0  150G  0 disk
|-nvme0n1p1   259:2    0   50G  0 part /
`-nvme0n1p128 259:3    0    1M  0 part
nvme1n1       259:0    0  1.1T  0 disk

The additional volume was not mounted. For EC2 there is a good walkthrough on the following page: https://4sysops.com/archives/automatically-mount-an-nvme-ebs-volume-in-an-ec2-linux-instance-using-fstab/
The volume should be mounted on the data directory.
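As a rough sketch of that workflow (the device name comes from the lsblk output above; the filesystem, mount point, and fstab options are assumptions, so follow the linked article for the details):

```
# Format the unused 1.1T volume (mkfs destroys any existing data on it!)
sudo mkfs.xfs /dev/nvme1n1
sudo mkdir -p /data/elasticsearch
sudo mount /dev/nvme1n1 /data/elasticsearch
sudo chown -R elasticsearch:elasticsearch /data/elasticsearch

# Persist the mount across reboots; use the UUID, since NVMe device names can change
echo "UUID=$(sudo blkid -s UUID -o value /dev/nvme1n1) /data/elasticsearch xfs defaults,nofail 0 2" | sudo tee -a /etc/fstab
```

Then, with the node stopped, move the existing data directory contents onto the new volume and point `path.data` in elasticsearch.yml at the new location (e.g. `path.data: /data/elasticsearch`) before starting Elasticsearch again.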

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.