RED Status Cluster - UNASSIGNED CLUSTER_RECOVERED


#1

Hi,

I have read many resources on red cluster status, but I would really appreciate it if someone could tell me what to do here. I'm stuck and don't know what would work.

***Please note: I left the cluster health output out of this post because of the character limit.

There are lots of "[0] primary shard is not active Timeout: [1m]" messages in the logs.

    [root@esclient-1 ~]# curl -XGET '*******:9200/_cat/shards?v'
    index                       shard prirep state         docs   store ip          node
    logs-2017.08.02   2     p      STARTED     218084 118.3mb 192.12.0.12 esdata-1
    logs-2017.08.02   2     r      UNASSIGNED                             
    logs-2017.08.02   1     p      STARTED     217879 118.9mb 192.12.0.12 esdata-1
    logs-2017.08.02   1     r      UNASSIGNED                             
    logs-2017.08.02   3     p      STARTED     217081 118.2mb 192.12.0.10 esdata-2
    logs-2017.08.02   3     r      UNASSIGNED                             
    logs-2017.08.02   0     p      STARTED     217307   118mb 192.12.0.15 esdata-3
    logs-2017.08.02   0     r      UNASSIGNED                             
    logs-2017.07.24   1     p      STARTED     492214   275mb 192.12.0.12 esdata-1
    logs-2017.07.24   1     r      UNASSIGNED                             
    logs-2017.07.24   0     p      UNASSIGNED                             
    logs-2017.07.24   0     r      UNASSIGNED                             
    logs-2017.07.17   1     p      UNASSIGNED                             
    logs-2017.07.17   1     r      UNASSIGNED                             
       *Truncated*
    [root@esclient-1 ~]# curl -XGET '*******:9200/_cluster/allocation/explain?pretty'
    {
      "index" : ".monitoring-logstash-2-2017.08.04",
      "shard" : 0,
      "primary" : true,
      "current_state" : "unassigned",
      "unassigned_info" : {
        "reason" : "CLUSTER_RECOVERED",
        "at" : "2017-08-08T15:19:43.817Z",
        "last_allocation_status" : "no_valid_shard_copy"
      },
      "can_allocate" : "no_valid_shard_copy",
      "allocate_explanation" : "cannot allocate because all found copies of the shard are either stale or corrupt",
      "node_allocation_decisions" : [
        {
          "node_id" : "52Oy-hczTdObNr7HyoGQYg",
          "node_name" : "esdata-4",
          "transport_address" : "*******:9300",
          "node_attributes" : {
            "ml.enabled" : "true"
          },
          "node_decision" : "no",
          "store" : {
            "found" : false
          }
        },
        {
          "node_id" : "SeMHhljWRpuT76bHNlLqwA",
          "node_name" : "esdata-3",
          "transport_address" : "*******:9300",
          "node_attributes" : {
            "ml.enabled" : "true"
          },
          "node_decision" : "no",
          "store" : {
            "in_sync" : false,
            "allocation_id" : "sBGrm9kmTFKW_ltuNJpDkg"
          }
        },
        {
          "node_id" : "c02j74dYSoOgA8KYCzvbWQ",
          "node_name" : "esdata-1",
          "transport_address" : "*******:9300",
          "node_attributes" : {
            "ml.enabled" : "true"
          },
          "node_decision" : "no",
          "store" : {
            "found" : false
          }
        },
        {
          "node_id" : "i52cA3MGSE2g9Fx92Inrbg",
          "node_name" : "esdata-2",
          "transport_address" : "*******:9300",
          "node_attributes" : {
            "ml.enabled" : "true"
          },
          "node_decision" : "no",
          "store" : {
            "in_sync" : false,
            "allocation_id" : "tQmXydOTTd-cez8A51MZ4A"
          }
        }
      ]
    }
    [root@esclient-1 ~]# curl -XGET '******:9200/_cluster/allocation/explain?pretty'
    {
      "index" : ".monitoring-data-2",
      "shard" : 0,
      "primary" : true,
      "current_state" : "unassigned",
      "unassigned_info" : {
        "reason" : "INDEX_CREATED",
        "at" : "2017-08-08T15:53:40.175Z",
        "last_allocation_status" : "no"
      },
      "can_allocate" : "no",
      "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
      "node_allocation_decisions" : [
        {
          "node_id" : "52Oy-hczTdObNr7HyoGQYg",
          "node_name" : "esdata-4",
          "transport_address" : "*******:9300",
          "node_attributes" : {
            "ml.enabled" : "true"
          },
          "node_decision" : "no",
          "weight_ranking" : 1,
          "deciders" : [
            {
              "decider" : "enable",
              "decision" : "NO",
              "explanation" : "no allocations are allowed due to {}"
            }
          ]
        },
        {
          "node_id" : "i52cA3MGSE2g9Fx92Inrbg",
          "node_name" : "esdata-2",
          "transport_address" : "*******:9300",
          "node_attributes" : {
            "ml.enabled" : "true"
          },
          "node_decision" : "no",
          "weight_ranking" : 2,
          "deciders" : [
            {
              "decider" : "enable",
              "decision" : "NO",
              "explanation" : "no allocations are allowed due to {}"
            }
          ]
        },
        {
          "node_id" : "SeMHhljWRpuT76bHNlLqwA",
          "node_name" : "esdata-3",
          "transport_address" : "*******:9300",
          "node_attributes" : {
            "ml.enabled" : "true"
          },
          "node_decision" : "no",
          "weight_ranking" : 3,
          "deciders" : [
            {
              "decider" : "enable",
              "decision" : "NO",
              "explanation" : "no allocations are allowed due to {}"
            }
          ]
        },
        {
          "node_id" : "c02j74dYSoOgA8KYCzvbWQ",
          "node_name" : "esdata-1",
          "transport_address" : "*******:9300",
          "node_attributes" : {
            "ml.enabled" : "true"
          },
          "node_decision" : "no",
          "weight_ranking" : 4,
          "deciders" : [
            {
              "decider" : "enable",
              "decision" : "NO",
              "explanation" : "no allocations are allowed due to {}"
            }
          ]
        }
      ]
    }
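
The `enable` decider answering NO on every node suggests that allocation is switched off by a setting, most likely `cluster.routing.allocation.enable` (the `{}` in the explanation looks like an unfilled placeholder for the setting name). A minimal sketch of checking and resetting that setting follows. It assumes the cluster is reachable at a placeholder `$ES_HOST` (the real host is masked above) and that re-enabling allocation is actually wanted; the function names are made up for illustration.

```shell
#!/bin/sh
# ES_HOST is a placeholder; the real host is masked in the post.
ES_HOST="${ES_HOST:-localhost}"

# Build the JSON body that sets cluster.routing.allocation.enable
# to the given value ("all" is the default, "none" disables allocation).
allocation_enable_body() {
  printf '{"transient":{"cluster.routing.allocation.enable":"%s"}}\n' "$1"
}

# Show the current persistent and transient cluster settings.
show_cluster_settings() {
  curl -s -XGET "$ES_HOST:9200/_cluster/settings?pretty"
}

# Re-enable shard allocation cluster-wide (transient setting).
enable_allocation() {
  curl -s -XPUT "$ES_HOST:9200/_cluster/settings?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(allocation_enable_body all)"
}
```

Running `show_cluster_settings` first shows whether the setting is actually `none`. An index-level `index.routing.allocation.enable` can have the same effect, so that may be worth checking too.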

Should I move/reroute the shards one by one? Or is it possible to do something globally, i.e. a cluster setting or an ES setting, to fix all unassigned shards at once?
Can I issue a rebalance? Would that help?
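
Whichever route is taken, it helps to know exactly which primaries are unassigned, since those are what make the cluster red. A small sketch that filters `_cat/shards` output for them, assuming the column order shown above (index, shard, prirep, state); the function name is made up for illustration:

```shell
#!/bin/sh
# Read `_cat/shards` output on stdin and print "index shard" for every
# unassigned primary. Assumes columns: index shard prirep state ...
unassigned_primaries() {
  awk '$3 == "p" && $4 == "UNASSIGNED" { print $1, $2 }'
}
```

It could be used as `curl -s "$ES_HOST:9200/_cat/shards" | unassigned_primaries`, where `$ES_HOST` is a placeholder for the masked host.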


#2

Here is the cluster health status:

    [root@esclient-1 ~]# curl -XGET '********:9200/_cluster/health?pretty'
    {
      "cluster_name" : "********",
      "status" : "red",
      "timed_out" : false,
      "number_of_nodes" : 8,
      "number_of_data_nodes" : 4,
      "active_primary_shards" : 33,
      "active_shards" : 33,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 105,
      "delayed_unassigned_shards" : 0,
      "number_of_pending_tasks" : 0,
      "number_of_in_flight_fetch" : 0,
      "task_max_waiting_in_queue_millis" : 0,
      "active_shards_percent_as_number" : 23.91304347826087
    }

I set "index.routing.allocation.exclude._name": null, but I don't see the shard being reallocated. Do I need to restart anything?
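
For reference, a sketch of how that index setting is typically cleared through the update index settings API. The `index.routing.allocation.*` filters are dynamic settings, so the change itself should not need a restart. `$ES_HOST` and the function names are placeholders, not taken from the original commands.

```shell
#!/bin/sh
# ES_HOST is a placeholder; the real host is masked in the post.
ES_HOST="${ES_HOST:-localhost}"

# JSON body that resets the exclude-by-node-name filter to its default.
clear_exclude_body() {
  printf '{"index.routing.allocation.exclude._name":null}\n'
}

# Apply the reset to one index (index name passed as $1).
clear_exclude_filter() {
  curl -s -XPUT "$ES_HOST:9200/$1/_settings?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(clear_exclude_body)"
}
```

For example, `clear_exclude_filter logs-2017.07.24` would target one of the indices listed in the shard output.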

I also used the reroute API.

The above two make the explain API move on to the next unassigned shard, but I don't see the index being allocated anywhere.
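
For primaries reported as `no_valid_shard_copy`, a plain reroute is unlikely to help, because no in-sync copy exists. The 5.x reroute API has an `allocate_stale_primary` command for this case; it promotes a stale copy and can lose data, which is why it requires `accept_data_loss`. A hedged sketch, with `$ES_HOST` and the function names as placeholders:

```shell
#!/bin/sh
# ES_HOST is a placeholder; the real host is masked in the post.
ES_HOST="${ES_HOST:-localhost}"

# Build a reroute body that promotes a stale copy to primary.
# $1 = index, $2 = shard number, $3 = node holding the stale copy.
stale_primary_body() {
  printf '{"commands":[{"allocate_stale_primary":{"index":"%s","shard":%s,"node":"%s","accept_data_loss":true}}]}\n' \
    "$1" "$2" "$3"
}

# Send the reroute command. Data written after the stale copy
# fell behind is lost, so this is a last resort.
reroute_stale_primary() {
  curl -s -XPOST "$ES_HOST:9200/_cluster/reroute?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(stale_primary_body "$1" "$2" "$3")"
}
```

Going by the first explain output, where esdata-3 holds a copy with `"in_sync" : false`, a call would look like `reroute_stale_primary .monitoring-logstash-2-2017.08.04 0 esdata-3`. The command has to be issued per shard, and it still depends on allocation being enabled.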

To minimize the work, I have deleted all the old ".monitoring" indices, as they are not important for us at this point and many of them were in red status because of the above.

The Elasticsearch version is 5.4.


(Mark Walkom) #3

What version are you on?

