Two data nodes: One node left, get stale shards and cluster status goes red

Hello.

I searched for this scenario and only found hints that it can happen in certain situations, but I don't know how to solve it.

We're using ES 6.8.5. We have one master-only node and two nodes with both roles (data and master). All our indices have one replica. When one of the data nodes leaves the cluster (is shut down), the cluster status goes red. As far as I can see, it's because the shards of the active write indices are marked as stale.

GET _cluster/health?pretty

{
  "cluster_name" : "graylog",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 423,
  "active_shards" : 423,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 423,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 50.0
}

ES master Logs:

...
[delete30wg_605][1] marking unavailable shards as stale: [IFVdNVhCRgWhRqGugLQOaQ]
[delete30wg_605][0] marking unavailable shards as stale: [7RSklzg_Twqz2pzpq1yj_Q]
[delete30wg_605][3] marking unavailable shards as stale: [7SfPvq5ySKScNuZBCcbQPQ]
[delete30wg_605][2] marking unavailable shards as stale: [rAlUNv_fQtiOWMOZwbc6sw]
...

GET _cat/shards?pretty

delete30fw_605     3 p STARTED     6031863    1.9gb 10.0.137.13 Fqc3okF
delete30fw_605     3 r UNASSIGNED                               
delete30fw_605     1 p STARTED     6033632      2gb 10.0.137.13 Fqc3okF
delete30fw_605     1 r UNASSIGNED                               
delete30fw_605     2 p STARTED     6040161    1.9gb 10.0.137.13 Fqc3okF
delete30fw_605     2 r UNASSIGNED                               
delete30fw_605     0 p STARTED     6035036    1.9gb 10.0.137.13 Fqc3okF
delete30fw_605     0 r UNASSIGNED                               

GET _cluster/allocation/explain?pretty

{
  "index" : "delete30fw_605",
  "shard" : 0,
  "primary" : true,
  "current_state" : "started",
  "current_node" : {
    "id" : "Fqc3okFAR066rkXY3lSn6Q",
    "name" : "Fqc3okF",
    "transport_address" : "10.0.137.13:9300",
    "attributes" : {
      "ml.machine_memory" : "67197956096",
      "ml.max_open_jobs" : "20",
      "xpack.installed" : "true",
      "ml.enabled" : "true"
    },
    "weight_ranking" : 1
  },
  "can_remain_on_current_node" : "yes",
  "can_rebalance_cluster" : "no",
  "can_rebalance_cluster_decisions" : [
    {
      "decider" : "rebalance_only_when_active",
      "decision" : "NO",
      "explanation" : "rebalancing is not allowed until all replicas in the cluster are active"
    },
    {
      "decider" : "cluster_rebalance",
      "decision" : "NO",
      "explanation" : "the cluster has unassigned shards and cluster setting [cluster.routing.allocation.allow_rebalance] is set to [indices_all_active]"
    }
  ],
  "can_rebalance_to_other_node" : "no",
  "rebalance_explanation" : "rebalancing is not allowed"
}

Should I set cluster.routing.allocation.allow_rebalance to "indices_primaries_active" or "always"? Does rebalancing work with only two data nodes? I would expect the cluster status to go yellow when one of the data nodes fails.
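
If changing that setting is the way to go, I assume it would be a transient cluster settings update along these lines (just a sketch, I'm not sure it's the right fix):

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.allow_rebalance": "indices_primaries_active"
  }
}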

Thanks

The cluster health is red, so there is at least one unassigned primary shard. You need to focus your attention on that. The excerpts shared above show only assigned primaries.
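
For example (just one way to find it), _cat/shards with the unassigned.reason column lists every unassigned shard, and _cluster/allocation/explain without a request body should explain the first unassigned shard it finds:

GET _cat/shards?v&h=index,shard,prirep,state,unassigned.reason&s=state

GET _cluster/allocation/explain?pretty

Look for rows with "p UNASSIGNED" in the first output.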

Thanks for the hint. You're right: there was a built-in index in Graylog that came without a replica after the last update. Now everything works.
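
For anyone else running into this: I assume a replica could also be added to such an index by hand with an index settings update like the one below (the index name is a placeholder; use the index reported as red):

PUT <index-name>/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}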
