Two data nodes: one node leaves, shards are marked stale and cluster status goes red
Hello.
I searched for this scenario and only found hints that it can happen in certain situations, but I don't know how to solve it.
We're using ES 6.8.5. We have one master-only node and two nodes with both roles (data and master). All of our indices have one replica. When one of the data nodes leaves the cluster (shut down), the cluster status goes red. As far as I can see, it's because the shards of the active write indices are marked as stale.
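In case it matters, the node roles in elasticsearch.yml look roughly like this (simplified sketch, discovery and network settings omitted):

# master-only node
node.master: true
node.data: false

# the two data+master nodes
node.master: true
node.data: true

# on all three nodes: quorum of the three master-eligible nodes
discovery.zen.minimum_master_nodes: 2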
GET _cluster/health?pretty
{
  "cluster_name" : "graylog",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 423,
  "active_shards" : 423,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 423,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 50.0
}
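To see which indices are actually affected, the cat API can filter by health (if I read the docs correctly):

GET _cat/indices?v&health=red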
ES master logs:
...
[delete30wg_605][1] marking unavailable shards as stale: [IFVdNVhCRgWhRqGugLQOaQ]
[delete30wg_605][0] marking unavailable shards as stale: [7RSklzg_Twqz2pzpq1yj_Q]
[delete30wg_605][3] marking unavailable shards as stale: [7SfPvq5ySKScNuZBCcbQPQ]
[delete30wg_605][2] marking unavailable shards as stale: [rAlUNv_fQtiOWMOZwbc6sw]
...
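If I understand it correctly, those IDs are shard allocation IDs, and the set of copies the master still considers in-sync can be inspected via the cluster state, e.g.:

GET _cluster/state/metadata/delete30wg_605?filter_path=metadata.indices.*.in_sync_allocations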
GET _cat/shards?pretty
delete30fw_605 3 p STARTED 6031863 1.9gb 10.0.137.13 Fqc3okF
delete30fw_605 3 r UNASSIGNED
delete30fw_605 1 p STARTED 6033632 2gb 10.0.137.13 Fqc3okF
delete30fw_605 1 r UNASSIGNED
delete30fw_605 2 p STARTED 6040161 1.9gb 10.0.137.13 Fqc3okF
delete30fw_605 2 r UNASSIGNED
delete30fw_605 0 p STARTED 6035036 1.9gb 10.0.137.13 Fqc3okF
delete30fw_605 0 r UNASSIGNED
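For the unassigned replica copies, I guess the allocation explain API can also be queried with an explicit request body (as far as I understand the docs):

GET _cluster/allocation/explain
{
  "index" : "delete30fw_605",
  "shard" : 0,
  "primary" : false
}

For one of the started primaries, explain returns the following: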
GET _cluster/allocation/explain?pretty
{
  "index" : "delete30fw_605",
  "shard" : 0,
  "primary" : true,
  "current_state" : "started",
  "current_node" : {
    "id" : "Fqc3okFAR066rkXY3lSn6Q",
    "name" : "Fqc3okF",
    "transport_address" : "10.0.137.13:9300",
    "attributes" : {
      "ml.machine_memory" : "67197956096",
      "ml.max_open_jobs" : "20",
      "xpack.installed" : "true",
      "ml.enabled" : "true"
    },
    "weight_ranking" : 1
  },
  "can_remain_on_current_node" : "yes",
  "can_rebalance_cluster" : "no",
  "can_rebalance_cluster_decisions" : [
    {
      "decider" : "rebalance_only_when_active",
      "decision" : "NO",
      "explanation" : "rebalancing is not allowed until all replicas in the cluster are active"
    },
    {
      "decider" : "cluster_rebalance",
      "decision" : "NO",
      "explanation" : "the cluster has unassigned shards and cluster setting [cluster.routing.allocation.allow_rebalance] is set to [indices_all_active]"
    }
  ],
  "can_rebalance_to_other_node" : "no",
  "rebalance_explanation" : "rebalancing is not allowed"
}
Should I set cluster.routing.allocation.allow_rebalance to "indices_primaries_active" or "always"? Does rebalancing even work with only two data nodes? I would have assumed that the cluster status only goes to yellow when one of the data nodes fails.
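If changing that setting is the right fix, I assume it would be applied like this (transient, so it does not survive a full cluster restart):

PUT _cluster/settings
{
  "transient" : {
    "cluster.routing.allocation.allow_rebalance" : "indices_primaries_active"
  }
}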
Thanks