How to recover after a bad rolling upgrade

I was following the rolling upgrade documentation, but the first node I tried to upgrade from 7.3 to 7.6 didn't upgrade: it still reports that it is on 7.3, and my cluster health now shows red.

Now I am seeing:

.kibana_task_manager 0 p UNASSIGNED CLUSTER_RECOVERED
.kibana_task_manager 0 r UNASSIGNED CLUSTER_RECOVERED
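For context, output in that shape comes from the standard cat shards API; a request like the following (the column selection here is just one reasonable choice) lists every shard along with why it is unassigned:

GET _cat/shards?v&h=index,shard,prirep,state,unassigned.reason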

Running:

GET /_cluster/allocation/explain

Returns:

{
  "index" : ".kibana_task_manager",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "CLUSTER_RECOVERED",
    "at" : "2020-03-20T06:29:05.386Z",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "cannot allocate because all found copies of the shard are either stale or corrupt",
  "node_allocation_decisions" : [
    {
      "node_id" : "GLPJmH7-SBKATglJ4iu1IA",
      "node_name" : "node_01",
      "transport_address" : "XXX.XXX.XXX.XXX:9300",
      "node_attributes" : {
        "ml.machine_memory" : "8363737088",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true"
      },
      "node_decision" : "no",
      "store" : {
        "in_sync" : false,
        "allocation_id" : "CM-Un5daQOWv_CzekU0bRA"
      }
    },
    {
      "node_id" : "ZAKTP504TnujdrIM___Ljg",
      "node_name" : "node_02",
      "transport_address" : "XXX.XXX.XXX.XXX:9300",
      "node_attributes" : {
        "ml.machine_memory" : "8363724800",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true"
      },
      "node_decision" : "no",
      "store" : {
        "found" : false
      }
    }
  ]
}

I have read some solutions, but am a little scared to start trying things. Any help or advice would be greatly appreciated.

Have you completed the upgrade? One of your nodes has a stale copy of this shard and the other has no copy at all, which indicates that there's another node out there that has the good copy of this shard.
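One quick way to check that is to list the nodes currently in the cluster and the version each is running; if a node is missing from this output, it has not (re)joined. A sketch using the standard cat nodes API (column choice is illustrative):

GET _cat/nodes?v&h=name,version,master,node.role

If the node holding the good shard copy shows up here after it rejoins, the primary should be assigned from it automatically.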

Wait, CLUSTER_RECOVERED means that this cluster has experienced a full cluster restart, i.e. you restarted the master nodes. That doesn't happen in a rolling upgrade.

I stopped after seeing the RED status and started googling to find out why my node never recovered after restarting it. I did turn the cluster off last night and turned it back on this morning. I am just running it on a couple of VMs.

OK, you are in the situation described in the IMPORTANT note at the bottom of the rolling upgrade instructions:

If you stop half or more of the master-eligible nodes all at once during the upgrade then the cluster will become unavailable, meaning that the upgrade is no longer a rolling upgrade. If this happens, you should upgrade and restart all of the stopped master-eligible nodes to allow the cluster to form again, as if performing a full-cluster restart upgrade. It may also be necessary to upgrade all of the remaining old nodes before they can join the cluster after it re-forms.
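In practice that means the remaining steps look like the full-cluster restart procedure rather than a rolling one. A rough sketch for each stopped or old node, assuming a systemd-based install (service name, package manager, and exact 7.6.x version are assumptions that may differ on your VMs):

sudo systemctl stop elasticsearch
# upgrade the Elasticsearch package to the target 7.6.x version
# using your package manager (apt, yum, etc.)
sudo systemctl start elasticsearch

Once enough upgraded master-eligible nodes are back up, the cluster can form again, and the node that still holds the in-sync copy of .kibana_task_manager shard 0 should allow the primary to be allocated.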

