Cluster health RED, UNASSIGNED shards from CLUSTER_RECOVERED

I have an Elasticsearch cluster with 3 nodes; the first one is the master, and all of them are data nodes. I took a snapshot of the first node, created a new VM in a different data center (backed by OpenStack), and copied the Elasticsearch data directory over. I can start Elasticsearch and access Kibana, but some shards are UNASSIGNED.
Here is some of the output from a few commands I tried.

curl -XGET http://search01:9200/_cat/shards | grep UNASSIGNED | wc -l
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 18478  100 18478    0     0  61492      0 --:--:-- --:--:-- --:--:-- 61593
58
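
For reference, here is a variant of the same command (against the same search01 host) that also shows why each shard is unassigned; the -s flag suppresses curl's progress meter shown above:

# list each unassigned shard together with its unassigned.reason
curl -s -XGET 'http://search01:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED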

ubuntu@search01:~$ curl -XGET http://localhost:9200/_cluster/allocation/explain?pretty
{
  "index" : "mmkg-doc-nst02-000003",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "CLUSTER_RECOVERED",
    "at" : "2018-04-25T01:48:14.631Z",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "cannot allocate because all found copies of the shard are either stale or corrupt",
  "node_allocation_decisions" : [
    {
      "node_id" : "-AMMeohmQKemcPtvHWwoLQ",
      "node_name" : "search01",
      "transport_address" : "search01:9300",
      "node_decision" : "no",
      "store" : {
        "found" : false
      }
    },
    {
      "node_id" : "qoFEyZQcTh2ppPpO23uB0w",
      "node_name" : "search03",
      "transport_address" : "search03:9300",
      "node_decision" : "no",
      "store" : {
        "in_sync" : false,
        "allocation_id" : "sHfaufsdRRKf0jYi3vNf_Q"
      }
    },
    {
      "node_id" : "vyeFBQUOSterpgmlCicWVg",
      "node_name" : "search02",
      "transport_address" : "search02:9300",
      "node_decision" : "no",
      "store" : {
        "in_sync" : false,
        "allocation_id" : "KX5PPj8yTHyJG-UE6_JaLA"
      }
    }
  ]
}

How can I recover the data?
Note: I replaced the IPs with hostnames (e.g. "transport_address" : "search02:9300") for security.

I take it you took a VM snapshot and not an Elasticsearch snapshot as described here: Snapshot module | Elasticsearch Guide [8.11] | Elastic

If so, then see the allocate_explanation reason above. If your first cluster is alive, then take a full cluster snapshot and restore that into the cluster in the other DC, roughly as sketched below.
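
A rough sketch of the snapshot side (the repository name dc_migration and the path /mnt/es_backups are placeholders; the location must be listed under path.repo in elasticsearch.yml on every node, and it has to be reachable from both clusters, e.g. a shared NFS mount or an S3 bucket via the repository plugin):

# register a shared filesystem repository on the old cluster
curl -XPUT 'http://search01:9200/_snapshot/dc_migration' -H 'Content-Type: application/json' -d '
{
  "type": "fs",
  "settings": { "location": "/mnt/es_backups" }
}'

# take a full-cluster snapshot and wait for it to complete
curl -XPUT 'http://search01:9200/_snapshot/dc_migration/snapshot_1?wait_for_completion=true'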

Hi @JKhondhu, I took a VM snapshot of one of the 3 nodes (the master node), and transferred the data manually from the Elasticsearch data directory. I just need to move this one node.

@bistaumanga
Yeah, VM snapshots of the VM itself are fine, but because Elasticsearch shards (Lucene segments) live on the file system as immutable objects, a VM snapshot will not capture the segments that are still being updated. That is why the allocation explanation reports the copies as either stale or corrupt.

If your first cluster is alive and kicking, then take a full cluster snapshot and restore that into the cluster in the other DC.
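
The restore side on the new DC cluster would then look roughly like this (newsearch01 is a placeholder for a node in the new cluster; the repository points at the same shared location and is registered read-only to be safe):

# register the same repository on the new cluster, read-only
curl -XPUT 'http://newsearch01:9200/_snapshot/dc_migration' -H 'Content-Type: application/json' -d '
{
  "type": "fs",
  "settings": { "location": "/mnt/es_backups", "readonly": true }
}'

# restore all indices plus the cluster state from the snapshot
curl -XPOST 'http://newsearch01:9200/_snapshot/dc_migration/snapshot_1/_restore' -H 'Content-Type: application/json' -d '
{
  "indices": "*",
  "include_global_state": true
}'

Note that a restore will refuse to overwrite indices that already exist and are open on the target cluster, so close or delete those first.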

@JKhondhu, Thanks. I'll try it.

