Cluster health RED, UNASSIGNED shards from CLUSTER_RECOVERED

I have an Elasticsearch cluster with 3 nodes; the first one is the master, and all of them are data nodes. I took a snapshot of the first node, created a new VM in a different data center (backed by OpenStack), and copied the Elasticsearch data directory over. I can start Elasticsearch and access Kibana, but some shards are UNASSIGNED.
Here is some of the output from a few commands I tried.

curl -XGET http://search01:9200/_cat/shards | grep UNASSIGNED | wc -l
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 18478  100 18478    0     0  61492      0 --:--:-- --:--:-- --:--:-- 61593
58
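
For reference, here is a variant of the same command (against the same search01 host) that also shows why each shard is unassigned; the -s flag suppresses curl's progress meter shown above:

# list each unassigned shard together with its unassigned.reason
curl -s -XGET 'http://search01:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED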

ubuntu@search01:~$ curl -XGET http://localhost:9200/_cluster/allocation/explain?pretty
{
  "index" : "mmkg-doc-nst02-000003",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "CLUSTER_RECOVERED",
    "at" : "2018-04-25T01:48:14.631Z",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "cannot allocate because all found copies of the shard are either stale or corrupt",
  "node_allocation_decisions" : [
    {
      "node_id" : "-AMMeohmQKemcPtvHWwoLQ",
      "node_name" : "search01",
      "transport_address" : "search01:9300",
      "node_decision" : "no",
      "store" : {
        "found" : false
      }
    },
    {
      "node_id" : "qoFEyZQcTh2ppPpO23uB0w",
      "node_name" : "search03",
      "transport_address" : "search03:9300",
      "node_decision" : "no",
      "store" : {
        "in_sync" : false,
        "allocation_id" : "sHfaufsdRRKf0jYi3vNf_Q"
      }
    },
    {
      "node_id" : "vyeFBQUOSterpgmlCicWVg",
      "node_name" : "search02",
      "transport_address" : "search02:9300",
      "node_decision" : "no",
      "store" : {
        "in_sync" : false,
        "allocation_id" : "KX5PPj8yTHyJG-UE6_JaLA"
      }
    }
  ]
}

How can I recover the data?
Note: I replaced the IPs with hostnames (e.g. "transport_address" : "search02:9300") for security.

I take it you took a VM snapshot and not an Elasticsearch snapshot as described here: Snapshot module | Elasticsearch Guide [8.11] | Elastic

If so, then see the allocate_explanation reason above. If your first cluster is alive, then take a full cluster snapshot and restore that into the cluster in the other DC, roughly as sketched below.
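
A rough sketch of the snapshot side (the repository name dc_migration and the path /mnt/es_backups are placeholders; the location must be listed under path.repo in elasticsearch.yml on every node, and it has to be reachable from both clusters, e.g. a shared NFS mount or an S3 bucket via the repository plugin):

# register a shared filesystem repository on the old cluster
curl -XPUT 'http://search01:9200/_snapshot/dc_migration' -H 'Content-Type: application/json' -d '
{
  "type": "fs",
  "settings": { "location": "/mnt/es_backups" }
}'

# take a full-cluster snapshot and wait for it to complete
curl -XPUT 'http://search01:9200/_snapshot/dc_migration/snapshot_1?wait_for_completion=true'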

Hi @JKhondhu, I took a VM snapshot of one of the 3 nodes (the master node), and transferred the data manually from the Elasticsearch data directory. I just need to move this one node.

@bistaumanga
Yeah, VM snapshots of the VM itself are fine, but because Elasticsearch shards (Lucene segments) live on the file system as immutable objects, a VM snapshot will not capture the segments that are still being updated. That is why the allocation explanation reports the copies as either stale or corrupt.

If your first cluster is alive and kicking, then take a full cluster snapshot and restore that into the cluster in the other DC.
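
The restore side on the new DC cluster would then look roughly like this (newsearch01 is a placeholder for a node in the new cluster; the repository points at the same shared location and is registered read-only to be safe):

# register the same repository on the new cluster, read-only
curl -XPUT 'http://newsearch01:9200/_snapshot/dc_migration' -H 'Content-Type: application/json' -d '
{
  "type": "fs",
  "settings": { "location": "/mnt/es_backups", "readonly": true }
}'

# restore all indices plus the cluster state from the snapshot
curl -XPOST 'http://newsearch01:9200/_snapshot/dc_migration/snapshot_1/_restore' -H 'Content-Type: application/json' -d '
{
  "indices": "*",
  "include_global_state": true
}'

Note that a restore will refuse to overwrite indices that already exist and are open on the target cluster, so close or delete those first.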

@JKhondhu, Thanks. I'll try it.

