Behavior of a failed snapshot restoration

J3rem1e · May 15, 2020, 9:10am

Hello,

I have two ES clusters X & Y.

One of these clusters (Y) is read-only, and I periodically synchronize it by restoring a snapshot taken on the first cluster (X). The restoration is done through a remote HTTP repository. Each index has one shard and one replica.

Sometimes, when a network failure occurs during the restoration, cluster Y turns red because of unassigned shards. Closing and reopening the failed indices doesn't help, and the cluster stays unusable until the HTTP repository is back up and I start a new restoration task.

The allocation explain output for an unassigned shard shows a corrupted index (see below).
Is this the expected behavior for such a failure? Is there a way to keep the read-only cluster working with the last-seen data until the connection to the remote repository is back up?

Thanks!
Jérémie

    ---
    explanation:
      index: "event-2020.05.15"
      shard: 0
      primary: true
      current_state: "unassigned"
      unassigned_info:
        reason: "INDEX_REOPENED"
        at: "2020-05-15T07:27:49.266Z"
        last_allocation_status: "no_valid_shard_copy"
      can_allocate: "no_valid_shard_copy"
      allocate_explanation: "cannot allocate because all found copies of the shard are\
        \ either stale or corrupt"
      node_allocation_decisions:
      - node_id: "5fIIsDsEQaa0C--jf0smzA"
        node_name: "stretch64-vm9"
        transport_address: "172.16.20.19:9300"
        node_decision: "no"
        store:
          in_sync: true
          allocation_id: "qyu9zRslRgiRY1SZuf9HpA"
          store_exception:
            type: "corrupt_index_exception"
            reason: "Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(SimpleFSIndexInput(path=\"\
              /var/elasticsearch/idx/nodes/0/indices/2wMoXs6-SyatKdgma6f1BA/0/index/segments_4\"\
              )))"
            caused_by:
              type: "no_such_file_exception"
              reason: "/var/elasticsearch/idx/nodes/0/indices/2wMoXs6-SyatKdgma6f1BA/0/index/_4.si"
      - node_id: "jL1We097QOWuAg6bhl5tvw"
        node_name: "stretch64-vm10"
        transport_address: "172.16.20.20:9300"
        node_decision: "no"
        store:
          in_sync: false
          allocation_id: "zqsc5Lo0Q0iT66SpAxEVEw"
      - node_id: "rt1Ga1ZeQJ2SuLqAvTbWpw"
        node_name: "stretch64-vm8"
        transport_address: "172.16.20.18:9300"
        node_decision: "no"
        store:
          found: false
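
For anyone who wants to reproduce the check, here is a minimal Python sketch of how output like the above could be requested and condensed. It assumes the cluster allocation explain endpoint (`_cluster/allocation/explain`) is reachable over plain HTTP without authentication. The helper names `build_explain_request` and `summarize_store_problems` are hypothetical, and the optional `explanation` wrapper is handled only because it appears in the dump above.

```python
import json
import urllib.request


def build_explain_request(base_url, index, shard=0, primary=True):
    """Build (but do not send) an allocation explain request for one shard."""
    body = json.dumps({"index": index, "shard": shard, "primary": primary})
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/allocation/explain",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_store_problems(response):
    """Return (node_name, status) pairs describing each node's shard copy."""
    # The dump above nests everything under "explanation"; accept both shapes.
    explain = response.get("explanation", response)
    summary = []
    for decision in explain.get("node_allocation_decisions", []):
        store = decision.get("store", {})
        exc = store.get("store_exception")
        if exc is not None:
            status = "corrupt or unreadable: " + exc.get("type", "unknown")
        elif store.get("found") is False:
            status = "no copy on disk"
        elif store.get("in_sync") is False:
            status = "stale copy"
        else:
            status = "usable"
        summary.append((decision.get("node_name"), status))
    return summary
```

To run it against a live cluster, the request could be sent with `urllib.request.urlopen(build_explain_request("http://localhost:9200", "event-2020.05.15"))` and the parsed JSON body passed to `summarize_store_problems`; the host and port here are placeholders. For the dump above, the summary would mark stretch64-vm9 as corrupt, stretch64-vm10 as stale, and stretch64-vm8 as having no copy, which matches the "no_valid_shard_copy" result.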

system · June 12, 2020, 9:10am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
