Hello,
I have two ES clusters X & Y.
One of these clusters (Y) is "read only", and I periodically synchronize it by restoring a snapshot taken on the first cluster (X). The restore is done through a remote HTTP repository. Each index has one shard and one replica.
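For reference, here is a minimal sketch of how such a restore can be triggered on Y through the snapshot restore API. The names `repo_x` and `snap_1`, the base URL, and the helper `build_restore_request` are placeholders for illustration, not the real values from my setup. The function only builds the request and sends nothing.

```python
import json
import urllib.request


def build_restore_request(base_url, repository, snapshot, indices):
    """Build (but do not send) a snapshot restore request for cluster Y.

    All argument values are hypothetical placeholders chosen by the caller.
    """
    url = (
        f"{base_url}/_snapshot/{repository}/{snapshot}/_restore"
        "?wait_for_completion=false"
    )
    body = {
        # Comma-separated list of indices to restore from the snapshot.
        "indices": ",".join(indices),
        # Only restore index data, not the cluster-wide state.
        "include_global_state": False,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would start the restore. An index that already exists on Y has to be closed (or deleted) before it can be restored over.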
Sometimes, when a network failure occurs during the restore, cluster Y turns red because of unassigned shards. Closing and reopening the failed indices doesn't help, and the cluster stays unusable until the HTTP repository is back up and I start a new restore task.
The allocation explain output for one of the unassigned shards reports a corrupted index (see below).
Is this the expected behavior after such a failure? Is there a way to keep the read-only cluster serving the last successfully restored data until the connection to the remote repository is back?
Thanks!
Jérémie
---
explanation:
  index: "event-2020.05.15"
  shard: 0
  primary: true
  current_state: "unassigned"
  unassigned_info:
    reason: "INDEX_REOPENED"
    at: "2020-05-15T07:27:49.266Z"
    last_allocation_status: "no_valid_shard_copy"
  can_allocate: "no_valid_shard_copy"
  allocate_explanation: "cannot allocate because all found copies of the shard are\
    \ either stale or corrupt"
  node_allocation_decisions:
  - node_id: "5fIIsDsEQaa0C--jf0smzA"
    node_name: "stretch64-vm9"
    transport_address: "172.16.20.19:9300"
    node_decision: "no"
    store:
      in_sync: true
      allocation_id: "qyu9zRslRgiRY1SZuf9HpA"
      store_exception:
        type: "corrupt_index_exception"
        reason: "Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(SimpleFSIndexInput(path=\"\
          /var/elasticsearch/idx/nodes/0/indices/2wMoXs6-SyatKdgma6f1BA/0/index/segments_4\"\
          )))"
        caused_by:
          type: "no_such_file_exception"
          reason: "/var/elasticsearch/idx/nodes/0/indices/2wMoXs6-SyatKdgma6f1BA/0/index/_4.si"
  - node_id: "jL1We097QOWuAg6bhl5tvw"
    node_name: "stretch64-vm10"
    transport_address: "172.16.20.20:9300"
    node_decision: "no"
    store:
      in_sync: false
      allocation_id: "zqsc5Lo0Q0iT66SpAxEVEw"
  - node_id: "rt1Ga1ZeQJ2SuLqAvTbWpw"
    node_name: "stretch64-vm8"
    transport_address: "172.16.20.18:9300"
    node_decision: "no"
    store:
      found: false
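
An explanation like the one above can be requested from the cluster allocation explain API. Below is a minimal sketch, assuming cluster Y is reachable at a placeholder base URL; the helper name `build_explain_request` is made up for illustration. It only builds the request and sends nothing.

```python
import json
import urllib.request


def build_explain_request(base_url, index, shard=0, primary=True):
    """Build (but do not send) an allocation explain request for one shard.

    base_url is a hypothetical placeholder for the address of cluster Y.
    """
    # format=yaml asks Elasticsearch to render the response as YAML.
    url = f"{base_url}/_cluster/allocation/explain?format=yaml"
    body = {"index": index, "shard": shard, "primary": primary}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For the shard shown above, the call would be `build_explain_request("http://localhost:9200", "event-2020.05.15")`, with the result passed to `urllib.request.urlopen`.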