RepositoryMissingException on select shards during snapshot recovery from S3


(580farm) #1

Hello,

I made a successful backup that reported no errors in elasticsearch:

"state":"SUCCESS","start_time":"2014-12-06T00:12:39.362Z","start_time_in_millis":1417824759362,"end_time":"2014-12-06T00:33:34.352Z","end_time_in_millis":1417826014352,"duration_in_millis":1254990,"failures":[],"shards":{"total":345,"failed":0,"successful":345}}]}

I started the recovery process (on an empty Elasticsearch instance) and it worked normally until there were 5 shards left. Each of those shards then logged the following exception:

[2014-12-08 00:00:01,689][WARN ][cluster.action.shard ] [Sunder] [logstash-2014.10.02][2] received shard failed for [logstash-2014.10.02][2], node[QO2smP95QCehd9RBanjqyw], [P], restoring[elasticsearch:snapshot_1], s[INITIALIZING], indexUUID [fN23tywuQXmwhg2mAMCB0A], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[logstash-2014.10.02][2] failed recovery]; nested: IndexShardRestoreFailedException[[logstash-2014.10.02][2] restore failed]; nested: RepositoryMissingException[[elasticsearch] missing]; ]]
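
For triage, the index, shard number, and innermost exception can be pulled out of a log line like the one above. This is only a minimal Python sketch, assuming the log format shown here; the function name `parse_shard_failure` is made up for illustration and is not part of Elasticsearch:

```python
import re

# Matches "[index][shard] received shard failed" as it appears in the log line above.
SHARD_RE = re.compile(r"\[([^\[\]]+)\]\[(\d+)\] received shard failed")
# Matches each "SomethingException[" name in the nested exception chain.
EXC_RE = re.compile(r"(\w+Exception)\[")


def parse_shard_failure(line):
    """Return (index, shard, innermost_exception), or None if the line does not match."""
    m = SHARD_RE.search(line)
    if m is None:
        return None
    exceptions = EXC_RE.findall(line)
    # The last exception in the chain is the innermost (root) cause.
    root = exceptions[-1] if exceptions else None
    return m.group(1), int(m.group(2)), root
```

Run over the whole log, this would show whether all five failed shards end in the same RepositoryMissingException or whether some fail for a different reason.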

I can see the data in the S3 bucket when I browse to that index, but the shard listing shows no data for the affected shards:

logstash-2014.10.02 3 p INITIALIZING 127.0.0.1 The Russian
logstash-2014.10.02 3 r UNASSIGNED
logstash-2014.10.02 2 p INITIALIZING 127.0.0.1 The Russian
logstash-2014.10.02 2 r UNASSIGNED
logstash-2014.10.01 2 p INITIALIZING 127.0.0.1 The Russian
logstash-2014.10.01 2 r UNASSIGNED
kibana-int 0 p INITIALIZING 127.0.0.1 The Russian
kibana-int 0 r UNASSIGNED
kibana-int 4 p INITIALIZING 127.0.0.1 Ghost Rider
kibana-int 4 r UNASSIGNED
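
A listing like the one above can be filtered down to the shards that never reached STARTED. This is a rough Python sketch, assuming the first four whitespace-separated columns are index, shard, primary/replica flag, and state, as in the listing here; `unstarted_shards` is a hypothetical helper name:

```python
def unstarted_shards(cat_output):
    """Return (index, shard, prirep, state) for every shard not in STARTED state."""
    stuck = []
    for line in cat_output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        index, shard, prirep, state = fields[:4]
        if not shard.isdigit():
            continue  # skip a header row, if one is present
        if state != "STARTED":
            stuck.append((index, int(shard), prirep, state))
    return stuck
```

Keeping only the entries where the flag is `p` gives the primaries whose restore is stuck, which are the ones that matter for recovery.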

What are my best next steps to recover the cluster at this point? I don't mind possible data loss from any kind of forced recovery of the shards, but I'm confused as to why they'd be missing if they were successfully stored in S3.


(Matt) #2

@580farm - did you get anywhere with this? I've stumbled into this exact issue too (ES 1.6.2 though). The repo is created, and I've recovered all 13 indexes with the exception of one shard in one index that's reporting RepositoryMissingException.

