Recover after failure (lost shards)

Jerome_Gagnon · May 7, 2013, 8:33pm

Heya,

I just ran into a major node HDD failure and lost some shards of an index.
This is bad, but not that bad, since we are in the process of reindexing
everything. However, I would like to replace the lost shards with "dummy"
empty shards just to get the cluster back to a green state.
I've read somewhere (
http://elasticsearch-users.115913.n3.nabble.com/Recovering-after-shard-failure-td4018776.html)
that if you create a similar index on a local cluster and move the empty
shards onto some of the cluster nodes in place of the missing ones, they
will be recovered.

However, when I tried it, I got this error:

https://gist.github.com/jgagnon1/a4ebc27b17e8451af59e

[2013-05-07 13:54:23,133][WARN ][indices.cluster          ] [es8b] [index][229] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [index][229] shard allocated for local recovery (post api), should exists, but doesn't
        at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:122)
        at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:174)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
[2013-05-07 13:54:23,135][WARN ][cluster.action.shard     ] [es8b] sending failed shard for [index][229], node[QIbhkxalRcafyM3_w_5CiQ], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[index][229] shard allocated for local recovery (post api), should exists, but doesn't]]]

I was wondering, is this method still valid with 0.20.x versions? And if
not, is there another way to do it? It would be nice to have this... I've seen
some posts about it, but the only solution I have found is the one I
mentioned.

Many thanks,

Jerome

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Jerome_Gagnon · May 7, 2013, 8:44pm

Oh, forgot to mention: we don't have replicas on this index. (We are
currently adding them while reindexing.)

Don't do that, by the way (no replicas); it's a massive pain.

Jerome_Gagnon · May 8, 2013, 1:37pm

Thanks to Shay, I finally got it working. Basically, the solution was the
right one, but I made two mistakes:

1. Shard folder permissions: the shard folder must have the same
   owner/permissions as the other shard folders (the user that runs the
   ES process).
2. I created the "dummy index" on another cluster that was running
   Elasticsearch 0.90.x, which creates Lucene 4.x shards instead of 3.6.
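
A minimal sketch of the copy step with the permission fix from the first point, for anyone repeating this. It assumes that each shard lives in a numbered directory under the index directory on the node, and that a healthy sibling shard is available to copy ownership from. `install_empty_shard` and its parameters are hypothetical names for illustration, not part of Elasticsearch.

```python
import os
import shutil
import stat


def install_empty_shard(empty_shard_dir, index_dir, shard_id, reference_shard_id):
    """Copy an empty shard into index_dir and match a sibling shard's owner.

    Assumption: index_dir contains one directory per shard, named by shard
    number, and reference_shard_id names a healthy shard already in it.
    """
    reference = os.path.join(index_dir, str(reference_shard_id))
    target = os.path.join(index_dir, str(shard_id))
    ref_stat = os.stat(reference)
    dir_mode = stat.S_IMODE(ref_stat.st_mode)

    # copytree raises if the target already exists, so an existing
    # shard directory is never overwritten by accident.
    shutil.copytree(empty_shard_dir, target)

    # Give every copied directory and file the same owner and group as
    # the reference shard; directories also get its permission bits.
    for root, _dirs, files in os.walk(target):
        os.chown(root, ref_stat.st_uid, ref_stat.st_gid)
        os.chmod(root, dir_mode)
        for name in files:
            os.chown(os.path.join(root, name), ref_stat.st_uid, ref_stat.st_gid)
    return target
```

Changing ownership to a different user normally requires root. The sketch does nothing about the second point: the empty shard still has to come from an index created by the same Elasticsearch (and therefore Lucene) version as the target cluster.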

Shay told me that in 0.90.x the error messages when ES is not able to load
a Lucene shard are much clearer. Good to know!

But then again, we are running without replicas on one index due to several
constraints (IO, etc.), and this is not the way to go. We've been burned
by that many times (and still are), and we are moving forward with
a better architecture.

In the end, I got the cluster green, learned more, and finally went to
sleep. Everything's good.

Jerome
