I recently upgraded from 0.19.12 to 0.20.5. A node went down over the
weekend in my 30-node cluster. I use ES as a DB, so there were constant
writes while the node was down. I restarted the node and the cluster
went RED.
[2013-03-10 15:28:22,209][WARN ][indices.cluster ] [moloches-m13b] [stats][0] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [stats][0] shard allocated for local recovery (post api), should exists, but doesn't
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:122)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:177)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
What is the correct way to recover?

I have full replication on for this index, and it is a tiny index (8
documents, for a total store size of 10k). My desire is that ES should
just take care of this: I have 29 other copies, so why do I need to do
anything? In a dream world ES would delete the broken index and copy it
over from somewhere else. What am I missing?
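For anyone following along, a minimal sketch of the diagnostics involved, assuming the node's HTTP port is the default 9200 on localhost (the index name "stats" is taken from the log above):

    # overall cluster status, plus per-shard detail to see which copies are unassigned
    curl -s 'localhost:9200/_cluster/health?level=shards&pretty=true'

    # shard-level status of the affected index
    curl -s 'localhost:9200/stats/_status?pretty=true'

    # routing table and index metadata as the master sees them
    curl -s 'localhost:9200/_cluster/state?pretty=true'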
On Mon, 2013-03-11 at 06:07 -0700, Andy Wick wrote:
> I recently upgraded from 0.19.12 to 0.20.5. A node went down over the
> weekend in my 30-node cluster. I use ES as a DB, so there were constant
> writes while the node was down. I restarted the node and the cluster
> went RED.
>
> [2013-03-10 15:28:22,209][WARN ][indices.cluster ] [moloches-m13b] [stats][0] failed to start shard
> org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [stats][0] shard allocated for local recovery (post api), should exists, but doesn't
Hmm, I wonder if there is a problem with the data store on that node.
Perhaps just delete the data store for that index on that node, and
restart.
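Something along these lines, as a rough sketch only. The path below is the default on-disk layout (<path.data>/<cluster_name>/nodes/0/indices/<index>/), so adjust the data path and cluster name for your install, and only do it with the node stopped:

    # on the affected node, with the elasticsearch process stopped,
    # remove just the broken index's data directory
    rm -rf /path/to/data/<cluster_name>/nodes/0/indices/stats

    # start the node again, then watch the index recover from a replica elsewhere
    curl -s 'localhost:9200/_cluster/health/stats?wait_for_status=green&timeout=1m&pretty=true'

With replicas on the other 29 nodes, the shard should be pulled over again once the node rejoins.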
I stopped the node, deleted the directory, and restarted it, and then
immediately two other nodes started having the same issue with the same
index. Eventually I just gave up and deleted the index (I should say I
deleted all the directories, because -XDELETE on the index would just
hang, I guess because the cluster was RED). Should this work, or am I
missing the point of replication? To me it seems like if one node is
bad, ES should just clean up and copy over a good version automatically.
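For the record, the API-level version of the clean-up looks roughly like this; it was the DELETE below that hung for me while the cluster was RED. The settings on the recreate are illustrative only (the shard count is a guess; 29 replicas is what "full replication" means on a 30-node cluster):

    # delete the broken index -- this just hung for me
    curl -XDELETE 'localhost:9200/stats'

    # recreate it afterwards
    curl -XPUT 'localhost:9200/stats' -d '{
      "settings": {
        "index.number_of_shards": 1,
        "index.number_of_replicas": 29
      }
    }'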
On Mon, 2013-03-11 at 08:30 -0700, Andy Wick wrote:
> I stopped the node, deleted the directory, and restarted it, and then
> immediately two other nodes started having the same issue with the same
> index. Eventually I just gave up and deleted the index (I should say I
> deleted all the directories, because -XDELETE on the index would just
> hang, I guess because the cluster was RED). Should this work, or am I
> missing the point of replication? To me it seems like if one node is
> bad, ES should just clean up and copy over a good version automatically.
Yes, it should work.

Please can you open an issue with a full description, plus the full
logs?
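Roughly what would be most useful to attach, assuming the default log location (logs/<cluster_name>.log under the Elasticsearch home directory; adjust for your install):

    # pull the recovery failures out of each affected node's log
    grep -B 2 -A 20 'failed to start shard' logs/*.log > failed_shard_recovery.log

    # plus the output of the cluster health / cluster state calls shown earlier in the thread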