Shards stuck in recovery for long periods

robert_3 · September 3, 2013, 11:45pm

Hi all,

I'm seeing some recent problems on master: shortly after a cluster reboot
and full recovery (all shards green, all nodes connected), a shard or two
will occasionally be marked as recovering (rebalancing is disabled, so it's
not because the shard is being moved around). After a few minutes they
sometimes report as initializing (an order of magnitude longer than the
transfer should take, given all the throttling settings and the network),
but in the end they never come back up and stay in that state until I
manually reroute the shard or restart the server.

We were seeing similar problems on a larger scale on 0.90.3 (multiple
shards never recovering, all of them replicas of the same primary). I
thought I saw something about that being fixed on GitHub, but can't recall
the exact issue now. Any chance these are related? Is there something I can
do to debug what is going on while the shards say initializing/recovering?
Nothing is currently coming up in my logs.
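
One way to watch this from the outside, as a minimal sketch only: poll the
cluster health API with level=shards and list the shards that still report
initializing or unassigned copies. The address localhost:9200 and the helper
names stuck_shards and fetch_health are assumptions for illustration, and the
response field names are as I understand the health API; they may differ
between versions.

```python
import json
from urllib.request import urlopen

# Assumption: a node is reachable on localhost:9200; adjust for your cluster.
HEALTH_URL = "http://localhost:9200/_cluster/health?level=shards"


def stuck_shards(health):
    """Return (index, shard, initializing, unassigned) for shards not fully started.

    `health` is the parsed JSON body of a shard-level cluster health response.
    """
    found = []
    for index_name, index in sorted(health.get("indices", {}).items()):
        for shard_id, shard in sorted(index.get("shards", {}).items()):
            initializing = shard.get("initializing_shards", 0)
            unassigned = shard.get("unassigned_shards", 0)
            if initializing or unassigned:
                found.append((index_name, shard_id, initializing, unassigned))
    return found


def fetch_health(url=HEALTH_URL):
    """Fetch and parse the health response (hypothetical helper, needs a live node)."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Calling stuck_shards(fetch_health()) every few seconds and logging the result
with a timestamp would at least show when a shard leaves the started state and
how long it stays there.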

Thanks,

Robert Deaton

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

spinscale · September 12, 2013, 10:12am

Hey,

anything in the logs, cluster state, hot_threads output, which might add
some information to this?
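
For reference, a rough sketch of how those two snapshots could be captured
while a shard is stuck. The base address and the helper names are assumptions,
and the endpoint paths are the ones I believe the REST API exposes for cluster
state and hot threads; check them against your version.

```python
from urllib.request import urlopen

# Assumption: a node is reachable on localhost:9200.
BASE = "http://localhost:9200"

# Endpoint paths as I understand them; verify against your version's docs.
DIAGNOSTICS = {
    "cluster_state": "/_cluster/state",
    "hot_threads": "/_nodes/hot_threads",
}


def diagnostic_url(name, base=BASE):
    """Build the URL for one named diagnostic endpoint."""
    return base.rstrip("/") + DIAGNOSTICS[name]


def snapshot(name, base=BASE):
    """Fetch one diagnostic as text (hypothetical helper, needs a live node).

    Cluster state comes back as JSON, hot threads as plain text, so the body
    is returned undecoded beyond UTF-8 and left for the caller to interpret.
    """
    with urlopen(diagnostic_url(name, base), timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Saving snapshot("cluster_state") and snapshot("hot_threads") to files at the
moment a shard flips to recovering gives something concrete to attach to a
report.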

I am worried about the 'on occasion' part. If everything is healthy and you
do not change the number of nodes in your cluster, there should be no need
to mark a shard for recovery, so something must have happened at that point.
Is there anything in the logs from that moment?

--Alex

