All commands returning - failed to process cluster event (503)


(Ashish Nigam-3) #1

Hi,
Our production cluster has been in yellow state for more than a day now. I can
see that the number of unassigned shards is decreasing, but at a very slow pace
(600 shards in the last 5 hours).
So I decided to remove very old indexes and also allocate the unassigned
shards using the reroute command. Unfortunately, all commands are returning
this error:

{
  "error" : "RemoteTransportException[[search00][inet[/10.0.1.10:9300]][cluster/reroute]]; nested: ProcessClusterEventTimeoutException[failed to process cluster event (cluster_reroute (api)) within 30s]; ",
  "status" : 503
}

Why am I getting this error consistently? And any idea how to speed up
recovery and get commands to execute again?

Another data point: the cluster went into yellow state after one node
dropped out of the cluster and then rejoined after some time.
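
For reference, the kind of reroute request described above can be sketched roughly as follows. This is only an illustration, not the exact command that was run: the HTTP port 9200, the helper name build_reroute_request, the 2m value, and the choice of search02 as the target node are all assumptions. As far as I know the reroute API accepts a master_timeout parameter, and the "within 30s" in the error looks like its default, but raising it only gives the master longer to get to the request; it does not clear whatever is queued ahead of it.

```python
import json
from urllib.parse import urlencode


def build_reroute_request(host, index, shard, node, master_timeout="2m"):
    """Build the URL and JSON body for a cluster reroute 'allocate' command.

    Hypothetical helper for illustration; it only builds the request and
    does not send anything to the cluster.
    """
    # master_timeout: how long to wait for the master to process the
    # event (assumed to be the setting behind "within 30s" in the error).
    query = urlencode({"master_timeout": master_timeout})
    url = "http://%s/_cluster/reroute?%s" % (host, query)
    body = json.dumps(
        {
            "commands": [
                {
                    "allocate": {
                        "index": index,
                        "shard": shard,
                        "node": node,
                        # Do not force-allocate an empty primary.
                        "allow_primary": False,
                    }
                }
            ]
        },
        sort_keys=True,
    )
    return url, body


# Host/port and target node are assumptions; index and shard come from
# the log line posted in this thread.
url, body = build_reroute_request(
    "10.0.1.10:9200", "an_shnindex_9871_2014_1", 3, "search02"
)
print(url)
print(body)
```

The body would then be sent as a POST to that URL with whatever HTTP client is at hand.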

Thanks
Ashish

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CANsfGRmbw%3DAS3txAZA2O5PrK-oC52LysswDuyA2%3DgubydE74rA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


(Ashish Nigam-3) #2

Here's a sample error message that I see for unassigned shards:

[2014-06-23 04:59:08,087][WARN ][cluster.action.shard ] [search02]
sending failed shard for [an_shnindex_9871_2014_1][3],
node[bc03gt7WRx-XYToz7nHVqg], [P], s[STARTED], reason [master
[search00][PDXvmuL_TpG7bEIfIUvBDg][inet[/10.0.1.10:9300]]{master=true,
zone=zone_spop-sjc} marked shard as started, but shard has not been
created, mark shard as failed]
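
To see how many shards are hitting this, warnings like the one above could be tallied with a small script. A minimal sketch, assuming the log entries keep the same "sending failed shard for [index][shard]" shape; the pattern and the function name parse_failed_shard are my own and not part of Elasticsearch.

```python
import re

# Matches "[node] sending failed shard for [index][shard]" once the
# wrapped lines of a log entry have been joined into one line.
SHARD_FAILURE = re.compile(
    r"\[(?P<node>[^\]]+)\]\s+sending failed shard for "
    r"\[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\]"
)


def parse_failed_shard(entry):
    """Return (reporting_node, index, shard) or None if no match."""
    # Collapse line breaks and repeated spaces from mail wrapping.
    text = " ".join(entry.split())
    match = SHARD_FAILURE.search(text)
    if match is None:
        return None
    return match.group("node"), match.group("index"), int(match.group("shard"))


# Start of the log entry posted above, wrapped as it appeared in the mail.
sample = """[2014-06-23 04:59:08,087][WARN ][cluster.action.shard ] [search02]
sending failed shard for [an_shnindex_9871_2014_1][3],
node[bc03gt7WRx-XYToz7nHVqg], [P], s[STARTED]"""

print(parse_failed_shard(sample))
```

Feeding each log entry through this and counting the distinct (index, shard) pairs would show whether the same few shards keep failing or the problem is spread across many.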



