Trouble restarting after crash


(Chuck McKenzie) #1

I've inherited several large elasticsearch 0.18.7 clusters and I'm
having some trouble getting one of them to restart after a crash. (We
ran out of open file handles.) I've since upped the limit, and I'll be
cleaning up old indices after it comes back up, so that shouldn't
happen again, but I can't get the cluster to finish starting.

Here's the problem I'm seeing:

[2012-05-07 13:11:04,926][WARN ][indices.cluster ] [node_name] [shard_name][8] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [shard_name][8] shard allocated for local recovery (post api), should exists, but doesn't
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:99)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:179)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

I've followed earlier instructions on this mailing list that say to
XDELETE the affected index, but that doesn't seem to be working - it's
been sitting for an hour as follows:

{
  "cluster_name" : "es_cluster1",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 4,
  "number_of_data_nodes" : 4,
  "active_primary_shards" : 5260,
  "active_shards" : 9719,
  "relocating_shards" : 0,
  "initializing_shards" : 6,
  "unassigned_shards" : 41
}

Any idea how I can get rid of the two tiny test indices that are
having problems, without deleting several TB of data from the other
indices?
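
For reference, here's roughly what I've been running - the health
check that produced the output above, plus the delete I tried against
one of the broken indices (test_index_1 is a placeholder, not the real
index name):

curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
curl -XDELETE 'http://localhost:9200/test_index_1'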


(Rafał Kuć) #2

Hello!

We had a similar issue to yours - did you try running the XDELETE on
more than one node? We had to run the XDELETE on two nodes in the
cluster to actually get the problematic indices deleted.
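
For example, something along these lines, hitting each node directly
(the hostnames and the index name are placeholders for your own):

curl -XDELETE 'http://es-node1:9200/broken_test_index'
curl -XDELETE 'http://es-node2:9200/broken_test_index'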

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch



(Chuck McKenzie) #3

The XDELETEs are running against localhost on each of the 4 nodes.
Doesn't seem to help.
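
After the deletes I'm checking whether the index still shows up,
roughly like this (broken_test_index is a placeholder for one of the
two problem indices):

curl -XGET 'http://localhost:9200/broken_test_index/_status?pretty=true'
curl -XGET 'http://localhost:9200/_cluster/health/broken_test_index?pretty=true'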



(Shay Banon) #4

DELETEing the index will help remove this message. This problem should
be fixed in 0.19 with the new local gateway structure and several bug
fixes (including the case where a shard can't recover).
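
As a side note, you can confirm which version each node is actually
running by hitting the root endpoint, something like:

curl -XGET 'http://localhost:9200/?pretty=true'

which includes the version number in its response, so you can tell
whether a node is still on a pre-0.19 release.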


