Cluster Failure

We saw 100%+ CPU usage, and then upon reboot we saw this:

[2011-11-14 15:00:45,785][WARN ][indices.cluster ] [Network] [documents][10] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [documents][10] shard allocated for local recovery (post api), should exists, but doesn't
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:99)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:179)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
[2011-11-14 15:00:45,793][WARN ][indices.cluster ] [Network] [documents][14] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [documents][14] shard allocated for local recovery (post api), should exists, but doesn't
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:99)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:179)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
[2011-11-14 15:00:45,794][WARN ][cluster.action.shard ] [Network] sending failed shard for [documents][14], node[UdFv305pR6KwkANc7GdjmQ], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[documents][14] shard allocated for local recovery (post api), should exists, but doesn't]]]
[2011-11-14 15:00:45,794][WARN ][cluster.action.shard ] [Network] received shard failed for [documents][14], node[UdFv305pR6KwkANc7GdjmQ], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[documents][14] shard allocated for local recovery (post api), should exists, but doesn't]]]
[2011-11-14 15:00:45,794][WARN ][cluster.action.shard ] [Network] sending failed shard for [documents][10], node[UdFv305pR6KwkANc7GdjmQ], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[documents][10] shard allocated for local recovery (post api), should exists, but doesn't]]]
[2011-11-14 15:00:45,794][WARN ][cluster.action.shard ] [Network] received shard failed for [documents][10], node[UdFv305pR6KwkANc7GdjmQ], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[documents][10] shard allocated for local recovery (post api), should exists, but doesn't]]]
[2011-11-14 15:00:45,795][WARN ][cluster.action.shard ] [Network] sending failed shard for [documents][18], node[UdFv305pR6KwkANc7GdjmQ], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[documents][18] failed recovery]; nested: EngineCreationFailureException[[documents][18] Failed to open reader on writer]; nested: FileNotFoundException[/var/lib/elasticsearch/viralheat/nodes/0/indices/documents/18/index/_len.fnm (Too many open files)]; ]]
[2011-11-14 15:00:45,795][WARN ][cluster.action.shard ] [Network] received shard failed for [documents][18], node[UdFv305pR6KwkANc7GdjmQ], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[documents][18] failed recovery]; nested: EngineCreationFailureException[[documents][18] Failed to open reader on writer]; nested: FileNotFoundException[/var/lib/elasticsearch/viralheat/nodes/0/indices/documents/18/index/_len.fnm (Too many open files)]; ]]
[2011-11-14 15:00:45,804][WARN ][indices.cluster ] [Network] [documents][10] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [documents][10] shard allocated for local recovery (post api), should exists, but doesn't
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:99)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:179)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)

Please describe in more detail what you did and which version you used.
Details about your environment would also help (Java version, OS, RAM,
index size, ...).
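
If it helps, something like the rough sketch below could collect most of
these details on the node. It is only an illustration, not an official
tool: it assumes a Unix-like OS, Python 3.7+ and `java` on the PATH, and
the name `collect_environment` is made up here. It also prints the
open-file limit, since the log above mentions "Too many open files".

```python
import os
import platform
import resource  # Unix only
import shutil
import subprocess


def collect_environment():
    """Gather basic environment details worth including in a report.

    Hypothetical helper for illustration; not part of Elasticsearch.
    """
    info = {"os": platform.platform()}

    # `java -version` normally writes its banner to stderr.
    java = shutil.which("java")
    if java is None:
        info["java"] = "not found on PATH"
    else:
        try:
            result = subprocess.run(
                [java, "-version"], capture_output=True, text=True
            )
            lines = (result.stderr or result.stdout).strip().splitlines()
            info["java"] = lines[0] if lines else "unknown"
        except OSError:
            info["java"] = "could not be executed"

    # Total physical RAM; these sysconf names exist on Linux but are
    # not guaranteed on every platform.
    try:
        ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        info["ram_gib"] = round(ram_bytes / 1024 ** 3, 1)
    except (ValueError, OSError, AttributeError):
        info["ram_gib"] = None

    # Open-file limits of *this* process. The Elasticsearch process may
    # run with different limits (e.g. another user or init script).
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    info["open_files_soft"] = soft
    info["open_files_hard"] = hard
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

The Elasticsearch version and the index size are not covered by this
sketch and would still need to be reported separately.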

Peter.

On 15 Nov., 00:03, electic elec...@gmail.com wrote:

We saw 100%+ CPU usage, and then upon reboot we saw this:
