Failed to start shard NumberFormatException


(John Watson) #1

A couple of nodes ran out of space on the disks that stored the ES shards.
After shutting down the nodes, clearing some space and starting them back up,
the log keeps filling with:

[2012-05-24 07:40:38,952][WARN ][cluster.action.shard ] [Firearm]
received shard failed for [threads][4], node[5sT1SgOxR16I92QKymk9yw], [P],
s[INITIALIZING], reason [Failed to start shard, message
[IndexShardGatewayRecoveryException[[threads][4] failed recovery]; nested:
NumberFormatException[For input string: "3hr.1337807144597"]; ]]

The [4] (which I assume is the shard number) cycles through all shards 0-5,
and the input string changes along with it.
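The message text has the exact form Java's `Long.parseLong` (or `Integer.parseInt`) produces when it is given a string that is not a plain base-10 integer. A plausible reading, which is an assumption and not confirmed anywhere in this thread, is that recovery parses a numeric id out of a file name and trips over a file with an unexpected name. The sketch below only reproduces the message; `parseNumericSuffix` and the `state-` file names are hypothetical and not taken from the Elasticsearch source.

```java
public class SuffixParseDemo {

    // Hypothetical helper (NOT Elasticsearch code): take everything after the
    // last '-' in a file name and parse it as a base-10 long.
    static long parseNumericSuffix(String fileName) {
        String suffix = fileName.substring(fileName.lastIndexOf('-') + 1);
        return Long.parseLong(suffix); // throws NumberFormatException on non-digits
    }

    public static void main(String[] args) {
        // A well-formed name parses cleanly.
        System.out.println(parseNumericSuffix("state-12"));

        // A name whose suffix is not a plain integer fails with the same
        // message text seen in the shard-failure log above.
        try {
            parseNumericSuffix("state-3hr.1337807144597");
        } catch (NumberFormatException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running it prints `12` followed by `For input string: "3hr.1337807144597"`, which is why a stray or partially written file left behind by the full disk is a reasonable first suspect.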

ES 0.19.4
Java 1.6.0_31
3 nodes - 1 index (6 shards - 1 replica)


(Shay Banon) #2

There should be another log message on the other node with more details,
can you gist it?



(John Watson) #3

Here you go: https://gist.github.com/9393ee7e0f3a70952e3c

I was able to recover my cluster by moving the _state/state- file to the
replica shards, since the cluster always seems to wait for the primary
shards to attempt recovery.


