Apparent infinite loop in node recovery


(Grant Rodgers) #1

We had an incident yesterday where a node ran out of memory while
indexing and the elasticsearch process was killed. Now i'm trying to
recover today and noticing what seems to be an infinite loop in the
recovery process log.

Here's what happens in the log after a restart (filtered to only show
one shard):

[22:59:07,944][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,046][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,047][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,156][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,156][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,266][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,266][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,374][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,374][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,476][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,476][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,586][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,586][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,696][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,696][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,806][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,806][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,916][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,916][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:09,026][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:09,026][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:09,136][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:09,136][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:09,246][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:09,247][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:09,356][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]

and many more lines like that, for each shard. ~30000 debug lines from
index.shard.service in less than 3 minutes (before i stopped the
process).

Is this normal? is it a corruption error? it seems to happen
regardless of whether there is a work directory or not.


(Shay Banon) #2

Thats does not indicate an infinte loop. The recovery process (in this case,
recovery of a replica shard from the primary shard) is throttled on the node
level. Basically, in order to not tax the machine the node is running on,
recoveries are throttled (both on the number of concurrent recoveries
allowed, and on the number of concurrent transfers allowed).

A recovery started my moving the shard to RECOVERING state, and then checks
if it is allowed to perform a recovery. If not, either because the it is
being throttled on the current node, or the node it recovers from, it
restores the previous non recovering state, and retries after an interval
(by default, 100ms).

That process can create a lot of DEBUG level logs (I need to rethink the
logging aspect here), but should finish successfully. It should not indicate
a problem.

-shay.banon

On Thu, Aug 12, 2010 at 2:10 AM, Grant Rodgers grantr@gmail.com wrote:

We had an incident yesterday where a node ran out of memory while
indexing and the elasticsearch process was killed. Now i'm trying to
recover today and noticing what seems to be an infinite loop in the
recovery process log.

Here's what happens in the log after a restart (filtered to only show
one shard):

[22:59:07,944][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,046][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,047][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,156][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,156][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,266][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,266][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,374][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,374][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,476][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,476][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,586][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,586][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,696][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,696][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,806][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,806][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,916][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,916][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:09,026][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:09,026][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:09,136][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:09,136][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:09,246][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:09,247][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:09,356][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]

and many more lines like that, for each shard. ~30000 debug lines from
index.shard.service in less than 3 minutes (before i stopped the
process).

Is this normal? is it a corruption error? it seems to happen
regardless of whether there is a work directory or not.


(Grant Rodgers) #3

Ok, good to know. I will let it run and see if it finishes
successfully. thanks!

On Aug 11, 4:19 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

Thats does not indicate an infinte loop. The recovery process (in this case,
recovery of a replica shard from the primary shard) is throttled on the node
level. Basically, in order to not tax the machine the node is running on,
recoveries are throttled (both on the number of concurrent recoveries
allowed, and on the number of concurrent transfers allowed).

A recovery started my moving the shard to RECOVERING state, and then checks
if it is allowed to perform a recovery. If not, either because the it is
being throttled on the current node, or the node it recovers from, it
restores the previous non recovering state, and retries after an interval
(by default, 100ms).

That process can create a lot of DEBUG level logs (I need to rethink the
logging aspect here), but should finish successfully. It should not indicate
a problem.

-shay.banon

On Thu, Aug 12, 2010 at 2:10 AM, Grant Rodgers gra...@gmail.com wrote:

We had an incident yesterday where a node ran out of memory while
indexing and the elasticsearch process was killed. Now i'm trying to
recover today and noticing what seems to be an infinite loop in the
recovery process log.

Here's what happens in the log after a restart (filtered to only show
one shard):

[22:59:07,944][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,046][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,047][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,156][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,156][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,266][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,266][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,374][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,374][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,476][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,476][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,586][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,586][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,696][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,696][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,806][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,806][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,916][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,916][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:09,026][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:09,026][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:09,136][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:09,136][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:09,246][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:09,247][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:09,356][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]

and many more lines like that, for each shard. ~30000 debug lines from
index.shard.service in less than 3 minutes (before i stopped the
process).

Is this normal? is it a corruption error? it seems to happen
regardless of whether there is a work directory or not.


(Shay Banon) #4

I've pushed a fix for the excessive logging problem.

-shay.banon

On Thu, Aug 12, 2010 at 2:24 AM, Grant Rodgers grantr@gmail.com wrote:

Ok, good to know. I will let it run and see if it finishes
successfully. thanks!

On Aug 11, 4:19 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

Thats does not indicate an infinte loop. The recovery process (in this
case,
recovery of a replica shard from the primary shard) is throttled on the
node
level. Basically, in order to not tax the machine the node is running on,
recoveries are throttled (both on the number of concurrent recoveries
allowed, and on the number of concurrent transfers allowed).

A recovery started my moving the shard to RECOVERING state, and then
checks
if it is allowed to perform a recovery. If not, either because the it is
being throttled on the current node, or the node it recovers from, it
restores the previous non recovering state, and retries after an interval
(by default, 100ms).

That process can create a lot of DEBUG level logs (I need to rethink the
logging aspect here), but should finish successfully. It should not
indicate
a problem.

-shay.banon

On Thu, Aug 12, 2010 at 2:10 AM, Grant Rodgers gra...@gmail.com wrote:

We had an incident yesterday where a node ran out of memory while
indexing and the elasticsearch process was killed. Now i'm trying to
recover today and noticing what seems to be an infinite loop in the
recovery process log.

Here's what happens in the log after a restart (filtered to only show
one shard):

[22:59:07,944][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,046][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,047][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,156][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,156][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,266][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,266][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,374][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,374][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,476][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,476][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,586][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,586][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,696][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,696][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,806][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,806][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,916][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,916][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:09,026][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:09,026][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:09,136][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:09,136][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:09,246][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:09,247][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:09,356][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]

and many more lines like that, for each shard. ~30000 debug lines from
index.shard.service in less than 3 minutes (before i stopped the
process).

Is this normal? is it a corruption error? it seems to happen
regardless of whether there is a work directory or not.


(Grant Rodgers) #5

Fyi, 19 hours later, there is no change in the state of the cluster,
and the logs are still bouncing between created and recovering. We
will delete and recreate the missing indices, but it seems like
recovery should have happened by now. What is the expected recovery
time for a shard with a few hundred thousand documents?

Here is the cluster state for reference. The dacbe3f4-ef13-43ba-aa13-
b6457780f145 node had been restarted a few times, and maybe that
caused some shards to get stuck in the relocating state? Still, I'd
expect that at least one shard would have recovered by now.

"routing_table" : {
"indices" : {
"versions" : {
"shards" : {
"0" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
} ],
"1" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
} ],
"2" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
} ],
"3" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
} ],
"4" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
} ],
"5" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
} ]
}
},
"stats" : {
"shards" : {
"0" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
} ],
"1" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
} ],
"2" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
} ],
"3" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
} ],
"4" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
} ],
"5" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
} ]
}
},
"revisions" : {
"shards" : {
"0" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 0,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "revisions"
} ],
"1" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 1,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "revisions"
} ],
"2" : [ {
"state" : "INITIALIZING",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 2,
"index" : "revisions"
}, {
"state" : "RELOCATING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : "dacbe3f4-ef13-43ba-aa13-
b6457780f145",
"shard" : 2,
"index" : "revisions"
} ],
"3" : [ {
"state" : "INITIALIZING",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 3,
"index" : "revisions"
}, {
"state" : "RELOCATING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : "dacbe3f4-ef13-43ba-aa13-
b6457780f145",
"shard" : 3,
"index" : "revisions"
} ],
"4" : [ {
"state" : "INITIALIZING",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 4,
"index" : "revisions"
}, {
"state" : "RELOCATING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : "dacbe3f4-ef13-43ba-aa13-
b6457780f145",
"shard" : 4,
"index" : "revisions"
} ],
"5" : [ {
"state" : "INITIALIZING",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 5,
"index" : "revisions"
}, {
"state" : "RELOCATING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : "dacbe3f4-ef13-43ba-aa13-
b6457780f145",
"shard" : 5,
"index" : "revisions"
} ]
}
}
}
},
"routing_nodes" : {
"unassigned" : [ {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "revisions"
} ],
"nodes" : {
"b411c443-ebdd-416a-ac35-cce5af38fa3f" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 2,
"index" : "revisions"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 3,
"index" : "revisions"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 4,
"index" : "revisions"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 5,
"index" : "revisions"
} ],
"a4eada23-ab66-4ab5-bf20-8de09e25a6a3" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 0,
"index" : "revisions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 1,
"index" : "revisions"
}, {
"state" : "RELOCATING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"shard" : 2,
"index" : "revisions"
}, {
"state" : "RELOCATING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"shard" : 3,
"index" : "revisions"
}, {
"state" : "RELOCATING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"shard" : 4,
"index" : "revisions"
}, {
"state" : "RELOCATING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"shard" : 5,
"index" : "revisions"
} ],
"dacbe3f4-ef13-43ba-aa13-b6457780f145" : [ {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"shard" : 2,
"index" : "revisions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"shard" : 3,
"index" : "revisions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"shard" : 4,
"index" : "revisions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"shard" : 5,
"index" : "revisions"
} ]
}

On Aug 12, 3:14 am, Shay Banon shay.ba...@elasticsearch.com wrote:

I've pushed a fix for the excessive logging problem.

-shay.banon

On Thu, Aug 12, 2010 at 2:24 AM, Grant Rodgers gra...@gmail.com wrote:

Ok, good to know. I will let it run and see if it finishes
successfully. thanks!

On Aug 11, 4:19 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

Thats does not indicate an infinte loop. The recovery process (in this
case,
recovery of a replica shard from the primary shard) is throttled on the
node
level. Basically, in order to not tax the machine the node is running on,
recoveries are throttled (both on the number of concurrent recoveries
allowed, and on the number of concurrent transfers allowed).

A recovery started my moving the shard to RECOVERING state, and then
checks
if it is allowed to perform a recovery. If not, either because the it is
being throttled on the current node, or the node it recovers from, it
restores the previous non recovering state, and retries after an interval
(by default, 100ms).

That process can create a lot of DEBUG level logs (I need to rethink the
logging aspect here), but should finish successfully. It should not
indicate
a problem.

-shay.banon

On Thu, Aug 12, 2010 at 2:10 AM, Grant Rodgers gra...@gmail.com wrote:

We had an incident yesterday where a node ran out of memory while
indexing and the elasticsearch process was killed. Now i'm trying to
recover today and noticing what seems to be an infinite loop in the
recovery process log.

Here's what happens in the log after a restart (filtered to only show
one shard):

[22:59:07,944][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,046][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,047][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,156][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,156][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,266][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,266][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,374][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,374][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,476][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,476][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,586][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,586][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,696][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,696][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,806][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,806][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,916][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:08,916][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:09,026][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:09,026][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:09,136][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:09,136][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:09,246][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]
[22:59:09,247][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [RECOVERING]->[CREATED], restored after recovery
[22:59:09,356][DEBUG][index.shard.service ] [Joystick] [stats][4]
state: [CREATED]->[RECOVERING]

and many more lines like that, for each shard. ~30000 debug lines from
index.shard.service in less than 3 minutes (before i stopped the
process).

Is this normal? is it a corruption error? it seems to happen
regardless of whether there is a work directory or not.


(Shay Banon) #6

If I remember correctly, you use the s3 gateway, right? It seems like some
primary shards are stuck initializing (basically, recovering from the
gateway). I forgot to ask, which version are you using?

-shay.banon

On Thu, Aug 12, 2010 at 9:01 PM, Grant Rodgers grantr@gmail.com wrote:

Fyi, 19 hours later, there is no change in the state of the cluster,
and the logs are still bouncing between created and recovering. We
will delete and recreate the missing indices, but it seems like
recovery should have happened by now. What is the expected recovery
time for a shard with a few hundred thousand documents?

Here is the cluster state for reference. The dacbe3f4-ef13-43ba-aa13-
b6457780f145 node had been restarted a few times, and maybe that
caused some shards to get stuck in the relocating state? Still, I'd
expect that at least one shard would have recovered by now.

"routing_table" : {
"indices" : {
"versions" : {
"shards" : {
"0" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
} ],
"1" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
} ],
"2" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
} ],
"3" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
} ],
"4" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
} ],
"5" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
} ]
}
},
"stats" : {
"shards" : {
"0" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
} ],
"1" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
} ],
"2" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
} ],
"3" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
} ],
"4" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
} ],
"5" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
} ]
}
},
"revisions" : {
"shards" : {
"0" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 0,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "revisions"
} ],
"1" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 1,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "revisions"
} ],
"2" : [ {
"state" : "INITIALIZING",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 2,
"index" : "revisions"
}, {
"state" : "RELOCATING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : "dacbe3f4-ef13-43ba-aa13-
b6457780f145",
"shard" : 2,
"index" : "revisions"
} ],
"3" : [ {
"state" : "INITIALIZING",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 3,
"index" : "revisions"
}, {
"state" : "RELOCATING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : "dacbe3f4-ef13-43ba-aa13-
b6457780f145",
"shard" : 3,
"index" : "revisions"
} ],
"4" : [ {
"state" : "INITIALIZING",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 4,
"index" : "revisions"
}, {
"state" : "RELOCATING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : "dacbe3f4-ef13-43ba-aa13-
b6457780f145",
"shard" : 4,
"index" : "revisions"
} ],
"5" : [ {
"state" : "INITIALIZING",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 5,
"index" : "revisions"
}, {
"state" : "RELOCATING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : "dacbe3f4-ef13-43ba-aa13-
b6457780f145",
"shard" : 5,
"index" : "revisions"
} ]
}
}
}
},
"routing_nodes" : {
"unassigned" : [ {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "revisions"
} ],
"nodes" : {
"b411c443-ebdd-416a-ac35-cce5af38fa3f" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 2,
"index" : "revisions"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 3,
"index" : "revisions"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 4,
"index" : "revisions"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 5,
"index" : "revisions"
} ],
"a4eada23-ab66-4ab5-bf20-8de09e25a6a3" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 0,
"index" : "revisions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 1,
"index" : "revisions"
}, {
"state" : "RELOCATING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"shard" : 2,
"index" : "revisions"
}, {
"state" : "RELOCATING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"shard" : 3,
"index" : "revisions"
}, {
"state" : "RELOCATING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"shard" : 4,
"index" : "revisions"
}, {
"state" : "RELOCATING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"shard" : 5,
"index" : "revisions"
} ],
"dacbe3f4-ef13-43ba-aa13-b6457780f145" : [ {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"shard" : 2,
"index" : "revisions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"shard" : 3,
"index" : "revisions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"shard" : 4,
"index" : "revisions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"shard" : 5,
"index" : "revisions"
} ]
}

On Aug 12, 3:14 am, Shay Banon shay.ba...@elasticsearch.com wrote:

I've pushed a fix for the excessive logging problem.

-shay.banon

On Thu, Aug 12, 2010 at 2:24 AM, Grant Rodgers gra...@gmail.com wrote:

Ok, good to know. I will let it run and see if it finishes
successfully. thanks!

On Aug 11, 4:19 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

Thats does not indicate an infinte loop. The recovery process (in
this

case,

recovery of a replica shard from the primary shard) is throttled on
the

node

level. Basically, in order to not tax the machine the node is running
on,

recoveries are throttled (both on the number of concurrent recoveries
allowed, and on the number of concurrent transfers allowed).

A recovery started my moving the shard to RECOVERING state, and then
checks
if it is allowed to perform a recovery. If not, either because the it
is

being throttled on the current node, or the node it recovers from, it
restores the previous non recovering state, and retries after an
interval

(by default, 100ms).

That process can create a lot of DEBUG level logs (I need to rethink
the

logging aspect here), but should finish successfully. It should not
indicate
a problem.

-shay.banon

On Thu, Aug 12, 2010 at 2:10 AM, Grant Rodgers gra...@gmail.com
wrote:

We had an incident yesterday where a node ran out of memory while
indexing and the elasticsearch process was killed. Now i'm trying
to

recover today and noticing what seems to be an infinite loop in the
recovery process log.

Here's what happens in the log after a restart (filtered to only
show

one shard):

[22:59:07,944][DEBUG][index.shard.service ] [Joystick]
[stats][4]

state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,046][DEBUG][index.shard.service ] [Joystick]
[stats][4]

state: [CREATED]->[RECOVERING]
[22:59:08,047][DEBUG][index.shard.service ] [Joystick]
[stats][4]

state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,156][DEBUG][index.shard.service ] [Joystick]
[stats][4]

state: [CREATED]->[RECOVERING]
[22:59:08,156][DEBUG][index.shard.service ] [Joystick]
[stats][4]

state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,266][DEBUG][index.shard.service ] [Joystick]
[stats][4]

state: [CREATED]->[RECOVERING]
[22:59:08,266][DEBUG][index.shard.service ] [Joystick]
[stats][4]

state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,374][DEBUG][index.shard.service ] [Joystick]
[stats][4]

state: [CREATED]->[RECOVERING]
[22:59:08,374][DEBUG][index.shard.service ] [Joystick]
[stats][4]

state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,476][DEBUG][index.shard.service ] [Joystick]
[stats][4]

state: [CREATED]->[RECOVERING]
[22:59:08,476][DEBUG][index.shard.service ] [Joystick]
[stats][4]

state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,586][DEBUG][index.shard.service ] [Joystick]
[stats][4]

state: [CREATED]->[RECOVERING]
[22:59:08,586][DEBUG][index.shard.service ] [Joystick]
[stats][4]

state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,696][DEBUG][index.shard.service ] [Joystick]
[stats][4]

state: [CREATED]->[RECOVERING]
[22:59:08,696][DEBUG][index.shard.service ] [Joystick]
[stats][4]

state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,806][DEBUG][index.shard.service ] [Joystick]
[stats][4]

state: [CREATED]->[RECOVERING]
[22:59:08,806][DEBUG][index.shard.service ] [Joystick]
[stats][4]

state: [RECOVERING]->[CREATED], restored after recovery
[22:59:08,916][DEBUG][index.shard.service ] [Joystick]
[stats][4]

state: [CREATED]->[RECOVERING]
[22:59:08,916][DEBUG][index.shard.service ] [Joystick]
[stats][4]

state: [RECOVERING]->[CREATED], restored after recovery
[22:59:09,026][DEBUG][index.shard.service ] [Joystick]
[stats][4]

state: [CREATED]->[RECOVERING]
[22:59:09,026][DEBUG][index.shard.service ] [Joystick]
[stats][4]

state: [RECOVERING]->[CREATED], restored after recovery
[22:59:09,136][DEBUG][index.shard.service ] [Joystick]
[stats][4]

state: [CREATED]->[RECOVERING]
[22:59:09,136][DEBUG][index.shard.service ] [Joystick]
[stats][4]

state: [RECOVERING]->[CREATED], restored after recovery
[22:59:09,246][DEBUG][index.shard.service ] [Joystick]
[stats][4]

state: [CREATED]->[RECOVERING]
[22:59:09,247][DEBUG][index.shard.service ] [Joystick]
[stats][4]

state: [RECOVERING]->[CREATED], restored after recovery
[22:59:09,356][DEBUG][index.shard.service ] [Joystick]
[stats][4]

state: [CREATED]->[RECOVERING]

and many more lines like that, for each shard. ~30000 debug lines
from

index.shard.service in less than 3 minutes (before i stopped the
process).

Is this normal? is it a corruption error? it seems to happen
regardless of whether there is a work directory or not.


(Grant Rodgers) #7

Yes, we are using the s3 gateway and version 0.9.0.

Would we get better gateway performance by using the fs gateway and a
network file system, like gluster, drbd, or ceph?

On Aug 12, 11:34 am, Shay Banon shay.ba...@elasticsearch.com wrote:

If I remember correctly, you use the s3 gateway, right? It seems like some
primary shards are stuck initializing (basically, recovering from the
gateway). I forgot to ask, which version are you using?

-shay.banon

On Thu, Aug 12, 2010 at 9:01 PM, Grant Rodgers gra...@gmail.com wrote:

Fyi, 19 hours later, there is no change in the state of the cluster,
and the logs are still bouncing between created and recovering. We
will delete and recreate the missing indices, but it seems like
recovery should have happened by now. What is the expected recovery
time for a shard with a few hundred thousand documents?

Here is the cluster state for reference. The dacbe3f4-ef13-43ba-aa13-
b6457780f145 node had been restarted a few times, and maybe that
caused some shards to get stuck in the relocating state? Still, I'd
expect that at least one shard would have recovered by now.

"routing_table" : {
"indices" : {
"versions" : {
"shards" : {
"0" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
} ],
"1" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
} ],
"2" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
} ],
"3" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
} ],
"4" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
} ],
"5" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
} ]
}
},
"stats" : {
"shards" : {
"0" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
} ],
"1" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
} ],
"2" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
} ],
"3" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
} ],
"4" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
} ],
"5" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
} ]
}
},
"revisions" : {
"shards" : {
"0" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 0,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "revisions"
} ],
"1" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 1,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "revisions"
} ],
"2" : [ {
"state" : "INITIALIZING",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 2,
"index" : "revisions"
}, {
"state" : "RELOCATING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : "dacbe3f4-ef13-43ba-aa13-
b6457780f145",
"shard" : 2,
"index" : "revisions"
} ],
"3" : [ {

...

read more »


(Shay Banon) #8

It depends on the actual implementation. In general, the time it takes to do
recovery is how fast we can get data from s3 when a full recovery from it is
required (elasticsearch tries to minimize that by reusing the work dir,
though you should set the recover_after_nodes setting).

My concern is that maybe the recovery process is stuck somewhere. Not sure
where though... . Is there a chance for a thread dump on one of the nodes
that have a primary shard in INITIALIZING state? You can do that with
visualvm.

-shay.banon

On Thu, Aug 12, 2010 at 9:50 PM, Grant Rodgers grantr@gmail.com wrote:

Yes, we are using the s3 gateway and version 0.9.0.

Would we get better gateway performance by using the fs gateway and a
network file system, like gluster, drbd, or ceph?

On Aug 12, 11:34 am, Shay Banon shay.ba...@elasticsearch.com wrote:

If I remember correctly, you use the s3 gateway, right? It seems like
some
primary shards are stuck initializing (basically, recovering from the
gateway). I forgot to ask, which version are you using?

-shay.banon

On Thu, Aug 12, 2010 at 9:01 PM, Grant Rodgers gra...@gmail.com wrote:

Fyi, 19 hours later, there is no change in the state of the cluster,
and the logs are still bouncing between created and recovering. We
will delete and recreate the missing indices, but it seems like
recovery should have happened by now. What is the expected recovery
time for a shard with a few hundred thousand documents?

Here is the cluster state for reference. The dacbe3f4-ef13-43ba-aa13-
b6457780f145 node had been restarted a few times, and maybe that
caused some shards to get stuck in the relocating state? Still, I'd
expect that at least one shard would have recovered by now.

"routing_table" : {
"indices" : {
"versions" : {
"shards" : {
"0" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
} ],
"1" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
} ],
"2" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
} ],
"3" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
} ],
"4" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
} ],
"5" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
} ]
}
},
"stats" : {
"shards" : {
"0" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
} ],
"1" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
} ],
"2" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
} ],
"3" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
} ],
"4" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
} ],
"5" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
} ]
}
},
"revisions" : {
"shards" : {
"0" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 0,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "revisions"
} ],
"1" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 1,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "revisions"
} ],
"2" : [ {
"state" : "INITIALIZING",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 2,
"index" : "revisions"
}, {
"state" : "RELOCATING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : "dacbe3f4-ef13-43ba-aa13-
b6457780f145",
"shard" : 2,
"index" : "revisions"
} ],
"3" : [ {

...

read more »


(Grant Rodgers) #9

Now I've done a full cluster restart, deleting the work directories of
the corrupted indices so they would restore only from gateway again.
It seems to be working, but there are a few issues:

Some nodes appear to join the cluster, but sending a cluster health
request to it times out or returns MasterDisconnectedException (or
something like it, I don't have the exact exception name anymore). The
master definitely knows about these nodes though, because when they
stop it logs that they disconnected. Later, after the cluster has been
running for 10-15 minutes, the other nodes are able to connect and
respond to requests. In this case, I stopped them for a while and
restarted them later. I don't know if they would have become
operational if they'd stayed up.

It appears that local shards are not recovered before gateway shards.
We have one index with all shards present and intact in the work
directory, but those shards have not recovered yet after about 30
minutes, and other gateway shards are being recovered. Are shards
recovered sequentially? Do local shards get priority?

Thanks,
Grant

On Aug 12, 11:50 am, Grant Rodgers gra...@gmail.com wrote:

Yes, we are using the s3 gateway and version 0.9.0.

Would we get better gateway performance by using the fs gateway and a
network file system, like gluster, drbd, or ceph?

On Aug 12, 11:34 am, Shay Banon shay.ba...@elasticsearch.com wrote:

If I remember correctly, you use the s3 gateway, right? It seems like some
primary shards are stuck initializing (basically, recovering from the
gateway). I forgot to ask, which version are you using?

-shay.banon

On Thu, Aug 12, 2010 at 9:01 PM, Grant Rodgers gra...@gmail.com wrote:

Fyi, 19 hours later, there is no change in the state of the cluster,
and the logs are still bouncing between created and recovering. We
will delete and recreate the missing indices, but it seems like
recovery should have happened by now. What is the expected recovery
time for a shard with a few hundred thousand documents?

Here is the cluster state for reference. The dacbe3f4-ef13-43ba-aa13-
b6457780f145 node had been restarted a few times, and maybe that
caused some shards to get stuck in the relocating state? Still, I'd
expect that at least one shard would have recovered by now.

"routing_table" : {
"indices" : {
"versions" : {
"shards" : {
"0" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
} ],
"1" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
} ],
"2" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
} ],
"3" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
} ],
"4" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
} ],
"5" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
} ]
}
},
"stats" : {
"shards" : {
"0" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
} ],
"1" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
} ],
"2" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
} ],
"3" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
} ],
"4" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
} ],
"5" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,

...

read more »


(Grant Rodgers) #10

Unfortunately the nodes mentioned originally have been stopped. But it
seems the new cluster is also having issues recovering. The work
directories have not changed in size for about 10 minutes.

Here is the routing table and nodes (thread dumps follow)

"routing_table" : {
"indices" : {
"versions" : {
"shards" : {
"0" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
} ],
"1" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
} ],
"2" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
} ],
"3" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
} ],
"4" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
} ],
"5" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
} ]
}
},
"stats" : {
"shards" : {
"0" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
} ],
"1" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
} ],
"2" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
} ],
"3" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
} ],
"4" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
} ],
"5" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
} ]
}
},
"revisions" : {
"shards" : {
"0" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 0,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "revisions"
} ],
"1" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 1,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "revisions"
} ],
"2" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 2,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "revisions"
} ],
"3" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 3,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "revisions"
} ],
"4" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 4,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "revisions"
} ],
"5" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 5,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "revisions"
} ]
}
}
}
},
"routing_nodes" : {
"unassigned" : [ {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "revisions"
} ],
"nodes" : {
"d9bd8b92-90d5-4892-bea8-46896b467974" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 0,
"index" : "revisions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 2,
"index" : "revisions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 5,
"index" : "revisions"
} ],
"eeb2af20-13f7-41e3-9dca-30b768785b4f" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 1,
"index" : "revisions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 3,
"index" : "revisions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 4,
"index" : "revisions"
} ]
}

Here is a thread dump from node eeb2af20-13f7-41e3-9dca-30b768785b4f

2010-08-12_19:35:09.31077 java.lang.Thread.State: TIMED_WAITING
(parking)
2010-08-12_19:35:09.31079 at sun.misc.Unsafe.park(Native Method)
2010-08-12_19:35:09.31080 - parking to wait for
<0x000000075bf09e48> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
2010-08-12_19:35:09.31080 at
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
2010-08-12_19:35:09.31081 at
java.util.concurrent.locks.AbstractQueuedSynchronizer
$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2081)
2010-08-12_19:35:09.31081 at
java.util.concurrent.DelayQueue.take(DelayQueue.java:193)
2010-08-12_19:35:09.31082 at
java.util.concurrent.ScheduledThreadPoolExecutor
$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:688)
2010-08-12_19:35:09.31083 at
java.util.concurrent.ScheduledThreadPoolExecutor
$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:681)
2010-08-12_19:35:09.31084 at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:
1043)
2010-08-12_19:35:09.31084 at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:
1103)
2010-08-12_19:35:09.31085 at
java.util.concurrent.ThreadPoolExecutor
$Worker.run(ThreadPoolExecutor.java:603)
2010-08-12_19:35:09.31085 at java.lang.Thread.run(Thread.java:
636)
2010-08-12_19:35:09.31086
2010-08-12_19:35:09.31086 "elasticsearch[Thumb, Tom][sc]-pool-2-
thread-2" daemon prio=10 tid=0x0000000001f8e800 nid=0x653b waiting on
condition [0x00007fda1441f000]
2010-08-12_19:35:09.31087 java.lang.Thread.State: TIMED_WAITING
(parking)
2010-08-12_19:35:09.31088 at sun.misc.Unsafe.park(Native Method)
2010-08-12_19:35:09.31088 - parking to wait for
<0x000000075bf09e48> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
2010-08-12_19:35:09.31089 at
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
2010-08-12_19:35:09.31089 at
java.util.concurrent.locks.AbstractQueuedSynchronizer
$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2081)
2010-08-12_19:35:09.31091 at
java.util.concurrent.DelayQueue.take(DelayQueue.java:193)
2010-08-12_19:35:09.31091 at
java.util.concurrent.ScheduledThreadPoolExecutor
$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:688)
2010-08-12_19:35:09.31092 at
java.util.concurrent.ScheduledThreadPoolExecutor
$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:681)
2010-08-12_19:35:09.31092 at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:
1043)
2010-08-12_19:35:09.31093 at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:
1103)
2010-08-12_19:35:09.31093 at
java.util.concurrent.ThreadPoolExecutor
$Worker.run(ThreadPoolExecutor.java:603)
2010-08-12_19:35:09.31094 at java.lang.Thread.run(Thread.java:
636)
2010-08-12_19:35:09.31094
2010-08-12_19:35:09.31095 "elasticsearch[Thumb, Tom][sc]-pool-2-
thread-1" daemon prio=10 tid=0x00007fda101c6000 nid=0x653a waiting on
condition [0x00007fda14af4000]
2010-08-12_19:35:09.31095 java.lang.Thread.State: TIMED_WAITING
(parking)
2010-08-12_19:35:09.31096 at sun.misc.Unsafe.park(Native Method)
2010-08-12_19:35:09.31096 - parking to wait for
<0x000000075bf09e48> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
2010-08-12_19:35:09.31097 at
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
2010-08-12_19:35:09.31097 at
java.util.concurrent.locks.AbstractQueuedSynchronizer
$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2081)
2010-08-12_19:35:09.31099 at
java.util.concurrent.DelayQueue.take(DelayQueue.java:193)
2010-08-12_19:35:09.31099 at
java.util.concurrent.ScheduledThreadPoolExecutor
$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:688)
2010-08-12_19:35:09.31100 at
java.util.concurrent.ScheduledThreadPoolExecutor
$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:681)
2010-08-12_19:35:09.31100 at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:
1043)
2010-08-12_19:35:09.31101 at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:
1103)
2010-08-12_19:35:09.31102 at
java.util.concurrent.ThreadPoolExecutor
$Worker.run(ThreadPoolExecutor.java:603)
2010-08-12_19:35:09.31103 at java.lang.Thread.run(Thread.java:
636)
2010-08-12_19:35:09.31103
2010-08-12_19:35:09.31104 "Low Memory Detector" daemon prio=10
tid=0x0000000001664000 nid=0x6532 runnable [0x0000000000000000]
2010-08-12_19:35:09.31104 java.lang.Thread.State: RUNNABLE
2010-08-12_19:35:09.31105
2010-08-12_19:35:09.31105 "CompilerThread1" daemon prio=10
tid=0x0000000001661800 nid=0x6531 waiting on condition
[0x0000000000000000]
2010-08-12_19:35:09.31106 java.lang.Thread.State: RUNNABLE
2010-08-12_19:35:09.31106
2010-08-12_19:35:09.31107 "CompilerThread0" daemon prio=10
tid=0x000000000165b000 nid=0x6530 waiting on condition
[0x0000000000000000]
2010-08-12_19:35:09.31108 java.lang.Thread.State: RUNNABLE
2010-08-12_19:35:09.31109
2010-08-12_19:35:09.31109 "Signal Dispatcher" daemon prio=10
tid=0x0000000001659000 nid=0x652f waiting on condition
[0x0000000000000000]
2010-08-12_19:35:09.31110 java.lang.Thread.State: RUNNABLE
2010-08-12_19:35:09.31110
2010-08-12_19:35:09.31110 "Surrogate Locker Thread (CMS)" daemon
prio=10 tid=0x0000000001657800 nid=0x652e waiting on condition
[0x0000000000000000]
2010-08-12_19:35:09.31111 java.lang.Thread.State: RUNNABLE
2010-08-12_19:35:09.31111
2010-08-12_19:35:09.31112 "Finalizer" daemon prio=10
tid=0x0000000001635800 nid=0x652d in Object.wait()
[0x00007fda152c4000]
2010-08-12_19:35:09.31112 java.lang.Thread.State: WAITING (on
object monitor)
2010-08-12_19:35:09.31113 at java.lang.Object.wait(Native
Method)
2010-08-12_19:35:09.31113 - waiting on <0x000000075bd900c0> (a
java.lang.ref.ReferenceQueue$Lock)
2010-08-12_19:35:09.31114 at
java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:133)
2010-08-12_19:35:09.31114 - locked <0x000000075bd900c0> (a
java.lang.ref.ReferenceQueue$Lock)
2010-08-12_19:35:09.31115 at
java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:149)
2010-08-12_19:35:09.31115 at java.lang.ref.Finalizer
$FinalizerThread.run(Finalizer.java:177)
2010-08-12_19:35:09.31116
2010-08-12_19:35:09.31116 "Reference Handler" daemon prio=10
tid=0x0000000001633800 nid=0x652c in Object.wait()
[0x00007fda153c5000]
2010-08-12_19:35:09.31118 java.lang.Thread.State: WAITING (on
object monitor)
2010-08-12_19:35:09.31118 at java.lang.Object.wait(Native
Method)
2010-08-12_19:35:09.31119 - waiting on <0x000000075bd900f0> (a
java.lang.ref.Reference$Lock)
2010-08-12_19:35:09.31119 at java.lang.Object.wait(Object.java:
502)
2010-08-12_19:35:09.31120 at java.lang.ref.Reference
$ReferenceHandler.run(Reference.java:133)
2010-08-12_19:35:09.31120 - locked <0x000000075bd900f0> (a
java.lang.ref.Reference$Lock)
2010-08-12_19:35:09.31121
2010-08-12_19:35:09.31121 "VM Thread" prio=10 tid=0x000000000162f000
nid=0x652b runnable
2010-08-12_19:35:09.31122
2010-08-12_19:35:09.31122 "Gang worker#0 (Parallel GC Threads)"
prio=10 tid=0x000000000156b800 nid=0x6528 runnable
2010-08-12_19:35:09.31123
2010-08-12_19:35:09.31123 "Gang worker#1 (Parallel GC Threads)"
prio=10 tid=0x000000000156d000 nid=0x6529 runnable
2010-08-12_19:35:09.31124
2010-08-12_19:35:09.31124 "Concurrent Mark-Sweep GC Thread" prio=10
tid=0x00000000015c2800 nid=0x652a runnable
2010-08-12_19:35:09.31125 "VM Periodic Task Thread" prio=10
tid=0x0000000001667000 nid=0x6533 waiting on condition
2010-08-12_19:35:09.31125
2010-08-12_19:35:09.31125 JNI global references: 767
2010-08-12_19:35:09.31126
2010-08-12_19:35:09.31126 Heap
2010-08-12_19:35:09.31127 par new generation total 19136K, used
12115K [0x0000000759400000, 0x000000075a8c0000, 0x000000075bd90000)
2010-08-12_19:35:09.31127 eden space 17024K, 68% used
[0x0000000759400000, 0x0000000759f6c698, 0x000000075a4a0000)
2010-08-12_19:35:09.31129 from space 2112K, 19% used
[0x000000075a6b0000, 0x000000075a718820, 0x000000075a8c0000)
2010-08-12_19:35:09.31129 to space 2112K, 0% used
[0x000000075a4a0000, 0x000000075a4a0000, 0x000000075a6b0000)
2010-08-12_19:35:09.31130 concurrent mark-sweep generation total
109824K, used 34250K [0x000000075bd90000, 0x00000007628d0000,
0x00000007f5800000)
2010-08-12_19:35:09.31131 concurrent-mark-sweep perm gen total
35396K, used 30694K [0x00000007f5800000, 0x00000007f7a91000,
0x0000000800000000)
2010-08-12_19:35:09.31132

On Aug 12, 12:03 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

It depends on the actual implementation. In general, the time it takes to do
recovery is how fast we can get data from s3 when a full recovery from it is
required (elasticsearch tries to minimize that by reusing the work dir,
though you should set the recover_after_nodes setting).

My concern is that maybe the recovery process is stuck somewhere. Not sure
where though... . Is there a chance for a thread dump on one of the nodes
that have a primary shard in INITIALIZING state? You can do that with
visualvm.

-shay.banon

On Thu, Aug 12, 2010 at 9:50 PM, Grant Rodgers gra...@gmail.com wrote:

Yes, we are using the s3 gateway and version 0.9.0.

Would we get better gateway performance by using the fs gateway and a
network file system, like gluster, drbd, or ceph?

On Aug 12, 11:34 am, Shay Banon shay.ba...@elasticsearch.com wrote:

If I remember correctly, you use the s3 gateway, right? It seems like
some
primary shards are stuck initializing (basically, recovering from the
gateway). I forgot to ask, which version are you using?

-shay.banon

On Thu, Aug 12, 2010 at 9:01 PM, Grant Rodgers gra...@gmail.com wrote:

Fyi, 19 hours later, there is no change in the state of the cluster,
and the logs are still bouncing between created and recovering. We
will delete and recreate the missing indices, but it seems like
recovery should have happened by now. What is the expected recovery
time for a shard with a few hundred thousand documents?

Here is the cluster state for reference. The dacbe3f4-ef13-43ba-aa13-
b6457780f145 node had been restarted a few times, and maybe that
caused some shards to get stuck in the relocating state? Still, I'd
expect that at least one shard would have recovered by now.

"routing_table" : {
"indices" : {
"versions" : {
"shards" : {
"0" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
} ],
"1" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
} ],
"2" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
} ],
"3" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
} ],
"4" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
} ],
"5" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
} ]
}
},
"stats" : {
"shards" : {
"0" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
} ],
"1" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
} ],
"2" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
} ],
"3" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
} ],
"4" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
} ],
"5" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" :

...

read more »


(Shay Banon) #11

Hi,

Let me try and explain a bit how gateway recovery works. An index can have
several shards each with 0 or more replicas. When you start your nodes fresh
(you should use recover_after_nodes setting for optimal allocation), then
first the primary shards are allocated to the nodes. They move to a cluster
wide state of INITIALIZING and start recover from the gateway. In order no
to create very high load on a node, only X number of shard recoveries are
allowed per node (defaults to the number of processors / cores). Also, the
number of open streams (a stream is, for example, fetching a specific file)
is throttled and also defaults to the number of processors / cores.

The gateway recovery process itself first fetches all the index files from
the gateway, and then fetches the transaction log and applies all the
operations within it to the index.

It seems like all the shards on the primary shards are in INITIALIZING
state on the nodes and being recovered from the gateway. The work directory
should definitely grow over time, unless it got into the applying the
translog operations to the index, where there is in memory buffering done
during indexing time until it is flushed to disk.

Are the shards still not recovered? Its really annoying that you can't
really tell the recovery progress through an API. I will add it for 0.9.1.
This should simplify tracking down whats going on.

-shay.banon

On Thu, Aug 12, 2010 at 10:39 PM, Grant Rodgers grantr@gmail.com wrote:

Unfortunately the nodes mentioned originally have been stopped. But it
seems the new cluster is also having issues recovering. The work
directories have not changed in size for about 10 minutes.

Here is the routing table and nodes (thread dumps follow)

"routing_table" : {
"indices" : {
"versions" : {
"shards" : {
"0" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
} ],
"1" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
} ],
"2" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
} ],
"3" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
} ],
"4" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
} ],
"5" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
} ]
}
},
"stats" : {
"shards" : {
"0" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
} ],
"1" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
} ],
"2" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
} ],
"3" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
} ],
"4" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
} ],
"5" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
} ]
}
},
"revisions" : {
"shards" : {
"0" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 0,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "revisions"
} ],
"1" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 1,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "revisions"
} ],
"2" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 2,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "revisions"
} ],
"3" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 3,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "revisions"
} ],
"4" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 4,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "revisions"
} ],
"5" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 5,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "revisions"
} ]
}
}
}
},
"routing_nodes" : {
"unassigned" : [ {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "revisions"
} ],
"nodes" : {
"d9bd8b92-90d5-4892-bea8-46896b467974" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 0,
"index" : "revisions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 2,
"index" : "revisions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 5,
"index" : "revisions"
} ],
"eeb2af20-13f7-41e3-9dca-30b768785b4f" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 1,
"index" : "revisions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 3,
"index" : "revisions"
}, {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 4,
"index" : "revisions"
} ]
}

Here is a thread dump from node eeb2af20-13f7-41e3-9dca-30b768785b4f

2010-08-12_19:35:09.31077 java.lang.Thread.State: TIMED_WAITING
(parking)
2010-08-12_19:35:09.31079 at sun.misc.Unsafe.park(Native Method)
2010-08-12_19:35:09.31080 - parking to wait for
<0x000000075bf09e48> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
2010-08-12_19:35:09.31080 at
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
2010-08-12_19:35:09.31081 at
java.util.concurrent.locks.AbstractQueuedSynchronizer
$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2081)
2010-08-12_19:35:09.31081 at
java.util.concurrent.DelayQueue.take(DelayQueue.java:193)
2010-08-12_19:35:09.31082 at
java.util.concurrent.ScheduledThreadPoolExecutor
$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:688)
2010-08-12_19:35:09.31083 at
java.util.concurrent.ScheduledThreadPoolExecutor
$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:681)
2010-08-12_19:35:09.31084 at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:
1043)
2010-08-12_19:35:09.31084 at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:
1103)
2010-08-12_19:35:09.31085 at
java.util.concurrent.ThreadPoolExecutor
$Worker.run(ThreadPoolExecutor.java:603)
2010-08-12_19:35:09.31085 at java.lang.Thread.run(Thread.java:
636)
2010-08-12_19:35:09.31086
2010-08-12_19:35:09.31086 "elasticsearch[Thumb, Tom][sc]-pool-2-
thread-2" daemon prio=10 tid=0x0000000001f8e800 nid=0x653b waiting on
condition [0x00007fda1441f000]
2010-08-12_19:35:09.31087 java.lang.Thread.State: TIMED_WAITING
(parking)
2010-08-12_19:35:09.31088 at sun.misc.Unsafe.park(Native Method)
2010-08-12_19:35:09.31088 - parking to wait for
<0x000000075bf09e48> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
2010-08-12_19:35:09.31089 at
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
2010-08-12_19:35:09.31089 at
java.util.concurrent.locks.AbstractQueuedSynchronizer
$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2081)
2010-08-12_19:35:09.31091 at
java.util.concurrent.DelayQueue.take(DelayQueue.java:193)
2010-08-12_19:35:09.31091 at
java.util.concurrent.ScheduledThreadPoolExecutor
$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:688)
2010-08-12_19:35:09.31092 at
java.util.concurrent.ScheduledThreadPoolExecutor
$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:681)
2010-08-12_19:35:09.31092 at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:
1043)
2010-08-12_19:35:09.31093 at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:
1103)
2010-08-12_19:35:09.31093 at
java.util.concurrent.ThreadPoolExecutor
$Worker.run(ThreadPoolExecutor.java:603)
2010-08-12_19:35:09.31094 at java.lang.Thread.run(Thread.java:
636)
2010-08-12_19:35:09.31094
2010-08-12_19:35:09.31095 "elasticsearch[Thumb, Tom][sc]-pool-2-
thread-1" daemon prio=10 tid=0x00007fda101c6000 nid=0x653a waiting on
condition [0x00007fda14af4000]
2010-08-12_19:35:09.31095 java.lang.Thread.State: TIMED_WAITING
(parking)
2010-08-12_19:35:09.31096 at sun.misc.Unsafe.park(Native Method)
2010-08-12_19:35:09.31096 - parking to wait for
<0x000000075bf09e48> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
2010-08-12_19:35:09.31097 at
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
2010-08-12_19:35:09.31097 at
java.util.concurrent.locks.AbstractQueuedSynchronizer
$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2081)
2010-08-12_19:35:09.31099 at
java.util.concurrent.DelayQueue.take(DelayQueue.java:193)
2010-08-12_19:35:09.31099 at
java.util.concurrent.ScheduledThreadPoolExecutor
$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:688)
2010-08-12_19:35:09.31100 at
java.util.concurrent.ScheduledThreadPoolExecutor
$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:681)
2010-08-12_19:35:09.31100 at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:
1043)
2010-08-12_19:35:09.31101 at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:
1103)
2010-08-12_19:35:09.31102 at
java.util.concurrent.ThreadPoolExecutor
$Worker.run(ThreadPoolExecutor.java:603)
2010-08-12_19:35:09.31103 at java.lang.Thread.run(Thread.java:
636)
2010-08-12_19:35:09.31103
2010-08-12_19:35:09.31104 "Low Memory Detector" daemon prio=10
tid=0x0000000001664000 nid=0x6532 runnable [0x0000000000000000]
2010-08-12_19:35:09.31104 java.lang.Thread.State: RUNNABLE
2010-08-12_19:35:09.31105
2010-08-12_19:35:09.31105 "CompilerThread1" daemon prio=10
tid=0x0000000001661800 nid=0x6531 waiting on condition
[0x0000000000000000]
2010-08-12_19:35:09.31106 java.lang.Thread.State: RUNNABLE
2010-08-12_19:35:09.31106
2010-08-12_19:35:09.31107 "CompilerThread0" daemon prio=10
tid=0x000000000165b000 nid=0x6530 waiting on condition
[0x0000000000000000]
2010-08-12_19:35:09.31108 java.lang.Thread.State: RUNNABLE
2010-08-12_19:35:09.31109
2010-08-12_19:35:09.31109 "Signal Dispatcher" daemon prio=10
tid=0x0000000001659000 nid=0x652f waiting on condition
[0x0000000000000000]
2010-08-12_19:35:09.31110 java.lang.Thread.State: RUNNABLE
2010-08-12_19:35:09.31110
2010-08-12_19:35:09.31110 "Surrogate Locker Thread (CMS)" daemon
prio=10 tid=0x0000000001657800 nid=0x652e waiting on condition
[0x0000000000000000]
2010-08-12_19:35:09.31111 java.lang.Thread.State: RUNNABLE
2010-08-12_19:35:09.31111
2010-08-12_19:35:09.31112 "Finalizer" daemon prio=10
tid=0x0000000001635800 nid=0x652d in Object.wait()
[0x00007fda152c4000]
2010-08-12_19:35:09.31112 java.lang.Thread.State: WAITING (on
object monitor)
2010-08-12_19:35:09.31113 at java.lang.Object.wait(Native
Method)
2010-08-12_19:35:09.31113 - waiting on <0x000000075bd900c0> (a
java.lang.ref.ReferenceQueue$Lock)
2010-08-12_19:35:09.31114 at
java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:133)
2010-08-12_19:35:09.31114 - locked <0x000000075bd900c0> (a
java.lang.ref.ReferenceQueue$Lock)
2010-08-12_19:35:09.31115 at
java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:149)
2010-08-12_19:35:09.31115 at java.lang.ref.Finalizer
$FinalizerThread.run(Finalizer.java:177)
2010-08-12_19:35:09.31116
2010-08-12_19:35:09.31116 "Reference Handler" daemon prio=10
tid=0x0000000001633800 nid=0x652c in Object.wait()
[0x00007fda153c5000]
2010-08-12_19:35:09.31118 java.lang.Thread.State: WAITING (on
object monitor)
2010-08-12_19:35:09.31118 at java.lang.Object.wait(Native
Method)
2010-08-12_19:35:09.31119 - waiting on <0x000000075bd900f0> (a
java.lang.ref.Reference$Lock)
2010-08-12_19:35:09.31119 at java.lang.Object.wait(Object.java:
502)
2010-08-12_19:35:09.31120 at java.lang.ref.Reference
$ReferenceHandler.run(Reference.java:133)
2010-08-12_19:35:09.31120 - locked <0x000000075bd900f0> (a
java.lang.ref.Reference$Lock)
2010-08-12_19:35:09.31121
2010-08-12_19:35:09.31121 "VM Thread" prio=10 tid=0x000000000162f000
nid=0x652b runnable
2010-08-12_19:35:09.31122
2010-08-12_19:35:09.31122 "Gang worker#0 (Parallel GC Threads)"
prio=10 tid=0x000000000156b800 nid=0x6528 runnable
2010-08-12_19:35:09.31123
2010-08-12_19:35:09.31123 "Gang worker#1 (Parallel GC Threads)"
prio=10 tid=0x000000000156d000 nid=0x6529 runnable
2010-08-12_19:35:09.31124
2010-08-12_19:35:09.31124 "Concurrent Mark-Sweep GC Thread" prio=10
tid=0x00000000015c2800 nid=0x652a runnable
2010-08-12_19:35:09.31125 "VM Periodic Task Thread" prio=10
tid=0x0000000001667000 nid=0x6533 waiting on condition
2010-08-12_19:35:09.31125
2010-08-12_19:35:09.31125 JNI global references: 767
2010-08-12_19:35:09.31126
2010-08-12_19:35:09.31126 Heap
2010-08-12_19:35:09.31127 par new generation total 19136K, used
12115K [0x0000000759400000, 0x000000075a8c0000, 0x000000075bd90000)
2010-08-12_19:35:09.31127 eden space 17024K, 68% used
[0x0000000759400000, 0x0000000759f6c698, 0x000000075a4a0000)
2010-08-12_19:35:09.31129 from space 2112K, 19% used
[0x000000075a6b0000, 0x000000075a718820, 0x000000075a8c0000)
2010-08-12_19:35:09.31129 to space 2112K, 0% used
[0x000000075a4a0000, 0x000000075a4a0000, 0x000000075a6b0000)
2010-08-12_19:35:09.31130 concurrent mark-sweep generation total
109824K, used 34250K [0x000000075bd90000, 0x00000007628d0000,
0x00000007f5800000)
2010-08-12_19:35:09.31131 concurrent-mark-sweep perm gen total
35396K, used 30694K [0x00000007f5800000, 0x00000007f7a91000,
0x0000000800000000)
2010-08-12_19:35:09.31132

On Aug 12, 12:03 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

It depends on the actual implementation. In general, the time it takes to
do
recovery is how fast we can get data from s3 when a full recovery from it
is
required (elasticsearch tries to minimize that by reusing the work dir,
though you should set the recover_after_nodes setting).

My concern is that maybe the recovery process is stuck somewhere. Not
sure
where though... . Is there a chance for a thread dump on one of the nodes
that have a primary shard in INITIALIZING state? You can do that with
visualvm.

-shay.banon

On Thu, Aug 12, 2010 at 9:50 PM, Grant Rodgers gra...@gmail.com wrote:

Yes, we are using the s3 gateway and version 0.9.0.

Would we get better gateway performance by using the fs gateway and a
network file system, like gluster, drbd, or ceph?

On Aug 12, 11:34 am, Shay Banon shay.ba...@elasticsearch.com wrote:

If I remember correctly, you use the s3 gateway, right? It seems like
some
primary shards are stuck initializing (basically, recovering from the
gateway). I forgot to ask, which version are you using?

-shay.banon

On Thu, Aug 12, 2010 at 9:01 PM, Grant Rodgers gra...@gmail.com
wrote:

Fyi, 19 hours later, there is no change in the state of the
cluster,

and the logs are still bouncing between created and recovering. We
will delete and recreate the missing indices, but it seems like
recovery should have happened by now. What is the expected recovery
time for a shard with a few hundred thousand documents?

Here is the cluster state for reference. The
dacbe3f4-ef13-43ba-aa13-

b6457780f145 node had been restarted a few times, and maybe that
caused some shards to get stuck in the relocating state? Still, I'd
expect that at least one shard would have recovered by now.

"routing_table" : {
"indices" : {
"versions" : {
"shards" : {
"0" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
} ],
"1" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
} ],
"2" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
} ],
"3" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
} ],
"4" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
} ],
"5" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
} ]
}
},
"stats" : {
"shards" : {
"0" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
} ],
"1" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
} ],
"2" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
} ],
"3" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
} ],
"4" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "INITIALIZING",
"primary" : false,
"node" : "dacbe3f4-ef13-43ba-aa13-b6457780f145",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
} ],
"5" : [ {
"state" : "STARTED",
"primary" : false,
"node" : "b411c443-ebdd-416a-ac35-cce5af38fa3f",
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" : "STARTED",
"primary" : true,
"node" : "a4eada23-ab66-4ab5-bf20-8de09e25a6a3",
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" :

...

read more »


(Grant Rodgers) #12

Ok, thanks for the explanation. Good to know about the throttling
parameters. We are now using recover_after_time: 1m to stabilize shard
allocation.

Maybe it would be worth it to prioritize local shard recovery since (I
assume) that can happen quicker than gateway recovery? I could be
completely wrong about how it works though.

I let the recovery process run until about 15 minutes ago, so roughly
3.5 hours. There was still no change in the size of the work
directories, and no shards had changed state out of INITIALIZING. It
is curious, because I think at least some of the gateway data had been
transferred, so I think the recovery started ok, but maybe at some
point got stuck.

We decided to just delete the versions and revisions indices and
recreate them. Now I am watching the recovery process of the other
index, stats. it seems to be progressing smoothly. Maybe there was
something unusual about the other indexes that causes the recovery
process to get wedged.

An API to watch recovery status would be super useful, looking forward
to that.

Grant

On Aug 12, 3:14 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

Hi,

Let me try and explain a bit how gateway recovery works. An index can have
several shards each with 0 or more replicas. When you start your nodes fresh
(you should use recover_after_nodes setting for optimal allocation), then
first the primary shards are allocated to the nodes. They move to a cluster
wide state of INITIALIZING and start recover from the gateway. In order no
to create very high load on a node, only X number of shard recoveries are
allowed per node (defaults to the number of processors / cores). Also, the
number of open streams (a stream is, for example, fetching a specific file)
is throttled and also defaults to the number of processors / cores.

The gateway recovery process itself first fetches all the index files from
the gateway, and then fetches the transaction log and applies all the
operations within it to the index.

It seems like all the shards on the primary shards are in INITIALIZING
state on the nodes and being recovered from the gateway. The work directory
should definitely grow over time, unless it got into the applying the
translog operations to the index, where there is in memory buffering done
during indexing time until it is flushed to disk.

Are the shards still not recovered? Its really annoying that you can't
really tell the recovery progress through an API. I will add it for 0.9.1.
This should simplify tracking down whats going on.

-shay.banon

On Thu, Aug 12, 2010 at 10:39 PM, Grant Rodgers gra...@gmail.com wrote:

Unfortunately the nodes mentioned originally have been stopped. But it
seems the new cluster is also having issues recovering. The work
directories have not changed in size for about 10 minutes.

Here is the routing table and nodes (thread dumps follow)

"routing_table" : {
"indices" : {
"versions" : {
"shards" : {
"0" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
} ],
"1" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
} ],
"2" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
} ],
"3" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
} ],
"4" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
} ],
"5" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
} ]
}
},
"stats" : {
"shards" : {
"0" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
} ],
"1" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
} ],
"2" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
} ],
"3" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
} ],
"4" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
} ],
"5" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
} ]
}
},
"revisions" : {
"shards" : {
"0" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 0,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "revisions"
} ],
"1" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 1,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "revisions"
} ],
"2" : [ {

...

read more »


(Shay Banon) #13

Hi,

There isn't really any local shard recovery. The "reuse work location"
feature works by allocating shards to nodes that have the smallest diff
(diff between the local work storage for that shard before it was shutdown
and the gateway). The recovery process for the shard, once its allocated to
that node, only fetches the changes between the local storage and the
gateway. This, when issuing complete cluster shutdown (and always make sure
you do it using the shutdown API), and not a complete disaster where the
disks are gone, can greatly reduce the time it takes to recover from the
gateway.

The only place that I can think where it might get stuck is in the API to
s3. I am using the official AWS SDK for that, I will check where this might
happen. I did not see it in the thread dump you sent, but I forgot to
mention that it looks suspiciously small on threads... (funnily enough, the
AWS S3 gateway is probably the only place in elasticsearch where thread dump
might make sense, since they use blocking IO).

I am sorry that this happened. I will work on the recovery status API and
ping you once its done. This way it will be easier to tell at least where
its stuck. I assume there were no errors in the logs, right?

-shay.banon

On Fri, Aug 13, 2010 at 2:05 AM, Grant Rodgers grantr@gmail.com wrote:

Ok, thanks for the explanation. Good to know about the throttling
parameters. We are now using recover_after_time: 1m to stabilize shard
allocation.

Maybe it would be worth it to prioritize local shard recovery since (I
assume) that can happen quicker than gateway recovery? I could be
completely wrong about how it works though.

I let the recovery process run until about 15 minutes ago, so roughly
3.5 hours. There was still no change in the size of the work
directories, and no shards had changed state out of INITIALIZING. It
is curious, because I think at least some of the gateway data had been
transferred, so I think the recovery started ok, but maybe at some
point got stuck.

We decided to just delete the versions and revisions indices and
recreate them. Now I am watching the recovery process of the other
index, stats. it seems to be progressing smoothly. Maybe there was
something unusual about the other indexes that causes the recovery
process to get wedged.

An API to watch recovery status would be super useful, looking forward
to that.

Grant

On Aug 12, 3:14 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

Hi,

Let me try and explain a bit how gateway recovery works. An index can
have
several shards each with 0 or more replicas. When you start your nodes
fresh
(you should use recover_after_nodes setting for optimal allocation), then
first the primary shards are allocated to the nodes. They move to a
cluster
wide state of INITIALIZING and start recover from the gateway. In order
no
to create very high load on a node, only X number of shard recoveries are
allowed per node (defaults to the number of processors / cores). Also,
the
number of open streams (a stream is, for example, fetching a specific
file)
is throttled and also defaults to the number of processors / cores.

The gateway recovery process itself first fetches all the index files
from
the gateway, and then fetches the transaction log and applies all the
operations within it to the index.

It seems like all the shards on the primary shards are in INITIALIZING
state on the nodes and being recovered from the gateway. The work
directory
should definitely grow over time, unless it got into the applying the
translog operations to the index, where there is in memory buffering done
during indexing time until it is flushed to disk.

Are the shards still not recovered? Its really annoying that you can't
really tell the recovery progress through an API. I will add it for
0.9.1.
This should simplify tracking down whats going on.

-shay.banon

On Thu, Aug 12, 2010 at 10:39 PM, Grant Rodgers gra...@gmail.com
wrote:

Unfortunately the nodes mentioned originally have been stopped. But it
seems the new cluster is also having issues recovering. The work
directories have not changed in size for about 10 minutes.

Here is the routing table and nodes (thread dumps follow)

"routing_table" : {
"indices" : {
"versions" : {
"shards" : {
"0" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
} ],
"1" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
} ],
"2" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
} ],
"3" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
} ],
"4" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
} ],
"5" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
} ]
}
},
"stats" : {
"shards" : {
"0" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
} ],
"1" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
} ],
"2" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
} ],
"3" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "stats"
} ],
"4" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "stats"
} ],
"5" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "stats"
} ]
}
},
"revisions" : {
"shards" : {
"0" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 0,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "revisions"
} ],
"1" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 1,
"index" : "revisions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "revisions"
} ],
"2" : [ {

...

read more »


(Grant Rodgers) #14

I haven't seen any errors in the log, but there have been a few
warnings.

[00:32:39,475][WARN ][index.gateway.s3 ] [Sunset Bain] [stats]
[0] no file [_91q0.cfs] to recover, even though it has md5, ignoring
it
[00:32:45,591][WARN ][index.gateway.s3 ] [Sunset Bain] [stats]
[0] no file [_5k99.cfs] to recover, even though it has md5, ignoring
it

Right now, for our stats index, shard 0 appears to be stuck in the
same way as previous attempts. In an attempt to determine whether it
was a corrupted shard or some transient error, I stopped that node and
let the shard recover on a different node. It is still initializing on
that node, and that's where I saw those warning messages.

Could it be an issue with the gateway data? Maybe something can happen
to a gateway snapshot that makes it unrecoverable.

Thanks for all the quick responses and advice!

On Aug 12, 4:13 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

Hi,

There isn't really any local shard recovery. The "reuse work location"
feature works by allocating shards to nodes that have the smallest diff
(diff between the local work storage for that shard before it was shutdown
and the gateway). The recovery process for the shard, once its allocated to
that node, only fetches the changes between the local storage and the
gateway. This, when issuing complete cluster shutdown (and always make sure
you do it using the shutdown API), and not a complete disaster where the
disks are gone, can greatly reduce the time it takes to recover from the
gateway.

The only place that I can think where it might get stuck is in the API to
s3. I am using the official AWS SDK for that, I will check where this might
happen. I did not see it in the thread dump you sent, but I forgot to
mention that it looks suspiciously small on threads... (funnily enough, the
AWS S3 gateway is probably the only place in elasticsearch where thread dump
might make sense, since they use blocking IO).

I am sorry that this happened. I will work on the recovery status API and
ping you once its done. This way it will be easier to tell at least where
its stuck. I assume there were no errors in the logs, right?

-shay.banon

On Fri, Aug 13, 2010 at 2:05 AM, Grant Rodgers gra...@gmail.com wrote:

Ok, thanks for the explanation. Good to know about the throttling
parameters. We are now using recover_after_time: 1m to stabilize shard
allocation.

Maybe it would be worth it to prioritize local shard recovery since (I
assume) that can happen quicker than gateway recovery? I could be
completely wrong about how it works though.

I let the recovery process run until about 15 minutes ago, so roughly
3.5 hours. There was still no change in the size of the work
directories, and no shards had changed state out of INITIALIZING. It
is curious, because I think at least some of the gateway data had been
transferred, so I think the recovery started ok, but maybe at some
point got stuck.

We decided to just delete the versions and revisions indices and
recreate them. Now I am watching the recovery process of the other
index, stats. it seems to be progressing smoothly. Maybe there was
something unusual about the other indexes that causes the recovery
process to get wedged.

An API to watch recovery status would be super useful, looking forward
to that.

Grant

On Aug 12, 3:14 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

Hi,

Let me try and explain a bit how gateway recovery works. An index can
have
several shards each with 0 or more replicas. When you start your nodes
fresh
(you should use recover_after_nodes setting for optimal allocation), then
first the primary shards are allocated to the nodes. They move to a
cluster
wide state of INITIALIZING and start recover from the gateway. In order
no
to create very high load on a node, only X number of shard recoveries are
allowed per node (defaults to the number of processors / cores). Also,
the
number of open streams (a stream is, for example, fetching a specific
file)
is throttled and also defaults to the number of processors / cores.

The gateway recovery process itself first fetches all the index files
from
the gateway, and then fetches the transaction log and applies all the
operations within it to the index.

It seems like all the shards on the primary shards are in INITIALIZING
state on the nodes and being recovered from the gateway. The work
directory
should definitely grow over time, unless it got into the applying the
translog operations to the index, where there is in memory buffering done
during indexing time until it is flushed to disk.

Are the shards still not recovered? Its really annoying that you can't
really tell the recovery progress through an API. I will add it for
0.9.1.
This should simplify tracking down whats going on.

-shay.banon

On Thu, Aug 12, 2010 at 10:39 PM, Grant Rodgers gra...@gmail.com
wrote:

Unfortunately the nodes mentioned originally have been stopped. But it
seems the new cluster is also having issues recovering. The work
directories have not changed in size for about 10 minutes.

Here is the routing table and nodes (thread dumps follow)

"routing_table" : {
"indices" : {
"versions" : {
"shards" : {
"0" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
} ],
"1" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
} ],
"2" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
} ],
"3" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
} ],
"4" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
} ],
"5" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
} ]
}
},
"stats" : {
"shards" : {
"0" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "stats"
} ],
"1" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "stats"
} ],
"2" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "stats"
}, {
"state" : "UNASSIGNED",

...

read more »


(Grant Rodgers) #15

This was fixed by http://github.com/elasticsearch/elasticsearch/issues/closed#issue/318.
Today our index recovered successfully from s3.

Thanks for the quick turnaround! One day from report to fix - not many
projects can do that.

On Aug 12, 5:56 pm, Grant Rodgers gra...@gmail.com wrote:

I haven't seen any errors in the log, but there have been a few
warnings.

[00:32:39,475][WARN ][index.gateway.s3 ] [Sunset Bain] [stats]
[0] no file [_91q0.cfs] to recover, even though it has md5, ignoring
it
[00:32:45,591][WARN ][index.gateway.s3 ] [Sunset Bain] [stats]
[0] no file [_5k99.cfs] to recover, even though it has md5, ignoring
it

Right now, for our stats index, shard 0 appears to be stuck in the
same way as previous attempts. In an attempt to determine whether it
was a corrupted shard or some transient error, I stopped that node and
let the shard recover on a different node. It is still initializing on
that node, and that's where I saw those warning messages.

Could it be an issue with the gateway data? Maybe something can happen
to a gateway snapshot that makes it unrecoverable.

Thanks for all the quick responses and advice!

On Aug 12, 4:13 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

Hi,

There isn't really any local shard recovery. The "reuse work location"
feature works by allocating shards to nodes that have the smallest diff
(diff between the local work storage for that shard before it was shutdown
and the gateway). The recovery process for the shard, once its allocated to
that node, only fetches the changes between the local storage and the
gateway. This, when issuing complete cluster shutdown (and always make sure
you do it using the shutdown API), and not a complete disaster where the
disks are gone, can greatly reduce the time it takes to recover from the
gateway.

The only place that I can think where it might get stuck is in the API to
s3. I am using the official AWS SDK for that, I will check where this might
happen. I did not see it in the thread dump you sent, but I forgot to
mention that it looks suspiciously small on threads... (funnily enough, the
AWS S3 gateway is probably the only place in elasticsearch where thread dump
might make sense, since they use blocking IO).

I am sorry that this happened. I will work on the recovery status API and
ping you once its done. This way it will be easier to tell at least where
its stuck. I assume there were no errors in the logs, right?

-shay.banon

On Fri, Aug 13, 2010 at 2:05 AM, Grant Rodgers gra...@gmail.com wrote:

Ok, thanks for the explanation. Good to know about the throttling
parameters. We are now using recover_after_time: 1m to stabilize shard
allocation.

Maybe it would be worth it to prioritize local shard recovery since (I
assume) that can happen quicker than gateway recovery? I could be
completely wrong about how it works though.

I let the recovery process run until about 15 minutes ago, so roughly
3.5 hours. There was still no change in the size of the work
directories, and no shards had changed state out of INITIALIZING. It
is curious, because I think at least some of the gateway data had been
transferred, so I think the recovery started ok, but maybe at some
point got stuck.

We decided to just delete the versions and revisions indices and
recreate them. Now I am watching the recovery process of the other
index, stats. it seems to be progressing smoothly. Maybe there was
something unusual about the other indexes that causes the recovery
process to get wedged.

An API to watch recovery status would be super useful, looking forward
to that.

Grant

On Aug 12, 3:14 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

Hi,

Let me try and explain a bit how gateway recovery works. An index can
have
several shards each with 0 or more replicas. When you start your nodes
fresh
(you should use recover_after_nodes setting for optimal allocation), then
first the primary shards are allocated to the nodes. They move to a
cluster
wide state of INITIALIZING and start recover from the gateway. In order
no
to create very high load on a node, only X number of shard recoveries are
allowed per node (defaults to the number of processors / cores). Also,
the
number of open streams (a stream is, for example, fetching a specific
file)
is throttled and also defaults to the number of processors / cores.

The gateway recovery process itself first fetches all the index files
from
the gateway, and then fetches the transaction log and applies all the
operations within it to the index.

It seems like all the shards on the primary shards are in INITIALIZING
state on the nodes and being recovered from the gateway. The work
directory
should definitely grow over time, unless it got into the applying the
translog operations to the index, where there is in memory buffering done
during indexing time until it is flushed to disk.

Are the shards still not recovered? Its really annoying that you can't
really tell the recovery progress through an API. I will add it for
0.9.1.
This should simplify tracking down whats going on.

-shay.banon

On Thu, Aug 12, 2010 at 10:39 PM, Grant Rodgers gra...@gmail.com
wrote:

Unfortunately the nodes mentioned originally have been stopped. But it
seems the new cluster is also having issues recovering. The work
directories have not changed in size for about 10 minutes.

Here is the routing table and nodes (thread dumps follow)

"routing_table" : {
"indices" : {
"versions" : {
"shards" : {
"0" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
} ],
"1" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
} ],
"2" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
} ],
"3" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
} ],
"4" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
} ],
"5" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
} ]
}
},
"stats" : {
"shards" : {
"0" : [ {
"state" : "INITIALIZING",

...

read more »


(Shay Banon) #16

Great that its fixed! This bug will certainly push the timeline for 0.9.1
release to a closer date (probably sometime next week).

-shay.banon

On Sat, Aug 14, 2010 at 12:59 AM, Grant Rodgers grantr@gmail.com wrote:

This was fixed by
http://github.com/elasticsearch/elasticsearch/issues/closed#issue/318.
Today our index recovered successfully from s3.

Thanks for the quick turnaround! One day from report to fix - not many
projects can do that.

On Aug 12, 5:56 pm, Grant Rodgers gra...@gmail.com wrote:

I haven't seen any errors in the log, but there have been a few
warnings.

[00:32:39,475][WARN ][index.gateway.s3 ] [Sunset Bain] [stats]
[0] no file [_91q0.cfs] to recover, even though it has md5, ignoring
it
[00:32:45,591][WARN ][index.gateway.s3 ] [Sunset Bain] [stats]
[0] no file [_5k99.cfs] to recover, even though it has md5, ignoring
it

Right now, for our stats index, shard 0 appears to be stuck in the
same way as previous attempts. In an attempt to determine whether it
was a corrupted shard or some transient error, I stopped that node and
let the shard recover on a different node. It is still initializing on
that node, and that's where I saw those warning messages.

Could it be an issue with the gateway data? Maybe something can happen
to a gateway snapshot that makes it unrecoverable.

Thanks for all the quick responses and advice!

On Aug 12, 4:13 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

Hi,

There isn't really any local shard recovery. The "reuse work
location"

feature works by allocating shards to nodes that have the smallest diff
(diff between the local work storage for that shard before it was
shutdown

and the gateway). The recovery process for the shard, once its
allocated to

that node, only fetches the changes between the local storage and the
gateway. This, when issuing complete cluster shutdown (and always make
sure

you do it using the shutdown API), and not a complete disaster where
the

disks are gone, can greatly reduce the time it takes to recover from
the

gateway.

The only place that I can think where it might get stuck is in the
API to

s3. I am using the official AWS SDK for that, I will check where this
might

happen. I did not see it in the thread dump you sent, but I forgot to
mention that it looks suspiciously small on threads... (funnily enough,
the

AWS S3 gateway is probably the only place in elasticsearch where thread
dump

might make sense, since they use blocking IO).

I am sorry that this happened. I will work on the recovery status
API and

ping you once its done. This way it will be easier to tell at least
where

its stuck. I assume there were no errors in the logs, right?

-shay.banon

On Fri, Aug 13, 2010 at 2:05 AM, Grant Rodgers gra...@gmail.com
wrote:

Ok, thanks for the explanation. Good to know about the throttling
parameters. We are now using recover_after_time: 1m to stabilize
shard

allocation.

Maybe it would be worth it to prioritize local shard recovery since
(I

assume) that can happen quicker than gateway recovery? I could be
completely wrong about how it works though.

I let the recovery process run until about 15 minutes ago, so roughly
3.5 hours. There was still no change in the size of the work
directories, and no shards had changed state out of INITIALIZING. It
is curious, because I think at least some of the gateway data had
been

transferred, so I think the recovery started ok, but maybe at some
point got stuck.

We decided to just delete the versions and revisions indices and
recreate them. Now I am watching the recovery process of the other
index, stats. it seems to be progressing smoothly. Maybe there was
something unusual about the other indexes that causes the recovery
process to get wedged.

An API to watch recovery status would be super useful, looking
forward

to that.

Grant

On Aug 12, 3:14 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

Hi,

Let me try and explain a bit how gateway recovery works. An index
can

have

several shards each with 0 or more replicas. When you start your
nodes

fresh

(you should use recover_after_nodes setting for optimal
allocation), then

first the primary shards are allocated to the nodes. They move to a
cluster
wide state of INITIALIZING and start recover from the gateway. In
order

no

to create very high load on a node, only X number of shard
recoveries are

allowed per node (defaults to the number of processors / cores).
Also,

the

number of open streams (a stream is, for example, fetching a
specific

file)

is throttled and also defaults to the number of processors / cores.

The gateway recovery process itself first fetches all the index
files

from

the gateway, and then fetches the transaction log and applies all
the

operations within it to the index.

It seems like all the shards on the primary shards are in
INITIALIZING

state on the nodes and being recovered from the gateway. The work
directory
should definitely grow over time, unless it got into the applying
the

translog operations to the index, where there is in memory
buffering done

during indexing time until it is flushed to disk.

Are the shards still not recovered? Its really annoying that you
can't

really tell the recovery progress through an API. I will add it for
0.9.1.
This should simplify tracking down whats going on.

-shay.banon

On Thu, Aug 12, 2010 at 10:39 PM, Grant Rodgers gra...@gmail.com
wrote:

Unfortunately the nodes mentioned originally have been stopped.
But it

seems the new cluster is also having issues recovering. The work
directories have not changed in size for about 10 minutes.

Here is the routing table and nodes (thread dumps follow)

"routing_table" : {
"indices" : {
"versions" : {
"shards" : {
"0" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "versions"
} ],
"1" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "versions"
} ],
"2" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "versions"
} ],
"3" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "versions"
} ],
"4" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "d9bd8b92-90d5-4892-bea8-46896b467974",
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "versions"
} ],
"5" : [ {
"state" : "INITIALIZING",
"primary" : true,
"node" : "eeb2af20-13f7-41e3-9dca-30b768785b4f",
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 5,
"index" : "versions"
} ]
}
},
"stats" : {
"shards" : {
"0" : [ {
"state" : "INITIALIZING",

...

read more »


(system) #17