Logging reallocation detail during node restore


(John Nader) #1

We are using ES version 0.19.8 in a 10 node cluster with 30 shards.

We had an incident yesterday where the process of reallocating shards to a
node simply hung. The head console showed that the node came up healthy and
that two shards were in the process of being reallocated to that node.
However, no data transfer occurred (looking at the target directory), and
the logs showed no progress or issues from either the source nodes or the
new destination node. We were able to recover by bringing the node down
and back up. We have not seen this issue again.

I am looking for advice on logging reallocation information such that we
can troubleshoot such an issue if it occurs again. Can someone recommend
settings that will give good detail on reallocation start, progress, and
completion without logging other excessive detail that fills up our logging
directories or makes it difficult to find the relevant entries in the log?

Thanks!

-John

--


(Shay Banon) #2

There isn't any low-level recovery logging (e.g. a log message for each chunk of data being recovered) or anything of that nature. It probably makes sense to add it, so that we can more easily diagnose something like this. Care to open an issue?
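
Until such logging exists, one possible stopgap is to watch the cluster from the outside. The following is only a rough sketch, not something from the Elasticsearch codebase: it polls the cluster health API and prints the shard counters with a timestamp, so that a recovery that never finishes shows up as an `initializing_shards` or `relocating_shards` value that stays non-zero. The URL, the polling interval, and the helper names (`fetch_health`, `summarize`, `watch`) are assumptions made for illustration, and the output says nothing about per-chunk transfer progress.

```python
import json
import time
import urllib.request

# Assumed address of any node in the cluster; adjust to your environment.
HEALTH_URL = "http://localhost:9200/_cluster/health"

# Counters reported by the cluster health API that relate to shard movement.
KEYS = ("status", "relocating_shards", "initializing_shards", "unassigned_shards")


def fetch_health(url=HEALTH_URL, timeout=10):
    """Fetch the cluster health document and return it as a dict."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(health):
    """Render the shard-movement counters as a single log-friendly line."""
    return " ".join("%s=%s" % (key, health.get(key)) for key in KEYS)


def watch(interval=30, fetch=fetch_health):
    """Print one timestamped summary line per poll, forever."""
    while True:
        print(time.strftime("%Y-%m-%d %H:%M:%S"), summarize(fetch()))
        time.sleep(interval)


if __name__ == "__main__":
    watch()
```

Redirecting the output to a file gives a small, separate record of when shard movement started and finished, without raising the log level on the nodes themselves.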

On Aug 22, 2012, at 3:54 PM, John Nader nadernader99@gmail.com wrote:


--
