Unexpected cluster behavior


(Andrej) #1

Hi all,

we noticed some very strange behavior in our cluster last night (16
nodes, all running 0.19.4). At a certain time one cluster node
apparently restarted (at least that is how I would interpret the log
message "[2012-05-29 23:33:44,090][INFO ][node ] [Star-Lord]
{0.19.4}[23335]: starting ..."). Within the next half hour another 5
nodes restarted, some of them several (up to 4) times. In the logs we
couldn't find anything like "stopping" or "stopped", which is the usual
output when stopping a node.

My questions:

  • is it possible to start a node that wasn't stopped before?
  • what can actually start an elasticsearch node? At that time we
    weren't using the index, so no requests were sent from our side.
    Can elasticsearch itself restart nodes and, if so, what triggers this?
  • finally, after the restarts no indices were found. Any ideas on this
    one?

Shay, if you are interested in more information I can send you all
logs. The beer in Berlin is on me :wink:

Thanks!
Andrej


(Shay Banon) #2

Can you share the logs? That would be helpful.

When a node shuts down properly (i.e., via a kill command without -9, or
via the shutdown API), it will log the fact that it's stopping.
Obviously, nodes can't start by themselves unless they are being run by
another daemon. Are you using something like that? The service wrapper, maybe?
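
For anyone wanting to check their own logs for this pattern, here is a minimal sketch (not an official tool) that flags "starting" lines with no "stopping" or "stopped" line since the previous start. It assumes the node log lines look like the one quoted in the first post and that a clean shutdown logs a message beginning with "stopping" or "stopped"; the function name `find_unclean_starts` is made up for illustration.

```python
import re

# Assumed line layout, based on the line quoted in the first post:
# [2012-05-29 23:33:44,090][INFO ][node ] [Star-Lord] {0.19.4}[23335]: starting ...
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>\w+)\s*\]\[node\s*\]\s*"
    r"\[(?P<name>[^\]]+)\]\s*\{[^}]*\}\[\d+\]:\s*(?P<event>\w+)"
)


def find_unclean_starts(lines):
    """Return timestamps of 'starting' events with no clean stop before them.

    The first start seen in the log is never flagged, because the state
    before the log begins is unknown.
    """
    unclean = []
    running = False  # True after a start with no stop seen since
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # not a node lifecycle line in the assumed layout
        event = match.group("event")
        if event == "starting":
            if running:
                # A new start while the previous run never logged a stop.
                unclean.append(match.group("ts"))
            running = True
        elif event in ("stopping", "stopped"):
            running = False
    return unclean
```

Passing an open log file to `find_unclean_starts` yields the timestamps worth comparing against whatever supervises the process (for example the service wrapper's own log).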


(Andrej) #3

Shay, your hint was absolutely the right one. Reading the logs from
the service wrapper revealed that the wrapper decided to restart the
JVM because it did not respond in the expected way. So at least we have
an explanation, and I can sleep well again :wink:

It was nice to talk to you in Berlin, thanks for the information and
your interesting talk. Can't wait to try out 0.19.5 :wink:

Greets,
Andrej

