Nodes randomly restarting

I noticed lately that my ES node names keep changing. I happened to see it
happen live today (via elasticsearch paramedic -
http://karmi.github.io/elasticsearch-paramedic/), and saw the ES node
actually shut down and restart. I logged into the machine to check the
logs, and there were no abnormal entries - the log just showed normal ES
startup messages, no exceptions from the previous session and no indication
that it had even shut down.

Any idea what could cause this? It seems to happen a few times a day. So
far the restarts are fairly infrequent and staggered, so they haven't
affected uptime, but it's a little disconcerting.

I am running a 5-node cluster with ES 0.20.6 on Ubuntu using a direct
memory index, and use the service wrapper to start ElasticSearch (
https://github.com/elasticsearch/elasticsearch-servicewrapper).

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Have you checked the wrapper log? The wrapper process pings the elastic
nodes periodically and will restart them if they fail to respond. I saw
these problems early on and changed my default wrapper timeout from 300
seconds to 10 minutes, if I remember correctly.
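
To illustrate why a longer timeout helps, here is a rough sketch of the
ping-and-restart decision. It is not the wrapper's actual code; the function
name and the 400-second stall are made up for the example, and only the 300
second and 10 minute figures come from my setup as I remember it.

```python
def should_restart(seconds_since_last_reply: float, ping_timeout_s: float) -> bool:
    """Hypothetical watchdog rule: restart once the node has been silent
    for longer than the configured ping timeout."""
    return seconds_since_last_reply > ping_timeout_s


# Assumed scenario: a node that stops answering pings for 400 seconds
# (for example during a long pause) and then recovers on its own.
stall_s = 400

print(should_restart(stall_s, 300))  # 300 s timeout: node is restarted
print(should_restart(stall_s, 600))  # 10 min timeout: node is left alone
```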

Randy

On Tue, Jun 25, 2013 at 3:38 PM, Jeremy Jongsma jeremy@jongsma.org wrote:

[quoted original message trimmed]

Ah, didn't see the separate service.log before.

As it turns out, the node was getting hit by the kernel OOM killer. What's
the best way to determine the max memory I need? Right now my instances are
running with 7.5GB of RAM. I have a direct memory limit of 5GB set in the
JVM properties and 1GB of heap space, which leaves 1.5GB for general OS
services. It seemed to me like that should be enough, but I'm still seeing
these random oom-kills.
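
To make the arithmetic explicit, here is a rough sketch of the budget I'm
working from. The RAM, direct memory, and heap figures are the ones above; the
extra JVM overhead figure is purely an assumption for illustration (I haven't
measured it), as is the helper name.

```python
def headroom_gb(total_ram, direct_memory, heap, other_jvm_overhead=0.0):
    """Memory left for the OS after the JVM's configured limits (all in GB)."""
    return total_ram - direct_memory - heap - other_jvm_overhead


total_ram_gb = 7.5      # instance RAM
direct_memory_gb = 5.0  # direct memory limit set in the JVM properties
heap_gb = 1.0           # JVM heap

# What I budgeted: only heap and direct memory counted against RAM.
print(headroom_gb(total_ram_gb, direct_memory_gb, heap_gb))  # 1.5

# Assumption: the JVM also uses some memory outside heap and direct buffers.
# The 0.5 GB here is a guess, not a measurement.
assumed_overhead_gb = 0.5
print(headroom_gb(total_ram_gb, direct_memory_gb, heap_gb, assumed_overhead_gb))  # 1.0
```

If the real overhead is anywhere near that guess, the margin for the OS is
thinner than the 1.5GB I had in mind.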

On Tue, Jun 25, 2013 at 7:56 PM, Randall McRee randall.mcree@gmail.com wrote:

[quoted messages trimmed]