ES 0.20.4 cluster can't start

Today I upgraded from 0.19.3 to 0.20.3.
I also upgraded from jdk 1.6.0_14 to 1.7.0_13.
I did a full cluster shutdown, replaced ES, then started the cluster.

While starting, some nodes (more than one) were restarted by the wrapper.
Here's the wrapper log:
STATUS | wrapper | 2013/02/07 18:55:51 | JVM received a signal UNKNOWN (6).
STATUS | wrapper | 2013/02/07 18:55:51 | JVM process is gone.
ERROR | wrapper | 2013/02/07 18:55:51 | JVM exited unexpectedly.
STATUS | wrapper | 2013/02/07 18:55:55 | Launching a JVM...
INFO | jvm 2 | 2013/02/07 18:55:56 | WrapperManager: Initializing...
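
For reference, the wrapper prints the signal only as a number. On Linux, signal 6 is SIGABRT, which is typically what a JVM raises against itself when it aborts after a fatal error (often leaving an hs_err_pid*.log file in the working directory). A minimal Python sketch for looking up a signal number on the local platform; describe_signal is a hypothetical helper name, not part of the wrapper or ES:

```python
import signal


def describe_signal(number):
    """Return the symbolic name of a signal number on this platform.

    Falls back to the wrapper-style "UNKNOWN (n)" text when the number
    is not a signal known to the local OS.
    """
    try:
        return signal.Signals(number).name
    except ValueError:
        return "UNKNOWN ({})".format(number)


if __name__ == "__main__":
    # The wrapper log above reports signal 6.
    print(describe_signal(6))
```

Signal numbering differs between operating systems, so run this on the same kind of host as the ES node.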

Once this happens, the number_of_nodes in /_cluster/health increases.
I have 19 nodes; the count drops to 18 while a node is restarting, then goes up to 20 (not 19).
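
A rough sketch of how the node count could be checked against the expected total while the cluster starts. It only flags a mismatch and does not diagnose the cause. The base URL http://localhost:9200 and the helper names are assumptions for illustration; number_of_nodes and /_cluster/health are the field and endpoint mentioned above:

```python
import json
from urllib.request import urlopen

EXPECTED_NODES = 19  # cluster size stated in this post


def check_node_count(health, expected=EXPECTED_NODES):
    """Compare number_of_nodes from a parsed health response to the expected total."""
    actual = health["number_of_nodes"]
    if actual == expected:
        return "ok"
    return "missing" if actual < expected else "extra"


def fetch_health(base_url="http://localhost:9200"):
    """Fetch and parse /_cluster/health (base_url is an assumed address)."""
    with urlopen(base_url + "/_cluster/health") as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(check_node_count(fetch_health()))
```

With the numbers above, 18 would be reported as "missing" and 20 as "extra".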

elasticsearch/bin/service/elasticsearch.conf has jvm config like this:
wrapper.java.additional.1=-Delasticsearch-service
wrapper.java.additional.2=-Des.path.home=%ES_HOME%
wrapper.java.additional.3=-Xss256k
wrapper.java.additional.4=-XX:+UseG1GC
wrapper.java.additional.5=-XX:+HeapDumpOnOutOfMemoryError
wrapper.java.additional.6=-Djava.awt.headless=true
wrapper.java.additional.7=-XX:PermSize=256m
wrapper.java.additional.8=-XX:MaxPermSize=256m
wrapper.java.additional.9=-Dcom.sun.management.jmxremote
wrapper.java.additional.10=-Dcom.sun.management.jmxremote.port=3333
wrapper.java.additional.11=-Dcom.sun.management.jmxremote.authenticate=false
wrapper.java.additional.12=-Dcom.sun.management.jmxremote.ssl=false

Two questions:

  • What are the possible reasons for the wrapper saying "JVM received a signal UNKNOWN (6)"?
  • What does a higher number_of_nodes in _cluster/health indicate?

Thanks in advance.

Sorry, one correction. I upgraded to 0.20.4.

Can you provide the debug log?

Thanks
Vineeth

On Fri, Feb 8, 2013 at 8:52 AM, arta artasano@sbcglobal.net wrote:

Sorry, one correction. I upgraded to 0.20.4.

--
View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/ES-0-20-4-cluster-can-t-start-tp4029492p4029493.html
Sent from the Elasticsearch Users mailing list archive at Nabble.com.

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

I found what look like crash dump files, so I gist'ed them.


Hope it helps.

ES log around the restart time (matches with the first crash dump time) is gist'ed here:

Try removing the G1 garbage collector. Based on the reports we get, it's not stable yet. Prefer to stick with the default settings that come out of the box.

Also, if you can move away from JMX and use the ES stats APIs instead, that would be even better :)
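
As an illustration of that suggestion, here is a small Python sketch that drops the G1 and JMX flags from the wrapper.java.additional.N lines posted earlier and renumbers what is left. The renumbering is there because, as far as I know, the Java Service Wrapper expects these properties to be numbered without gaps by default. strip_flags and DROP_PREFIXES are hypothetical names, and the sketch assumes one property per line with no trailing newline:

```python
import re

# Flags to drop, per the advice above: the G1 collector and the JMX remote settings.
DROP_PREFIXES = ("-XX:+UseG1GC", "-Dcom.sun.management.jmxremote")

ADDITIONAL = re.compile(r"^wrapper\.java\.additional\.(\d+)=(.*)$")

# The JVM options from the original post.
SAMPLE = """\
wrapper.java.additional.1=-Delasticsearch-service
wrapper.java.additional.2=-Des.path.home=%ES_HOME%
wrapper.java.additional.3=-Xss256k
wrapper.java.additional.4=-XX:+UseG1GC
wrapper.java.additional.5=-XX:+HeapDumpOnOutOfMemoryError
wrapper.java.additional.6=-Djava.awt.headless=true
wrapper.java.additional.7=-XX:PermSize=256m
wrapper.java.additional.8=-XX:MaxPermSize=256m
wrapper.java.additional.9=-Dcom.sun.management.jmxremote
wrapper.java.additional.10=-Dcom.sun.management.jmxremote.port=3333
wrapper.java.additional.11=-Dcom.sun.management.jmxremote.authenticate=false
wrapper.java.additional.12=-Dcom.sun.management.jmxremote.ssl=false
""".splitlines()


def strip_flags(lines):
    """Remove unwanted JVM flags and renumber the remaining properties from 1."""
    kept = []
    index = 0
    for line in lines:
        match = ADDITIONAL.match(line.strip())
        if match is None:
            kept.append(line)  # not a JVM option line; leave it untouched
            continue
        value = match.group(2)
        if value.startswith(DROP_PREFIXES):
            continue  # skip the G1 and JMX flags
        index += 1
        kept.append("wrapper.java.additional.{}={}".format(index, value))
    return kept


if __name__ == "__main__":
    print("\n".join(strip_flags(SAMPLE)))
```

Review the result by hand before replacing elasticsearch.conf, since other settings in that file are not considered here.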

On Feb 8, 2013, at 6:36 AM, arta artasano@sbcglobal.net wrote:

ES log around the restart time (matches with the first crash dump time) is
gist'ed here:
gist:4736880 · GitHub

Thank you Kimchy.
I will definitely try that.