Elasticsearch 2.1.2 crashing on OpenJDK (1.7.0_79-b14)


(Piyush Mathur) #1

Hi,
I am trying to run Elasticsearch 2.1.2 on OpenJDK (1.7.0_79-b14), but it keeps crashing with a SIGSEGV. Can anyone suggest what could be done? Are there any known issues with this combination of OpenJDK and Elasticsearch versions? The contents of the hs_err_pid file and the VM arguments are below.

hs_err_pid file

A fatal error has been detected by the Java Runtime Environment:

SIGSEGV (0xb) at pc=0x00007f34fdecee0d, pid=22389, tid=139864896005888

JRE version: OpenJDK Runtime Environment (7.0_79-b14) (build 1.7.0_79-b14)

Java VM: OpenJDK 64-Bit Server VM (24.79-b02 mixed mode linux-amd64 compressed oops)

Derivative: IcedTea 2.5.6

Distribution: Ubuntu 14.04 LTS, package 7u79-2.5.6-0ubuntu1.14.04.1

Problematic frame:

V [libjvm.so+0x469e0d] Par_MarkFromRootsClosure::scan_oops_in_oop(HeapWord*)+0x19d

Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

If you would like to submit a bug report, please include

instructions on how to reproduce the bug and visit:

http://icedtea.classpath.org/bugzilla

--------------- T H R E A D ---------------

Current thread (0x00007f34f80fe800): GCTaskThread [stack: 0x00007f34d5616000,0x00007f34d5717000] [id=22410]

siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), si_addr=0x000000046ef20d10

Registers:
RAX=0x000000046ef20d00, RBX=0x00007f34d5715d10, RCX=0x0000000000000003, RDX=0x00007f34fe828420
RSP=0x00007f34d5715b60, RBP=0x00007f34d5715c20, RSI=0x00007f34f81023c0, RDI=0x000000046ef20d10
R8 =0x00000006a9cf4378, R9 =0x0000000000000002, R10=0x00000007f8a00078, R11=0x000000061f390000
R12=0x00007f34fe8445c3, R13=0x00007f34d5715b80, R14=0x00007f34fe800a90, R15=0x000000003b2e4a24
RIP=0x00007f34fdecee0d, EFLAGS=0x0000000000010206, CSGSFS=0x0000000000000033, ERR=0x0000000000000004
TRAPNO=0x000000000000000e

Top of Stack: (sp=0x00007f34d5715b60)
0x00007f34d5715b60: 01b2d06e00000299 00007f34fe367de6
0x00007f34d5715b70: 01b2d06f00000299 00007f34fe367de6
0x00007f34d5715b80: 00007f34fe800a90 00007f34fe367e00
0x00007f34d5715b90: 00007f34f8137aa0 00007f34f80fd950

Register to memory mapping:

RAX=0x000000046ef20d00 is an unknown value
RBX=0x00007f34d5715d10 is an unknown value
RCX=0x0000000000000003 is an unknown value
RDX=0x00007f34fe828420: <offset 0xdc3420> in /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/server/libjvm.so at 0x00007f34fda65000
RSP=0x00007f34d5715b60 is an unknown value
RBP=0x00007f34d5715c20 is an unknown value
RSI=0x00007f34f81023c0 is an unknown value
RDI=0x000000046ef20d10 is an unknown value
R8 =0x00000006a9cf4378 is an unknown value
R9 =0x0000000000000002 is an unknown value
R10=0x00000007f8a00078 is an oop
[B
 - klass: {type array byte}
 - length: 3
R11=0x000000061f390000 is an oop
[B
 - klass: {type array byte}
 - length: 262168
R12=0x00007f34fe8445c3: <offset 0xddf5c3> in /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/server/libjvm.so at 0x00007f34fda65000
R13=0x00007f34d5715b80 is an unknown value
R14=0x00007f34fe800a90: <offset 0xd9ba90> in /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/server/libjvm.so at 0x00007f34fda65000
R15=0x000000003b2e4a24 is an unknown value

VM Arguments:

jvm_args: -Xms8g -Xmx8g -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC -Dfile.encoding=UTF-8 -Djna.nosys=true -Des.path.home=/home/ubuntu/elasticsearch-2.1.2


(Daniel Mitterdorfer) #2

Hi @pmathur,

It is pretty hard to tell what is going on, especially without symbol files. All I can see is that the garbage collector trips while it is looking for GC roots. This is most likely a JDK bug, but that is hard to verify. You can try adding -XX:-UseCompressedOops and check whether the problem disappears, but this setting will reduce the performance of Elasticsearch. A better option is to upgrade the JDK.
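
When experimenting with a flag such as -XX:-UseCompressedOops, it helps to confirm that the flag actually took effect and which JDK is really in use. The following is a minimal sketch, assuming a 64-bit HotSpot-based JVM (OpenJDK or Oracle); the class name VmFlagCheck is made up for illustration. It only reports on the JVM it runs in, so it should be started with the same java binary and the same options as the Elasticsearch process.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class VmFlagCheck {
    // Returns the effective value of a HotSpot -XX option, e.g. "UseCompressedOops".
    // Throws IllegalArgumentException if the running JVM does not know the option.
    public static String flagValue(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        // Which JVM is this, and is the flag on or off?
        System.out.println("java.version      = " + System.getProperty("java.version"));
        System.out.println("java.vm.name      = " + System.getProperty("java.vm.name"));
        System.out.println("UseCompressedOops = " + flagValue("UseCompressedOops"));
    }
}
```

Running it as `java -XX:-UseCompressedOops VmFlagCheck` should report the flag as disabled; without the option, an 8g heap would normally have it enabled.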

Daniel


(Piyush Mathur) #3

@danielmitterdorfer thanks for your response. We tried the -XX:-UseCompressedOops option and switched the JDK from OpenJDK 1.7.0_79-b14 to Oracle 1.7.0_76-b13, but we are still facing the same issue. Our whole application environment runs on Java 7, so we are hesitant to upgrade the JDK beyond Java 7 for the Elasticsearch instance. Please let us know if anything can be done about this.

hs_err log file:

A fatal error has been detected by the Java Runtime Environment:

SIGSEGV (0xb) at pc=0x00007fa8a0766978, pid=25313, tid=140352628578048

JRE version: Java(TM) SE Runtime Environment (7.0_76-b13) (build 1.7.0_76-b13)

Java VM: Java HotSpot(TM) 64-Bit Server VM (24.76-b04 mixed mode linux-amd64 )

Problematic frame:

V [libjvm.so+0x7ec978] nmethod::can_unload(BoolObjectClosure*, OopClosure*, oopDesc**, bool)+0x78

Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

If you would like to submit a bug report, please visit:

http://bugreport.java.com/bugreport/crash.jsp

--------------- T H R E A D ---------------

Current thread (0x00007fa898146800): VMThread [stack: 0x00007fa66481d000,0x00007fa66491e000] [id=25337]

siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), si_addr=0x000000003c8aaa76

Registers:
RAX=0x000000003c8aa906, RBX=0x00007fa7457f5620, RCX=0x00007fa89619fdbd, RDX=0x00007fa6eeda5358
RSP=0x00007fa66491c140, RBP=0x00007fa66491c170, RSI=0x00007fa7457f5620, RDI=0x00007fa6eeda5368
R8 =0x0000000000000000, R9 =0x000000000119ddc0, R10=0x00000000000000a5, R11=0x0000000000000246
R12=0x00007fa8a0dc2940, R13=0x00007fa89619fdbd, R14=0x00007fa89619eb10, R15=0x00007fa8a0dc2950
RIP=0x00007fa8a0766978, EFLAGS=0x0000000000010246, CSGSFS=0x0000000000000033, ERR=0x0000000000000004
TRAPNO=0x000000000000000e


(Daniel Mitterdorfer) #4

Hi @pmathur,

From the JVM's error log we can see that the segmentation violation occurred in a different stack frame this time. Previously it was Par_MarkFromRootsClosure::scan_oops_in_oop(HeapWord*), now it's nmethod::can_unload(BoolObjectClosure*, OopClosure*, oopDesc**, bool).

There is actually a related OpenJDK bug that refers to nmethod::can_unload, but it was closed as "not an issue" because the problem does not seem to occur on Java 8. So I fear you are out of luck here.

I understand that you are reluctant to upgrade your application stack to Java 8 (totally understandable). Assuming you run Elasticsearch as a dedicated process, one option you could consider is to run Elasticsearch on Java 8 and leave the rest of your application stack on Java 7.

Daniel


(Piyush Mathur) #5

Hey @danielmitterdorfer, as we use the Elasticsearch Java driver extensively, we fear that we may face compatibility issues when running two different major Java versions on the Elasticsearch client and the cluster (please refer here). Kindly let us know if there is anything that could be done.

Thanks.
Piyush Mathur


(Daniel Mitterdorfer) #6

Hi @pmathur,

ok, let me state all (realistic) options I can think of:

  • Do nothing and accept the risk of occasional(?) JVM crashes. As I stated before, there is already an issue in the OpenJDK repo. Given that it occurs only in Java 7 (which has been EOL since April 2015), that Java 8 has been out for several years, and that the bug is closed, there is practically zero chance that it will get fixed in Java 7.
  • Tweak the JVM configuration so you don't experience this problem. Currently the segmentation violation occurs when unloading an nmethod, which in turn happens during code cache unloading. So while you could maybe tweak your JVM configuration to prevent that, there is no guarantee that it works, and it requires very intimate knowledge of the JVM and extensive testing to ensure that it does not have unwanted side effects.
  • Upgrade only Elasticsearch to Java 8 and keep the application at Java 7. The trade-offs are stated in the discussion that you've linked to.
  • Upgrade only Elasticsearch to Java 8 and use the new Elasticsearch REST client, which can be used with ES 2.x and is also Java 7 compatible but only provides a low-level interface at the moment. This may be a good option if you only use a few API methods but may prove expensive if you use a lot of them (as you need to reimplement them).

None of these options is perfect and each of them has specific trade-offs which you need to consider.
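
To give a feel for what "reimplementing an API method over HTTP" means in the last option, here is a rough sketch that talks to Elasticsearch over its REST interface from Java 7 compatible code. It deliberately uses only the JDK's HttpURLConnection, not the Elasticsearch REST client mentioned above, so nothing here reflects that client's API. The node address http://localhost:9200, the index name my-index and the field name title are assumptions for illustration only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class PlainHttpSearch {
    // Builds a minimal match query body. Field and text are not escaped,
    // so this is only safe for trusted values without quotes or backslashes.
    public static String matchQuery(String field, String text) {
        return "{\"query\":{\"match\":{\"" + field + "\":\"" + text + "\"}}}";
    }

    // POSTs the body to <baseUrl>/<index>/_search and returns the raw JSON response.
    public static String search(String baseUrl, String index, String body) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(baseUrl + "/" + index + "/_search").openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        // getInputStream() throws an IOException for 4xx/5xx responses.
        try (InputStream in = conn.getInputStream();
             Scanner s = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            return s.hasNext() ? s.next() : "";
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed node address, index and field; adjust to the actual cluster.
        String body = matchQuery("title", "crash");
        System.out.println(search("http://localhost:9200", "my-index", body));
    }
}
```

Because the client only speaks HTTP and JSON, the Java version of the application and the Java version of the Elasticsearch node no longer need to match. The cost is that request building and response parsing, which the Java driver does for you, have to be written by hand for every API call in use.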

Daniel

