Elasticsearch 7.1.1 node crashed suddenly with SIGSEGV

Elasticsearch version

bin/elasticsearch --version
7.1.1

Plugins installed

bin/elasticsearch-plugin list
analysis-ik 

JVM version

java -version
java version "1.8.0_91"
Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)

OS version

uname -a
Linux log-es05.com 2.6.32-642.6.2.el6.x86_64 #1 SMP Wed Oct 26 06:52:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Related config

node.master: false
node.data: true
node.ingest: true
node.ml: false
xpack.ml.enabled: true

Issue

My Elasticsearch cluster had been running for more than two weeks when, yesterday, one of the nodes crashed suddenly.
Here is the information I can provide.

1. Time of the crash

Around 2019-07-21 23:40:00.

2. The hs_err_pid%p.log file

This file contains a lot of information; the parts I think are important are shown here.


# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f197d41866e, pid=22536, tid=139739997894400
#
# JRE version: Java(TM) SE Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# J 5754  sun.nio.ch.EPollArrayWrapper.epollWait(JIJI)I (0 bytes) @ 0x00007f197d41866e [0x00007f197d418580+0xee]
#
# Core dump written. Default location: /home/deploy/search/elasticsearch-7.1.1/core or core.22536

---------------  T H R E A D  ---------------

Current thread (0x00007f183000a800):  JavaThread "elasticsearch[ESV14][transport_worker][T#13]" daemon [_thread_in_native_trans, id=22779, stack(0x00007f17c0df7000,0x00007f17c0ef8000)]

siginfo: si_signo: 11 (SIGSEGV), si_code: 0 (SI_USER)

Registers:
RAX=0x0000000000000001, RBX=0x00007f17c0ef6540, RCX=0x0000000000000a80, 
...

Top of Stack: (sp=0x00007f17c0ef6470)
0x00007f17c0ef6470:   00000000c58c9a90 00007f197f02fc30
...
0x00007f17c0ef6660:  

Instructions: (pc=0x00007f197d41866e)
...

Register to memory mapping:

RAX=0x0000000000000001 is an unknown value
...
R13=0x0000000000000007 is an unknown value
R14=0x00000000c816bf50 is an oop
java.lang.Object
 - klass: 'java/lang/Object'
R15=0x00007f183000a800 is a thread


Stack: [0x00007f17c0df7000,0x00007f17c0ef8000],  sp=0x00007f17c0ef6470,  free space=1021k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J 5754  sun.nio.ch.EPollArrayWrapper.epollWait(JIJI)I (0 bytes) @ 0x00007f197d41866e [0x00007f197d418580+0xee]
J 10602 C2 sun.nio.ch.EPollArrayWrapper.poll(J)I (70 bytes) @ 0x00007f197e4ff6ac [0x00007f197e4ff5c0+0xec]
J 29708 C2 sun.nio.ch.EPollSelectorImpl.doSelect(J)I (124 bytes) @ 0x00007f1981dfc6dc [0x00007f1981dfc480+0x25c]
J 9858 C2 sun.nio.ch.SelectorImpl.select(J)I (34 bytes) @ 0x00007f197eac8984 [0x00007f197eac8800+0x184]
J 39902 C2 io.netty.channel.nio.NioEventLoop.select(Z)V (307 bytes) @ 0x00007f197e341248 [0x00007f197e3410e0+0x168]
J 24033% C2 io.netty.channel.nio.NioEventLoop.run()V (236 bytes) @ 0x00007f197fc13014 [0x00007f197fc12e80+0x194]
j  io.netty.util.concurrent.SingleThreadEventExecutor$5.run()V+44
j  java.lang.Thread.run()V+11
v  ~StubRoutines::call_stub
V  [libjvm.so+0x68dbc6]  JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x1056
V  [libjvm.so+0x68e0d1]  JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x321
V  [libjvm.so+0x68e567]  JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Thread*)+0x47
V  [libjvm.so+0x7254b0]  thread_entry(JavaThread*, Thread*)+0xa0
V  [libjvm.so+0xa6b77f]  JavaThread::thread_main_inner()+0xdf
V  [libjvm.so+0xa6b8ac]  JavaThread::run()+0x11c
V  [libjvm.so+0x91ef78]  java_start(Thread*)+0x108
C  [libpthread.so.0+0x7aa1]  start_thread+0xd1

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 5754  sun.nio.ch.EPollArrayWrapper.epollWait(JIJI)I (0 bytes) @ 0x00007f197d4185c8 [0x00007f197d418580+0x48]
J 10602 C2 sun.nio.ch.EPollArrayWrapper.poll(J)I (70 bytes) @ 0x00007f197e4ff6ac [0x00007f197e4ff5c0+0xec]
J 29708 C2 sun.nio.ch.EPollSelectorImpl.doSelect(J)I (124 bytes) @ 0x00007f1981dfc6dc [0x00007f1981dfc480+0x25c]
J 9858 C2 sun.nio.ch.SelectorImpl.select(J)I (34 bytes) @ 0x00007f197eac8984 [0x00007f197eac8800+0x184]
J 39902 C2 io.netty.channel.nio.NioEventLoop.select(Z)V (307 bytes) @ 0x00007f197e341248 [0x00007f197e3410e0+0x168]
J 24033% C2 io.netty.channel.nio.NioEventLoop.run()V (236 bytes) @ 0x00007f197fc13014 [0x00007f197fc12e80+0x194]
j  io.netty.util.concurrent.SingleThreadEventExecutor$5.run()V+44
j  java.lang.Thread.run()V+11
v  ~StubRoutines::call_stub

....
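For anyone triaging a similar hs_err file, the fields that matter most in the excerpt above (signal, pid, JRE build, problematic frame, si_code) can be pulled out with a few regular expressions. This is only a minimal sketch: the function name `summarize_hs_err` and the sample string are my own illustration, and the patterns are written against the lines quoted in this post, so other JVM versions may format them differently.

```python
import re

# Lines copied from the hs_err excerpt in this post (sample input only).
HS_ERR_SAMPLE = """\
#  SIGSEGV (0xb) at pc=0x00007f197d41866e, pid=22536, tid=139739997894400
# JRE version: Java(TM) SE Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
# Problematic frame:
# J 5754  sun.nio.ch.EPollArrayWrapper.epollWait(JIJI)I (0 bytes) @ 0x00007f197d41866e [0x00007f197d418580+0xee]
siginfo: si_signo: 11 (SIGSEGV), si_code: 0 (SI_USER)
"""


def summarize_hs_err(text):
    """Return a dict with the key crash fields found in an hs_err log.

    Hypothetical helper: fields that cannot be found are simply omitted.
    """
    summary = {}

    # Header line: signal name, faulting pc, process id.
    m = re.search(
        r"^#\s+(SIG\w+) \(0x[0-9a-f]+\) at pc=(0x[0-9a-f]+), pid=(\d+)",
        text,
        re.M,
    )
    if m:
        summary["signal"] = m.group(1)
        summary["pc"] = m.group(2)
        summary["pid"] = int(m.group(3))

    # JRE build string, e.g. "1.8.0_91-b14".
    m = re.search(r"^# JRE version: .*\(build ([^)]+)\)", text, re.M)
    if m:
        summary["jre_build"] = m.group(1)

    # The frame is on the line right after "# Problematic frame:".
    m = re.search(r"^# Problematic frame:\n# (.+)$", text, re.M)
    if m:
        summary["problematic_frame"] = m.group(1).strip()

    # si_code number and symbolic name from the siginfo line.
    m = re.search(r"si_code: (\d+) \((\w+)\)", text)
    if m:
        summary["si_code"] = (int(m.group(1)), m.group(2))

    return summary


if __name__ == "__main__":
    for key, value in summarize_hs_err(HS_ERR_SAMPLE).items():
        print(f"{key}: {value}")
```

The JRE build string is the field that turned out to matter in this thread, so it is worth including in any report.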



3. Self-monitoring

We also have some basic monitoring for the machine; the pictures below show it.

3.1 One CPU core's idle time was 0 during that period

image

3.2 The same core showed high iowait

image

4. Core file

No JVM heap dump file was created (which may mean that no OutOfMemoryError occurred).
However, a core file was written, although it appears to be truncated. I tried to read it with gdb
and got the output below.
There are 273 "New Thread" lines in total.

$ gdb /opt/soft/jdk1.8.0_91/bin/java  core.22536
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-92.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Reading symbols from /opt/soft/jdk1.8.0_91/bin/java...Missing separate debuginfo for /opt/soft/jdk1.8.0_91/bin/java
Try: yum --enablerepo='*-debug*' install /usr/lib/debug/.build-id/bd/74b7294ebbdd93e9ef3b729e5aab228a3f681b.debug
(no debugging symbols found)...done.
BFD: Warning: /data/temp/core.22536 is truncated: expected core file size >= 36030541824, found: 28792045568.
[New Thread 22779]
[New Thread 22780]
......
[New Thread 22544]
[New Thread 22798]
[New Thread 22564]
Cannot access memory at address 0x7f1993668168
Cannot access memory at address 0x7f1993668168
Cannot access memory at address 0x7f1993668168
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Failed to read a valid object file image from memory.
Core was generated by `/usr/local/jdk1.8.0_91/bin/java -Xms30g -Xmx30g -XX:+UseG1GC -XX:MaxGCPauseMill'.
Program terminated with signal 6, Aborted.
#0  0x00007f1992aae5e5 in ?? ()
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.209.el6_9.2.x86_64
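To put a number on the truncation that gdb reports, the two sizes in the BFD warning can be subtracted. The sketch below does that; `core_shortfall` is a hypothetical helper name, and the pattern is written only against the warning text quoted above. Common causes of a short core file include a core size limit or a full disk, but nothing in this output confirms which one applies here.

```python
import re

# Warning line copied from the gdb output in this post.
BFD_WARNING = (
    "BFD: Warning: /data/temp/core.22536 is truncated: "
    "expected core file size >= 36030541824, found: 28792045568."
)


def core_shortfall(warning):
    """Return (expected, found, missing) in bytes, or None if no match."""
    m = re.search(r"expected core file size >= (\d+), found: (\d+)", warning)
    if m is None:
        return None
    expected = int(m.group(1))
    found = int(m.group(2))
    return expected, found, expected - found


if __name__ == "__main__":
    result = core_shortfall(BFD_WARNING)
    if result is not None:
        expected, found, missing = result
        gib = 1024 ** 3
        print(f"expected at least: {expected / gib:.2f} GiB")
        print(f"found:             {found / gib:.2f} GiB")
        print(f"missing:           {missing / gib:.2f} GiB")
```

With that much of the file missing, it is not surprising that gdb cannot read some addresses or produce a usable backtrace.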

Sorry for the long post, but I really have no idea how to deal with this.
After the crash I restarted Elasticsearch, and it has been working well since.

A SIGSEGV is normally a JVM bug and ...

... this JVM version is unsupported. The minimum supported JVM version is 1.8.0u111, and you should upgrade to at least that.
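As a rough illustration of that check, the update number in the `java -version` output can be compared against the minimum named above. This is a sketch under assumptions: `meets_java8_minimum` is a hypothetical helper, the threshold of 111 is taken from this reply, and only `1.8.0_NN` version strings are handled (anything else returns `None` rather than a guess).

```python
import re

# Minimum Java 8 update level, taken from the reply above ("1.8.0u111").
MIN_JAVA8_UPDATE = 111


def meets_java8_minimum(version_output):
    """Check a `java -version` string against the Java 8 minimum update.

    Returns True or False for 1.8.0_NN versions, and None when the
    string is not a Java 8 version this sketch knows how to parse.
    """
    m = re.search(r'version "1\.8\.0_(\d+)', version_output)
    if m is None:
        return None
    return int(m.group(1)) >= MIN_JAVA8_UPDATE


if __name__ == "__main__":
    # First line of the `java -version` output from the original post.
    reported = 'java version "1.8.0_91"'
    print(meets_java8_minimum(reported))
```

For the version in the original post (update 91) the check fails, which matches the advice to upgrade.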

Thank you so much :pray::pray:, you replied so quickly!
I will try what you suggested. :grin: