ES 6.4.3 Docker container keeps crashing with error code 139

Hi Everyone,
I'm running into a problem with the official ES v6.4.3 Docker container on CentOS 7.6 with the Search Guard plugin. The container starts and works, but it keeps crashing with exit code 139, which causes us to lose some of the nodes.
I did some searching on Docker exit code 139, and it seems the Java process inside the container crashed and generated a core dump.

Can anyone help with this?

Thanks a lot!

It's most probably a memory issue. Have you tried increasing the memory available to the Docker container, and then maybe the heap for the JVM inside the container as well?
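For example (just an illustrative sketch with placeholder values, assuming a compose file format that supports mem_limit, such as version 2), you could raise both in docker-compose.yml:

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.4.3
    # Cap the memory available to the container
    mem_limit: 8g
    environment:
      # Keep the JVM heap at roughly half of the container memory
      - "ES_JAVA_OPTS=-Xms4g -Xmx4g"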

Our current configuration for an instance is about 31G of memory, and I think this is enough as we don't have much workload so far.

The following is the current configuration in docker-compose.yml:
elasticsearch-instance1:
  privileged: true
  image: docker.elastic.co/elasticsearch/elasticsearch:6.4.3
  container_name: elasticsearch-instance1
  environment:
    - cluster.name=njmonitor_cluster
    - bootstrap.memory_lock=true
    - "ES_JAVA_OPTS=-Xms31g -Xmx31g"
    - xpack.monitoring.collection.enabled=true
  ulimits:
    memlock:
      soft: -1
      hard: -1

Indeed, that seems like enough heap.
Are there any logs from ES that you can share?

There is a core file and an hs_err_pid1.log:
-rwxrwxrwx 1 root root 37001433088 Jan 15 10:59 core.1
-rwxrwxrwx 1 root root 195996 Jan 15 11:00 hs_err_pid1.log

Some information from the hs_err_pid1.log:

A fatal error has been detected by the Java Runtime Environment:

SIGSEGV (0xb) at pc=0x00007fea1ca2af91, pid=1, tid=694

JRE version: OpenJDK Runtime Environment (10.0.2+13) (build 10.0.2+13)

Java VM: OpenJDK 64-Bit Server VM (10.0.2+13, mixed mode, tiered, compressed oops, concurrent mark sweep gc, linux-amd64)

Problematic frame:

J 19317 c2 org.apache.lucene.util.MergedIterator.pullTop()V (135 bytes) @ 0x00007fea1ca2af91 [0x00007fea1ca2a560+0x0000000000000a31]

Core dump will be written. Default location: /usr/share/elasticsearch/core.1

If you would like to submit a bug report, please visit:

http://bugreport.java.com/bugreport/crash.jsp

--------------- S U M M A R Y ------------

Command Line: -Xms31g -Xmx31g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -Djdk.io.permissionsUseCanonicalPath=true -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j.skipJansi=true -XX:+HeapDumpOnOutOfMemoryError -Des.cgroups.hierarchy.override=/ -Xms31g -Xmx31g -Des.path.home=/usr/share/elasticsearch -Des.path.conf=/usr/share/elasticsearch/config -Des.distribution.flavor=default -Des.distribution.type=tar org.elasticsearch.bootstrap.Elasticsearch -Expack.monitoring.collection.enabled=true -Ecluster.name=njmammoth -Ebootstrap.memory_lock=true

Host: Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz, 56 cores, 125G, CentOS Linux release 7.5.1804 (Core)
Time: Tue Jan 15 01:07:59 2019 UTC elapsed time: 3216 seconds (0d 0h 53m 36s)

--------------- T H R E A D ---------------

Current thread (0x00007fe3dc001800): JavaThread "elasticsearch[172.18.140.8-instance2][[.monitoring-es-6-2019.01.15][0]: Lucene Merge Thread #45]" daemon [_thread_in_Java, id=694, stack(0x00007fe68cdfb000,0x00007fe68cefc000)]

Stack: [0x00007fe68cdfb000,0x00007fe68cefc000], sp=0x00007fe68cefa1d8, free space=1020k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
J 19317 c2 org.apache.lucene.util.MergedIterator.pullTop()V (135 bytes) @ 0x00007fea1ca2af91 [0x00007fea1ca2a560+0x0000000000000a31]

[error occurred during error reporting (printing native stack), id 0xb]

siginfo: si_signo: 11 (SIGSEGV), si_code: 2 (SEGV_ACCERR), si_addr: 0x00000010001fffc6

--------------- S Y S T E M ---------------

OS:CentOS Linux release 7.5.1804 (Core)
uname:Linux 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64
libc:glibc 2.17 NPTL 2.17
rlimit: STACK 8192k, CORE infinity, NPROC infinity, NOFILE 1048576, AS infinity, DATA infinity, FSIZE infinity
load average:0.07 0.05 0.05

/proc/meminfo:
MemTotal: 131328224 kB
MemFree: 55848904 kB
MemAvailable: 56911444 kB
Buffers: 2104 kB
Cached: 1693076 kB
SwapCached: 0 kB
Active: 5026956 kB
Inactive: 798452 kB
Active(anon): 4305656 kB
Inactive(anon): 10412 kB
Active(file): 721300 kB
Inactive(file): 788040 kB
Unevictable: 67609536 kB
Mlocked: 82416968 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 11048 kB
Writeback: 0 kB
AnonPages: 71740812 kB
Mapped: 255768 kB
Shmem: 10980 kB
Slab: 378204 kB
SReclaimable: 124872 kB
SUnreclaim: 253332 kB
KernelStack: 29616 kB
PageTables: 149840 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 65664112 kB
Committed_AS: 72773404 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 636292 kB
VmallocChunk: 34291798012 kB
HardwareCorrupted: 0 kB
AnonHugePages: 69806080 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 339808 kB
DirectMap2M: 5476352 kB
DirectMap1G: 130023424 kB

container (cgroup) information:
container_type: cgroupv1
cpu_cpuset_cpus: 0-55
cpu_memory_nodes: 0-1
active_processor_count: 56
cpu_quota: -2
cpu_period: -2
cpu_shares: -2
memory_limit_in_bytes: -1
memory_and_swap_limit_in_bytes: -1
memory_soft_limit_in_bytes: -1
memory_usage_in_bytes: 36826869760
memory_max_usage_in_bytes: 36830556160

CPU:total 56 (initial active 56) (14 cores per cpu, 2 threads per core) family 6 model 85 stepping 4, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, avx, avx2, aes, clmul, erms, rtm, 3dnowpref, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2, adx, evex, fma
CPU Model and flags from /proc/cpuinfo:
model name : Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_ppin intel_pt ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp_epp pku ospke spec_ctrl intel_stibp flush_l1d

Memory: 4k page, physical 131328224k(55848904k free), swap 0k(0k free)

vm_info: OpenJDK 64-Bit Server VM (10.0.2+13) for linux-amd64 JRE (10.0.2+13), built on Jun 27 2018 17:52:12 by "mach5one" with gcc 4.9.2

The Docker image uses CentOS 7.5; do we need to use the same version on the host? The host is currently CentOS 7.6.

Hmm, I see you're running Java 10 inside the container, and there's an OpenJDK bug in that version.

You're not the first to experience this. There are a few things to try in this thread (namely use -XX:UseAVX=2): https://github.com/elastic/elasticsearch/issues/31425#issuecomment-402522285

Can you check?
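For example, based on the docker-compose.yml you posted earlier, one way to try it (just a sketch; adjust to your own file) is to append the flag to ES_JAVA_OPTS:

  environment:
    - cluster.name=njmonitor_cluster
    - bootstrap.memory_lock=true
    # Limit the JIT to AVX2 instructions to work around the JDK 10 crash discussed in the linked issue
    - "ES_JAVA_OPTS=-Xms31g -Xmx31g -XX:UseAVX=2"
    - xpack.monitoring.collection.enabled=true

Alternatively, the same flag can go into config/jvm.options inside the container.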


Thanks a lot! I will test the option mentioned and report back with the result!

After applying the JVM option, the cluster is stable now. Thanks a lot for your kind support!


Fantastic, glad it was helpful!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.