SIGSEGV JVM crash with ZGC garbage collection

Kibana version: 7.17.5

Elasticsearch version: 7.17.5

Java: OpenJDK 64-Bit Server VM Temurin-17.0.6+10 (build 17.0.6+10, mixed mode, sharing)

APM Agent language and version: Java 1.37.0

OS: Alpine Linux v3.17

Running on a Google-managed Kubernetes cluster (GKE), v1.24.12-gke.500. We have APM on about 85 different microservices, but the crash only seems to affect this one microservice and its set of pods.

I can't reproduce it on demand. It happens randomly, but usually within 24 hours of a new pod rolling out. I have core dumps and full error files for 3 crashes so far; they are pretty big (up to 9 GB).

---------------  T H R E A D  ---------------

Current thread (0x00007f74cf32c380):  GCTaskThread "ZWorker#1" [stack: 0x00007f74cf126000,0x00007f74cf226aa8] [id=14]

Stack: [0x00007f74cf126000,0x00007f74cf226aa8],  sp=0x00007f74cf2205e0,  free space=1001k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  []  ZBarrier::mark_barrier_on_oop_slow_path(unsigned long)+0x8b
V  []  void OopOopIterateDispatch<ZMarkBarrierOopClosure<false> >::Table::oop_oop_iterate<InstanceKlass, oopDesc*>(ZMarkBarrierOopClosure<false>*, oopDesc*, Klass*)+0x93
V  []  ZMark::follow_object(oopDesc*, bool)+0xb0
V  []  ZMark::work_without_timeout(ZMarkCache*, ZMarkStripe*, ZMarkThreadLocalStacks*)+0xcf
V  []  ZMark::work(unsigned long)+0x8f
V  []  ZTask::GangTask::work(unsigned int)+0x1c
V  []  GangWorker::loop()+0x5f
V  []
V  []  Thread::call_run()+0xc0
V  []  thread_native_entry(Thread*)+0x131

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000004

Register to memory mapping:

RAX=0x0000000000000064 is an unknown value
RBX=0x00000374d93b52e8 is an unknown value
RCX=0x00007f74e75a1068: <offset 0x0000000001370068> in /opt/java/openjdk/lib/server/ at 0x00007f74e6231000
RDX=0x00007f74e7549140: <offset 0x0000000001318140> in /opt/java/openjdk/lib/server/ at 0x00007f74e6231000
RSP=0x00007f74cf2205e0 points into unknown readable memory: 0x0000040081e7cd58 | 58 cd e7 81 00 04 00 00
RBP=0x00007f74cf220610 points into unknown readable memory: 0x00007f74cf220660 | 60 06 22 cf 74 7f 00 00
RSI=0x000008007edd8c20 is a good oop: java.lang.String 
{0x000008007edd8c20} - klass: 'java/lang/String'
 - string: "co.elastic.apm.exception"
RDI=0x00007f74d93b52e8 is at entry_point+2728 in (nmethod*)0x00007f74d93b4190
R8 =0x00007f74de5924d8 points into unknown readable memory: 0x0000000000000003 | 03 00 00 00 00 00 00 00
R9 =0x00007f74cf220670 points into unknown readable memory: 0x00007f74e7476900 | 00 69 47 e7 74 7f 00 00
R10=0x0 is NULL
R11=0x00007f74cf340880 points into unknown readable memory: 0xffffffff0003af9a | 9a af 03 00 ff ff ff ff
R12=0x00000b74d93b52e8 is an unknown value
R13=0x0 is NULL
R14=0x00007f74cf340440 points into unknown readable memory: 0x00007f74e7476478 | 78 64 47 e7 74 7f 00 00
R15=0x000008002225d6f8 points into unknown readable memory: 0x00007f74d93b52e8 | e8 52 3b d9 74 7f 00 00
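
To make the register dump above easier to read, here is a minimal sketch that splits a few of the register values into ZGC colored-pointer metadata bits and heap offset. It assumes the JDK 17 x86_64 layout with 42 offset bits and the marked0/marked1/remapped/finalizable bits at positions 42 to 45; that layout is an assumption on my part (it depends on the max heap size), although the "good oop" in RSI starting with 0x0800 is consistent with it. The helper name `decode` is just for illustration.

```python
# Sketch only: assumes ZGC (JDK 17, x86_64) uses 42 offset bits, with the
# metadata bits directly above them. This depends on the max heap size.
OFFSET_BITS = 42
OFFSET_MASK = (1 << OFFSET_BITS) - 1

# Assumed metadata bit positions (bits 42 to 45).
METADATA_BITS = {
    "marked0": 1 << (OFFSET_BITS + 0),
    "marked1": 1 << (OFFSET_BITS + 1),
    "remapped": 1 << (OFFSET_BITS + 2),
    "finalizable": 1 << (OFFSET_BITS + 3),
}


def decode(value):
    """Return (heap offset, list of metadata bit names set in value)."""
    offset = value & OFFSET_MASK
    flags = [name for name, bit in METADATA_BITS.items() if value & bit]
    return offset, flags


# Values copied from the register dump in this post.
registers = {
    "RSI": 0x000008007EDD8C20,  # reported as a good oop (java.lang.String)
    "RDI": 0x00007F74D93B52E8,  # reported as inside an nmethod
    "RBX": 0x00000374D93B52E8,  # reported as unknown
    "R12": 0x00000B74D93B52E8,  # reported as unknown
    "R15": 0x000008002225D6F8,  # memory here holds the same value as RDI
}

for name, value in registers.items():
    offset, flags = decode(value)
    print(f"{name}: offset={offset:#014x} flags={flags}")
```

Under these assumptions RDI, RBX and R12 all decode to the same offset, 0x0374d93b52e8: RBX looks like RDI with the upper bits masked off, and R12 looks like that offset with the marked1 bit set. That would be consistent with the marker having loaded a code address (the nmethod pointer stored at R15) from an object field and treated it as an object reference, but this is only a reading of the register values, not a confirmed root cause.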