JVM crash with JRE 7 and profiling_inferred_spans_enabled

Charles_Porter · February 19, 2021, 2:41pm

Hello. We're currently testing Elastic APM in our Java-based test environment for an application my team supports. One of the application components runs in a Java 7 environment. We initially encountered issue 1583 and applied the suggested -XX:CompileCommand to work around it.

However, whenever we set profiling_inferred_spans_enabled to true, the JVM crashes shortly afterward. We would like to collect the extra span information provided by profiling_inferred_spans_enabled, and are hoping there is a way to work through this. Thank you.

OS Version: CentOS 6.10
Java version: Oracle JRE 7.0_80-b15
APM Agent version: apm-java-agent 1.21.0

The APM agent attaches to the Apache Tomcat process via the -javaagent flag. All other agent settings are left at their defaults.
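When a workaround depends on JVM flags such as -javaagent and -XX:CompileCommand, it can help to confirm that both actually reached the Tomcat JVM. The sketch below is a minimal, hypothetical diagnostic (the class name AgentFlagCheck is made up for illustration) that filters the JVM's input arguments using the standard RuntimeMXBean API, which is available on Java 7. It assumes argsWithPrefix is called from code running inside the Tomcat JVM; running main on its own only inspects the JVM that launches it.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class AgentFlagCheck {

    // Returns the JVM arguments that start with the given prefix,
    // e.g. "-javaagent:" or "-XX:CompileCommand".
    static List<String> argsWithPrefix(List<String> jvmArgs, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                matches.add(arg);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Flags the current JVM was started with (not the program's own arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("-javaagent flags: " + argsWithPrefix(jvmArgs, "-javaagent:"));
        System.out.println("CompileCommand flags: " + argsWithPrefix(jvmArgs, "-XX:CompileCommand"));
    }
}
```

An empty list for either prefix would suggest the flag was set somewhere Tomcat's startup scripts do not read.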

elasticapm.properties:
use_path_as_transaction_name=true
server_urls=https://xxx.com/apm
verify_server_sert=false
api_key=xxx

Top of the hs_err_pid log:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007ffa751cb08b, pid=13985, tid=140712926508800
#
# JRE version: Java(TM) SE Runtime Environment (7.0_80-b15) (build 1.7.0_80-b15)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.80-b11 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x6ec08b]  JvmtiEnvBase::get_stack_trace(JavaThread*, int, int, _jvmtiFrameInfo*, int*)+0x21b
#
# Core dump written. Default location: /apps/tomcat8/sbtool/bin/core or core.13985
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

---------------  T H R E A D  ---------------

Current thread (0x00007ffa7042f800):  JavaThread "Unknown thread" [_thread_blocked, id=14101, stack(0x00007ffa47eff000,0x00007ffa48000000)]

siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), si_addr=0x0000000000f60115

Registers:
RAX=0x0000000000f600f5, RBX=0x00007ffa7012ade0, RCX=0x0000003e2b8182a0, RDX=0x00007ffa7010f4f8
RSP=0x00007ffa47ff8c80, RBP=0x00007ffa47ff9300, RSI=0x00007ffa7010f4f0, RDI=0x0000000000000000
R8 =0x0000000000000000, R9 =0x0000000000000800, R10=0x0000000000000000, R11=0x0000000000000246
R12=0x00000000f4dd59a8, R13=0x0000000002782240, R14=0x00007ffa7010f4f0, R15=0x00007ffa58664d38
RIP=0x00007ffa751cb08b, EFLAGS=0x0000000000010246, CSGSFS=0x0000000000000033, ERR=0x0000000000000004
  TRAPNO=0x000000000000000e

Sylvain_Juge · February 19, 2021, 3:16pm

Hi @Charles_Porter,

I am sorry for the inconvenience, and thanks for reporting this issue.
Could you send us the full crash report (making sure that any sensitive environment variables or JVM command-line parameters are removed)?

Also, how often do the crashes occur?

For past JVM crashes related to JvmtiEnvBase::get_stack_trace, we managed to work around these stability issues on some JVMs by using the async_profiler_safe_mode=63 configuration parameter.

This (as yet undocumented) setting makes async-profiler skip collecting some stack traces for extra safety. Please try it and tell us whether it makes any difference. Depending on the result, some trial and error might be required to tune the value properly and identify what within async-profiler is causing this.
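The thread does not say what each part of the value 63 means. A minimal sketch of the trial-and-error step, assuming async_profiler_safe_mode is a bitmask in which each set bit enables one extra safety measure (as with async-profiler's own safemode option): 63 has six bits set, so one approach is to start from the fully safe value and clear one bit at a time to see which one brings the crash back. The class name SafeModeBits is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class SafeModeBits {

    // Splits a bitmask into the individual bit values that are set,
    // e.g. 63 -> [1, 2, 4, 8, 16, 32].
    static List<Integer> setBits(int mask) {
        List<Integer> bits = new ArrayList<>();
        // "bit > 0" stops the loop once the shift overflows into the sign bit.
        for (int bit = 1; bit > 0 && bit <= mask; bit <<= 1) {
            if ((mask & bit) != 0) {
                bits.add(bit);
            }
        }
        return bits;
    }

    // Candidate values to try: the full mask with exactly one bit cleared.
    static List<Integer> oneBitCleared(int mask) {
        List<Integer> candidates = new ArrayList<>();
        for (int bit : setBits(mask)) {
            candidates.add(mask & ~bit);
        }
        return candidates;
    }

    public static void main(String[] args) {
        System.out.println(setBits(63));       // [1, 2, 4, 8, 16, 32]
        System.out.println(oneBitCleared(63)); // [62, 61, 59, 55, 47, 31]
    }
}
```

Testing each candidate under the same load narrows the crash down to a single safety measure, at the cost of one test run per bit.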

Also, and completely unrelated: there is a small typo in the config you posted here. verify_server_sert should probably be verify_server_cert.
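A misspelled key in elasticapm.properties is easy to miss because the file still loads without errors. The sketch below flags any key that is not on an allow-list, using java.util.Properties (Java 7 compatible). The allow-list is a hypothetical, deliberately incomplete one built only from the option names mentioned in this thread; it is not the agent's full set of options, and the class name ConfigKeyCheck is also made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;

final class ConfigKeyCheck {

    // Hypothetical allow-list: only the option names that appear in this thread.
    static final Set<String> KNOWN_KEYS = new HashSet<>(Arrays.asList(
            "use_path_as_transaction_name",
            "server_urls",
            "verify_server_cert",
            "api_key",
            "profiling_inferred_spans_enabled",
            "async_profiler_safe_mode"));

    // Parses properties text and returns every key not found in KNOWN_KEYS.
    static List<String> unknownKeys(String propertiesText) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(propertiesText));
        List<String> unknown = new ArrayList<>();
        for (String key : props.stringPropertyNames()) {
            if (!KNOWN_KEYS.contains(key)) {
                unknown.add(key);
            }
        }
        return unknown;
    }

    public static void main(String[] args) throws IOException {
        // The configuration as originally posted, with placeholders kept.
        String config = "use_path_as_transaction_name=true\n"
                + "server_urls=https://xxx.com/apm\n"
                + "verify_server_sert=false\n"
                + "api_key=xxx\n";
        System.out.println(unknownKeys(config)); // [verify_server_sert]
    }
}
```

To check the real file instead of a string, its contents could be read first (for example with java.nio.file.Files.readAllBytes, available since Java 7).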

Charles_Porter · February 19, 2021, 7:02pm

Hi Sylvain_Juge,

Thank you for responding. Using async_profiler_safe_mode=63 seems to have improved things substantially. No JVM crashes yet.

I've submitted the full hs_err_pid.log through my Elastic Sales contact.

Also, thanks for catching the typo. My eyes are clearly not what they used to be.

Thank you.

Eyal_Koren · February 21, 2021, 11:22am

Thanks for the update @Charles_Porter

When you get the chance, please try out this bugfix without the async_profiler_safe_mode setting and see if the problem is resolved. This snapshot contains the proposed fix for Async Profiler.

Thanks!

Charles_Porter · February 22, 2021, 7:02pm

Thank you, @Eyal_Koren

The bugfix appears to be successful. After installing it, I disabled async_profiler_safe_mode and restarted the application. Simulated workloads have not produced any JVM crashes.

Thank you very much for helping.

Eyal_Koren · February 23, 2021, 7:02am

Awesome! Thanks for reporting back!

I have also been running load tests for the past ~48 hours with that fix, on a setup that previously reproduced the issue you reported, with quite intensive load on async-profiler (very frequent sampling), and it looks good.

For now, you can continue using this snapshot. We will make sure to include this fix in our next release.

system · March 16, 2021, 3:03am

This topic was automatically closed 20 days after the last reply. New replies are no longer allowed.
