JVM crash originating from APM agent

svenrienstra · June 30, 2023, 5:46am

We've been experiencing a SIGSEV JVM crash on our production cluster which seems to originate from the APM Java agent. We haven't been able to establish a pattern of when this happens or what triggers this. The second instance of this error happened after a week, the third one after it had been running for 2 days.

We're running on a eclipse-temurin:17-alpine docker image with APM agent version 1.36.

The error we see is:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f269de9b474, pid=1, tid=600
#
# JRE version: OpenJDK Runtime Environment Temurin-17.0.7+7 (17.0.7+7) (build 17.0.7+7)
# Java VM: OpenJDK 64-Bit Server VM Temurin-17.0.7+7 (17.0.7+7, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xe34474]  ObjectSynchronizer::FastHashCode(Thread*, oopDesc*)+0x184

Stack: [0x00007f263dfe8000,0x00007f263e0e8aa8],  sp=0x00007f263e0e4e10,  free space=1011k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xe34474]  ObjectSynchronizer::FastHashCode(Thread*, oopDesc*)+0x184
V  [libjvm.so+0x90432e]  JVM_IHashCode+0x9e
J 32297  java.lang.System.identityHashCode(Ljava/lang/Object;)I java.base@17.0.7 (0 bytes) @ 0x00007f268f78859a [0x00007f268f7884a0+0x00000000000000fa]
J 63004 c2 com.blogspot.mydailyjava.weaklockfree.AbstractWeakConcurrentMap.containsKey(Ljava/lang/Object;)Z (46 bytes) @ 0x00007f268f203b14 [0x00007f268f203520+0x00000000000005f4]
J 117851 c2 co.elastic.apm.agent.loginstr.reformatting.AbstractEcsReformattingHelper.onAppendEnter(Ljava/lang/Object;)Z (124 bytes) @ 0x00007f26945aa698 [0x00007f26945aa580+0x0000000000000118]
j  co.elastic.apm.agent.jul.reformatting.JulConsoleHandlerPublishAdvice.initializeReformatting(Ljava/util/logging/ConsoleHandler;)Z+4
J 121708 c2 java.util.logging.Logger.log(Ljava/util/logging/LogRecord;)V java.logging@17.0.7 (153 bytes) @ 0x00007f2694acacdc [0x00007f2694acab40+0x000000000000019c]
J 96765 c1 java.util.logging.Logger.doLog(Ljava/util/logging/LogRecord;)V java.logging@17.0.7 (50 bytes) @ 0x00007f268a2ce814 [0x00007f268a2ce560+0x00000000000002b4]

I've also openend an issue with the JVM maintainers (SIGSEV on ObjectSynchronizer::FastHashCode · Issue #831 · adoptium/adoptium-support · GitHub) which also contains the full log.

For the mean time we've disabled log-reformatting instrumentation.

Sylvain_Juge · June 30, 2023, 8:15am

Hi @svenrienstra , thanks for reporting this !

I think you did the right thing by opening an issue on the JVM as the problem seems to be within JVM code.
We can see from the stack trace that the Elastic APM agent calls java.lang.System.identityHashCode , which is implemented by the JVM in native code.

There are a few things that aren't visible in the crash report though, could you please provide more details on:

how does the agent is setup ? we don't see the -javaagent in the JVM arguments
can you provide the agent configuration (without sensitive values) ?
are you using the sampling profiler ?
what is the frequency of those crashes ?

Thanks in advance !

svenrienstra · June 30, 2023, 8:50am

Hi @Sylvain_Juge ,

Answers to your questions:

We attach programatically using ElasticApmAttacher.attach();
This is our config

2023-06-29 21:37:56,849 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - Starting Elastic APM 1.36.0 as ** (23.12.0) on Java 17.0.7 Runtime version: 17.0.7+7 VM version: 17.0.7+7 (Eclipse Adoptium) Linux 5.10.162+
23:37:56.850
2023-06-29 21:37:56,850 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - service_name: '**' (source: /configuration/elasticapm.properties)
23:37:56.850
2023-06-29 21:37:56,850 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - environment: 'NL-PROD' (source: /configuration/elasticapm.properties)
23:37:56.851
2023-06-29 21:37:56,850 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - disable_instrumentations: 'log-reformatting' (source: /configuration/elasticapm.properties)
23:37:56.851
2023-06-29 21:37:56,851 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - config_file: '/configuration/elasticapm.properties' (source: Environment Variables)
23:37:56.851
2023-06-29 21:37:56,851 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - plugins_dir: '/' (source: /configuration/elasticapm.properties)
23:37:56.851
2023-06-29 21:37:56,851 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - secret_token: 'XXXX' (source: /configuration/elasticapm.properties)
23:37:56.851
2023-06-29 21:37:56,851 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - server_urls: '**' (source: /configuration/elasticapm.properties)
23:37:56.851
2023-06-29 21:37:56,851 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - application_packages: 'com.h4h' (source: /configuration/elasticapm.properties)

We do not use the sampling profiler
So far we've had 3 of these crashes, between the first and second one there was a week in between. Between the second and third there were two days in between.

Sylvain_Juge · June 30, 2023, 9:15am

Thanks for the details !

As you are not using the sampling profiler, there is no native code involved by the Java agent that could interfere with the JVM code, so here we need to wait for any feedback from the JVM team.

I would suggest collecting and keeping all the crash reports when it happens, as if there is any correlation between those it would be very valuable to have multiple instances.

svenrienstra · July 5, 2023, 11:15am

We've just had another crash on our production environment because of this issue. The weird thing is it's still happening in JulConsoleHandlerPublishAdvice even though we've disabled the log-reformatting instrumentation. If there is anyway to fully disable this JulConsoleHandlerPublishAdvice it would be very much appreciated, otherwise we might need to disable the whole agent for now.

Stack trace is very similar to last time:

Current thread (0x00007f3723a7e390):  JavaThread "http-nio-8880-exec-4" daemon [_thread_in_vm, id=601, stack(0x00007f37073a1000,0x00007f37074a1a90)]

Stack: [0x00007f37073a1000,0x00007f37074a1a90],  sp=0x00007f370749de80,  free space=1011k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xe34474]  ObjectSynchronizer::FastHashCode(Thread*, oopDesc*)+0x184
V  [libjvm.so+0x90432e]  JVM_IHashCode+0x9e
J 32350  java.lang.System.identityHashCode(Ljava/lang/Object;)I java.base@17.0.7 (0 bytes) @ 0x00007f375891301a [0x00007f3758912f20+0x00000000000000fa]
J 58401 c2 com.blogspot.mydailyjava.weaklockfree.AbstractWeakConcurrentMap.containsKey(Ljava/lang/Object;)Z (46 bytes) @ 0x00007f3758ac6b60 [0x00007f3758ac6360+0x0000000000000800]
J 114608 c2 co.elastic.apm.agent.loginstr.reformatting.AbstractEcsReformattingHelper.onAppendEnter(Ljava/lang/Object;)Z (124 bytes) @ 0x00007f375ca48018 [0x00007f375ca47f00+0x0000000000000118]
j  co.elastic.apm.agent.jul.reformatting.JulConsoleHandlerPublishAdvice.initializeReformatting(Ljava/util/logging/ConsoleHandler;)Z+4
J 122764 c2 java.util.logging.Logger.doLog(Ljava/util/logging/LogRecord;)V java.logging@17.0.7 (50 bytes) @ 0x00007f375a3f17d8 [0x00007f375a3f1180+0x0000000000000658]
J 65111 c2 java.util.logging.Logger.log(Ljava/util/logging/Level;Ljava/lang/String;)V java.logging@17.0.7 (25 bytes) @ 0x00007f37596cff48 [0x00007f37596cfd40+0x0000000000000208]
J 103756 c1 com.openhtmltopdf.util.XRLogger.log(Lcom/openhtmltopdf/util/Diagnostic;)V (61 bytes) @ 0x00007f3754316e6c [0x00007f3754316240+0x0000000000000c2c]
J 91463 c2 com.openhtmltopdf.util.XRLog.log(Ljava/util/logging/Level;Lcom/openhtmltopdf/util/LogMessageId;Z[Ljava/lang/Object;)V (57 bytes) @ 0x00007f375b53c788 [0x00007f375b53c040+0x0000000000000748]
J 113820 c1 com.openhtmltopdf.util.XRLog.log(Ljava/util/logging/Level;Lcom/openhtmltopdf/util/LogMessageId$LogMessageId1Param;Ljava/lang/Object;)V (15 bytes) @ 0x00007f3751939d0c [0x00007f3751939ac0+0x000000000000024c]
j  com.openhtmltopdf.resource.XMLResource.newXMLReader()Lorg/xml/sax/XMLReader;+179
j  com.openhtmltopdf.resource.XMLResource$XMLResourceBuilder.createXMLResource(Lcom/openhtmltopdf/resource/XMLResource;)Lcom/openhtmltopdf/resource/XMLResource;+16
j  com.openhtmltopdf.resource.XMLResource$XMLResourceBuilder.access$100(Lcom/openhtmltopdf/resource/XMLResource$XMLResourceBuilder;Lcom/openhtmltopdf/resource/XMLResource;)Lcom/openhtmltopdf/resource/XMLResource;+2
j  com.openhtmltopdf.resource.XMLResource.load(Lorg/xml/sax/InputSource;)Lcom/openhtmltopdf/resource/XMLResource;+11
j  com.openhtmltopdf.pdfboxout.PdfBoxRenderer.setDocumentFromStringP(Ljava/lang/String;Ljava/lang/String;)V+24
j  com.openhtmltopdf.pdfboxout.PdfBoxRenderer.<init>(Lcom/openhtmltopdf/outputdevice/helper/BaseDocument;Lcom/openhtmltopdf/outputdevice/helper/UnicodeImplementation;Lcom/openhtmltopdf/outputdevice/helper/

Config after application restarted again (should be the same as when the crash happenend):

12:31:03.523
2023-07-05 10:31:03,519 [Attach Listener] INFO  co.elastic.apm.agent.util.JmxUtils - Found JVM-specific OperatingSystemMXBean interface: com.sun.management.OperatingSystemMXBean
12:31:03.543
2023-07-05 10:31:03,543 [Attach Listener] INFO  co.elastic.apm.agent.util.JmxUtils - Found JVM-specific ThreadMXBean interface: com.sun.management.ThreadMXBean
12:31:03.565
2023-07-05 10:31:03,565 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - Starting Elastic APM 1.36.0 as ** (23.12.0) on Java 17.0.7 Runtime version: 17.0.7+7 VM version: 17.0.7+7 (Eclipse Adoptium) Linux 5.10.162+
12:31:03.566
2023-07-05 10:31:03,566 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - environment: 'NL-PROD' (source: /configuration/elasticapm.properties)
12:31:03.566
2023-07-05 10:31:03,566 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - disable_instrumentations: 'log-reformatting' (source: /configuration/elasticapm.properties)
12:31:03.566
2023-07-05 10:31:03,566 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - service_name: '**' (source: /configuration/elasticapm.properties)
12:31:03.567
2023-07-05 10:31:03,567 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - config_file: '/configuration/elasticapm.properties' (source: Environment Variables)
12:31:03.567
2023-07-05 10:31:03,567 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - plugins_dir: '/' (source: /configuration/elasticapm.properties)
12:31:03.567
2023-07-05 10:31:03,567 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - secret_token: 'XXXX' (source: /configuration/elasticapm.properties)
12:31:03.567
2023-07-05 10:31:03,567 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - server_urls: '**' (source: /configuration/elasticapm.properties)
12:31:03.568
2023-07-05 10:31:03,568 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - application_packages: 'com.h4h' (source: /configuration/elasticapm.properties)

svenrienstra · July 5, 2023, 11:57am

It seems like that for some reason disabling log-reformatting group does not work. Disabling the whole logging group does result in Not applying excluded instrumentation co.elastic.apm.agent.jul.reformatting.JulLogReformattingInstrumentation$ConsoleReformattingInstrumentation

Sylvain_Juge · July 5, 2023, 12:24pm

Hi,

I think the inconsistency related to disabling this instrumentation has been removed in version 1.37.0 with set service name & version in ecs-logging by SylvainJuge · Pull Request #3064 · elastic/apm-agent-java · GitHub (you are using 1.36.0), so here the best option should probably be to upgrade to the latest version (1.39.0 as I'm writing this). I haven't checked the exact version you are using here but updating to the latest is usually a good idea.

However, updating to the latest version should not change the problem you have with crashing JVMs, but at least you should be able to apply a better work-around.

svenrienstra · July 5, 2023, 12:58pm

Hi @Sylvain_Juge,

Can confirm that indeed with a newer version log-reformatting works and with 1.36 jul-ecs works. It might be a good idea (for the future) to add some note to the documentation if something like this changes. I see it's listed as breaking change in the changelog for 1.37, but looking at the documentation for 1.x (Core configuration options | APM Java Agent Reference [1.x] | Elastic) there is no indication whatsoever the naming of these groups is different between minor versions.

system · August 2, 2023, 12:59pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
APM agent for Java is causing system crashes APM	9	1242	March 8, 2019
JVM crashed after upgrade to apm java agent 1.19 APM java	17	1323	January 18, 2021
SIGSEGV JVM Crash APM java	19	1480	February 25, 2023
SIGSEGV JVM Crash with ZGC Garbage collection Elastic Observability	4	1001	October 12, 2023
Java APM agent crashes JVM Corretto 17 on AWS EB Linux 2023 Elastic Observability	0	192	October 27, 2023

JVM crash originating from APM agent

Related topics