JVM crash originating from APM agent

We've been experiencing a SIGSEV JVM crash on our production cluster which seems to originate from the APM Java agent. We haven't been able to establish a pattern of when this happens or what triggers this. The second instance of this error happened after a week, the third one after it had been running for 2 days.

We're running on a eclipse-temurin:17-alpine docker image with APM agent version 1.36.

The error we see is:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f269de9b474, pid=1, tid=600
#
# JRE version: OpenJDK Runtime Environment Temurin-17.0.7+7 (17.0.7+7) (build 17.0.7+7)
# Java VM: OpenJDK 64-Bit Server VM Temurin-17.0.7+7 (17.0.7+7, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xe34474]  ObjectSynchronizer::FastHashCode(Thread*, oopDesc*)+0x184
Stack: [0x00007f263dfe8000,0x00007f263e0e8aa8],  sp=0x00007f263e0e4e10,  free space=1011k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xe34474]  ObjectSynchronizer::FastHashCode(Thread*, oopDesc*)+0x184
V  [libjvm.so+0x90432e]  JVM_IHashCode+0x9e
J 32297  java.lang.System.identityHashCode(Ljava/lang/Object;)I java.base@17.0.7 (0 bytes) @ 0x00007f268f78859a [0x00007f268f7884a0+0x00000000000000fa]
J 63004 c2 com.blogspot.mydailyjava.weaklockfree.AbstractWeakConcurrentMap.containsKey(Ljava/lang/Object;)Z (46 bytes) @ 0x00007f268f203b14 [0x00007f268f203520+0x00000000000005f4]
J 117851 c2 co.elastic.apm.agent.loginstr.reformatting.AbstractEcsReformattingHelper.onAppendEnter(Ljava/lang/Object;)Z (124 bytes) @ 0x00007f26945aa698 [0x00007f26945aa580+0x0000000000000118]
j  co.elastic.apm.agent.jul.reformatting.JulConsoleHandlerPublishAdvice.initializeReformatting(Ljava/util/logging/ConsoleHandler;)Z+4
J 121708 c2 java.util.logging.Logger.log(Ljava/util/logging/LogRecord;)V java.logging@17.0.7 (153 bytes) @ 0x00007f2694acacdc [0x00007f2694acab40+0x000000000000019c]
J 96765 c1 java.util.logging.Logger.doLog(Ljava/util/logging/LogRecord;)V java.logging@17.0.7 (50 bytes) @ 0x00007f268a2ce814 [0x00007f268a2ce560+0x00000000000002b4]

I've also openend an issue with the JVM maintainers (SIGSEV on ObjectSynchronizer::FastHashCode · Issue #831 · adoptium/adoptium-support · GitHub) which also contains the full log.

For the mean time we've disabled log-reformatting instrumentation.

Hi @svenrienstra , thanks for reporting this !

I think you did the right thing by opening an issue on the JVM as the problem seems to be within JVM code.
We can see from the stack trace that the Elastic APM agent calls java.lang.System.identityHashCode , which is implemented by the JVM in native code.

There are a few things that aren't visible in the crash report though, could you please provide more details on:

  • how does the agent is setup ? we don't see the -javaagent in the JVM arguments
  • can you provide the agent configuration (without sensitive values) ?
  • are you using the sampling profiler ?
  • what is the frequency of those crashes ?

Thanks in advance !

Hi @Sylvain_Juge ,

Answers to your questions:

  • We attach programatically using ElasticApmAttacher.attach();
  • This is our config
2023-06-29 21:37:56,849 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - Starting Elastic APM 1.36.0 as ** (23.12.0) on Java 17.0.7 Runtime version: 17.0.7+7 VM version: 17.0.7+7 (Eclipse Adoptium) Linux 5.10.162+
23:37:56.850
2023-06-29 21:37:56,850 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - service_name: '**' (source: /configuration/elasticapm.properties)
23:37:56.850
2023-06-29 21:37:56,850 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - environment: 'NL-PROD' (source: /configuration/elasticapm.properties)
23:37:56.851
2023-06-29 21:37:56,850 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - disable_instrumentations: 'log-reformatting' (source: /configuration/elasticapm.properties)
23:37:56.851
2023-06-29 21:37:56,851 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - config_file: '/configuration/elasticapm.properties' (source: Environment Variables)
23:37:56.851
2023-06-29 21:37:56,851 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - plugins_dir: '/' (source: /configuration/elasticapm.properties)
23:37:56.851
2023-06-29 21:37:56,851 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - secret_token: 'XXXX' (source: /configuration/elasticapm.properties)
23:37:56.851
2023-06-29 21:37:56,851 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - server_urls: '**' (source: /configuration/elasticapm.properties)
23:37:56.851
2023-06-29 21:37:56,851 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - application_packages: 'com.h4h' (source: /configuration/elasticapm.properties)
  • We do not use the sampling profiler
  • So far we've had 3 of these crashes, between the first and second one there was a week in between. Between the second and third there were two days in between.

Thanks for the details !

As you are not using the sampling profiler, there is no native code involved by the Java agent that could interfere with the JVM code, so here we need to wait for any feedback from the JVM team.

I would suggest collecting and keeping all the crash reports when it happens, as if there is any correlation between those it would be very valuable to have multiple instances.

We've just had another crash on our production environment because of this issue. The weird thing is it's still happening in JulConsoleHandlerPublishAdvice even though we've disabled the log-reformatting instrumentation. If there is anyway to fully disable this JulConsoleHandlerPublishAdvice it would be very much appreciated, otherwise we might need to disable the whole agent for now.

Stack trace is very similar to last time:

Current thread (0x00007f3723a7e390):  JavaThread "http-nio-8880-exec-4" daemon [_thread_in_vm, id=601, stack(0x00007f37073a1000,0x00007f37074a1a90)]

Stack: [0x00007f37073a1000,0x00007f37074a1a90],  sp=0x00007f370749de80,  free space=1011k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xe34474]  ObjectSynchronizer::FastHashCode(Thread*, oopDesc*)+0x184
V  [libjvm.so+0x90432e]  JVM_IHashCode+0x9e
J 32350  java.lang.System.identityHashCode(Ljava/lang/Object;)I java.base@17.0.7 (0 bytes) @ 0x00007f375891301a [0x00007f3758912f20+0x00000000000000fa]
J 58401 c2 com.blogspot.mydailyjava.weaklockfree.AbstractWeakConcurrentMap.containsKey(Ljava/lang/Object;)Z (46 bytes) @ 0x00007f3758ac6b60 [0x00007f3758ac6360+0x0000000000000800]
J 114608 c2 co.elastic.apm.agent.loginstr.reformatting.AbstractEcsReformattingHelper.onAppendEnter(Ljava/lang/Object;)Z (124 bytes) @ 0x00007f375ca48018 [0x00007f375ca47f00+0x0000000000000118]
j  co.elastic.apm.agent.jul.reformatting.JulConsoleHandlerPublishAdvice.initializeReformatting(Ljava/util/logging/ConsoleHandler;)Z+4
J 122764 c2 java.util.logging.Logger.doLog(Ljava/util/logging/LogRecord;)V java.logging@17.0.7 (50 bytes) @ 0x00007f375a3f17d8 [0x00007f375a3f1180+0x0000000000000658]
J 65111 c2 java.util.logging.Logger.log(Ljava/util/logging/Level;Ljava/lang/String;)V java.logging@17.0.7 (25 bytes) @ 0x00007f37596cff48 [0x00007f37596cfd40+0x0000000000000208]
J 103756 c1 com.openhtmltopdf.util.XRLogger.log(Lcom/openhtmltopdf/util/Diagnostic;)V (61 bytes) @ 0x00007f3754316e6c [0x00007f3754316240+0x0000000000000c2c]
J 91463 c2 com.openhtmltopdf.util.XRLog.log(Ljava/util/logging/Level;Lcom/openhtmltopdf/util/LogMessageId;Z[Ljava/lang/Object;)V (57 bytes) @ 0x00007f375b53c788 [0x00007f375b53c040+0x0000000000000748]
J 113820 c1 com.openhtmltopdf.util.XRLog.log(Ljava/util/logging/Level;Lcom/openhtmltopdf/util/LogMessageId$LogMessageId1Param;Ljava/lang/Object;)V (15 bytes) @ 0x00007f3751939d0c [0x00007f3751939ac0+0x000000000000024c]
j  com.openhtmltopdf.resource.XMLResource.newXMLReader()Lorg/xml/sax/XMLReader;+179
j  com.openhtmltopdf.resource.XMLResource$XMLResourceBuilder.createXMLResource(Lcom/openhtmltopdf/resource/XMLResource;)Lcom/openhtmltopdf/resource/XMLResource;+16
j  com.openhtmltopdf.resource.XMLResource$XMLResourceBuilder.access$100(Lcom/openhtmltopdf/resource/XMLResource$XMLResourceBuilder;Lcom/openhtmltopdf/resource/XMLResource;)Lcom/openhtmltopdf/resource/XMLResource;+2
j  com.openhtmltopdf.resource.XMLResource.load(Lorg/xml/sax/InputSource;)Lcom/openhtmltopdf/resource/XMLResource;+11
j  com.openhtmltopdf.pdfboxout.PdfBoxRenderer.setDocumentFromStringP(Ljava/lang/String;Ljava/lang/String;)V+24
j  com.openhtmltopdf.pdfboxout.PdfBoxRenderer.<init>(Lcom/openhtmltopdf/outputdevice/helper/BaseDocument;Lcom/openhtmltopdf/outputdevice/helper/UnicodeImplementation;Lcom/openhtmltopdf/outputdevice/helper/

Config after application restarted again (should be the same as when the crash happenend):

12:31:03.523
2023-07-05 10:31:03,519 [Attach Listener] INFO  co.elastic.apm.agent.util.JmxUtils - Found JVM-specific OperatingSystemMXBean interface: com.sun.management.OperatingSystemMXBean
12:31:03.543
2023-07-05 10:31:03,543 [Attach Listener] INFO  co.elastic.apm.agent.util.JmxUtils - Found JVM-specific ThreadMXBean interface: com.sun.management.ThreadMXBean
12:31:03.565
2023-07-05 10:31:03,565 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - Starting Elastic APM 1.36.0 as ** (23.12.0) on Java 17.0.7 Runtime version: 17.0.7+7 VM version: 17.0.7+7 (Eclipse Adoptium) Linux 5.10.162+
12:31:03.566
2023-07-05 10:31:03,566 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - environment: 'NL-PROD' (source: /configuration/elasticapm.properties)
12:31:03.566
2023-07-05 10:31:03,566 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - disable_instrumentations: 'log-reformatting' (source: /configuration/elasticapm.properties)
12:31:03.566
2023-07-05 10:31:03,566 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - service_name: '**' (source: /configuration/elasticapm.properties)
12:31:03.567
2023-07-05 10:31:03,567 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - config_file: '/configuration/elasticapm.properties' (source: Environment Variables)
12:31:03.567
2023-07-05 10:31:03,567 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - plugins_dir: '/' (source: /configuration/elasticapm.properties)
12:31:03.567
2023-07-05 10:31:03,567 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - secret_token: 'XXXX' (source: /configuration/elasticapm.properties)
12:31:03.567
2023-07-05 10:31:03,567 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - server_urls: '**' (source: /configuration/elasticapm.properties)
12:31:03.568
2023-07-05 10:31:03,568 [Attach Listener] INFO  co.elastic.apm.agent.configuration.StartupInfo - application_packages: 'com.h4h' (source: /configuration/elasticapm.properties)

It seems like that for some reason disabling log-reformatting group does not work. Disabling the whole logging group does result in Not applying excluded instrumentation co.elastic.apm.agent.jul.reformatting.JulLogReformattingInstrumentation$ConsoleReformattingInstrumentation

Hi,

I think the inconsistency related to disabling this instrumentation has been removed in version 1.37.0 with set service name & version in ecs-logging by SylvainJuge · Pull Request #3064 · elastic/apm-agent-java · GitHub (you are using 1.36.0), so here the best option should probably be to upgrade to the latest version (1.39.0 as I'm writing this). I haven't checked the exact version you are using here but updating to the latest is usually a good idea.

However, updating to the latest version should not change the problem you have with crashing JVMs, but at least you should be able to apply a better work-around.

Hi @Sylvain_Juge,

Can confirm that indeed with a newer version log-reformatting works and with 1.36 jul-ecs works. It might be a good idea (for the future) to add some note to the documentation if something like this changes. I see it's listed as breaking change in the changelog for 1.37, but looking at the documentation for 1.x (Core configuration options | APM Java Agent Reference [1.x] | Elastic) there is no indication whatsoever the naming of these groups is different between minor versions.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.