APM profiler stopping with https://discuss.elastic.co/c/apm on latest adoptopenjdk/openjdk8-openj9:alpine-slim

Kibana version: 7.4.1

Elasticsearch version: 7.4.1

APM Server version: 7.4.1

APM Agent language and version: Java 1.20

Browser version: n/a

Original install method (e.g. download page, yum, deb, from source, etc.) and version: Jar from maven source

Fresh install or upgraded from other version? Fresh

Is there anything special in your setup? Running adoptopenjdk/openjdk8-openj9:alpine-slim

Description of the problem including expected versus actual behavior. Please include screenshots (if relevant):

Steps to reproduce:

  1. Use adoptopenjdk/openjdk8-openj9:alpine-slim as base image
  2. Run java application with -javagent for APM as usual

Provide logs and/or server output (if relevant):

JAVA_OPTS = -Xgcpolicy:gencon -Xgc:concurrentScavenge -Dnetworkaddress.cache.ttl=60 -Xshareclasses:nonFatal,silent -server  -Xmx3072m -XX:MaxMetaspaceSize=256m -Xss1m -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9001 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dhazelcast.diagnostics.enabled=false -Dlog_json=true -Dhazelcast.health.monitoring.level=OFF -javaagent:/apm/elastic-apm-agent.jar -Delastic.apm.service_name=svc-accounting-provider-internal -Delastic.apm.server_urls=http://apm-server.logging.svc.cluster.local:8200 -Delastic.apm.application_packages=lithium -Delastic.apm.enable_log_correlation=true
java -Xgcpolicy:gencon -Xgc:concurrentScavenge -Dnetworkaddress.cache.ttl=60 -Xshareclasses:nonFatal,silent -server -Xmx3072m -XX:MaxMetaspaceSize=256m -Xss1m -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9001 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dhazelcast.diagnostics.enabled=false -Dlog_json=true -Dhazelcast.health.monitoring.level=OFF -javaagent:/apm/elastic-apm-agent.jar -Delastic.apm.service_name=svc-accounting-provider-internal -Delastic.apm.server_urls=http://apm-server.logging.svc.cluster.local:8200 -Delastic.apm.application_packages=lithium -Delastic.apm.enable_log_correlation=true -jar service-accounting-provider-internal.jar
2021-05-19 14:56:12,778 [main] INFO  co.elastic.apm.agent.util.JmxUtils - Found JVM-specific OperatingSystemMXBean interface: com.ibm.lang.management.OperatingSystemMXBean
2021-05-19 14:56:12,916 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - Starting Elastic APM 1.20.0 as svc-accounting-provider-internal on Java 1.8.0_292 Runtime version: 1.8.0_292-b10 VM version: openj9-0.26.0 (AdoptOpenJDK) Linux 4.19.167+
2021-05-19 14:56:12,916 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - VM Arguments: [-Xoptionsfile=/opt/java/openjdk/jre/lib/amd64/default/options.default, -Xlockword:mode=default,noLockword=java/lang/String,noLockword=java/util/MapEntry,noLockword=java/util/HashMap$Entry,noLockword=org/apache/harmony/luni/util/ModifiedMap$Entry,noLockword=java/util/Hashtable$Entry,noLockword=java/lang/invoke/MethodType,noLockword=java/lang/invoke/MethodHandle,noLockword=java/lang/invoke/CollectHandle,noLockword=java/lang/invoke/ConstructorHandle,noLockword=java/lang/invoke/ConvertHandle,noLockword=java/lang/invoke/ArgumentConversionHandle,noLockword=java/lang/invoke/AsTypeHandle,noLockword=java/lang/invoke/ExplicitCastHandle,noLockword=java/lang/invoke/FilterReturnHandle,noLockword=java/lang/invoke/DirectHandle,noLockword=java/lang/invoke/ReceiverBoundHandle,noLockword=java/lang/invoke/DynamicInvokerHandle,noLockword=java/lang/invoke/FieldHandle,noLockword=java/lang/invoke/FieldGetterHandle,noLockword=java/lang/invoke/FieldSetterHandle,noLockword=java/lang/invoke/StaticFieldGetterHandle,noLockword=java/lang/invoke/StaticFieldSetterHandle,noLockword=java/lang/invoke/IndirectHandle,noLockword=java/lang/invoke/InterfaceHandle,noLockword=java/lang/invoke/VirtualHandle,noLockword=java/lang/invoke/PrimitiveHandle,noLockword=java/lang/invoke/InvokeExactHandle,noLockword=java/lang/invoke/InvokeGenericHandle,noLockword=java/lang/invoke/VarargsCollectorHandle,noLockword=java/lang/invoke/ThunkTuple, -Xjcl:jclse29, -Dcom.ibm.oti.vm.bootstrap.library.path=/opt/java/openjdk/jre/lib/amd64/default:/opt/java/openjdk/jre/lib/amd64, -Dsun.boot.library.path=/opt/java/openjdk/jre/lib/amd64/default:/opt/java/openjdk/jre/lib/amd64, -Djava.library.path=/opt/java/openjdk/jre/lib/amd64/default:/opt/java/openjdk/jre/lib/amd64:/usr/lib64:/usr/lib, -Djava.home=/opt/java/openjdk/jre, -Djava.ext.dirs=/opt/java/openjdk/jre/lib/ext, -Duser.dir=/, -XX:+IgnoreUnrecognizedVMOptions, -XX:+IdleTuningGcOnIdle, -Xshareclasses:name=openj9_system_scc,cacheDir=/opt/java/.scc,readonly,nonFatal, -Djava.class.path=., -Xgcpolicy:gencon, -Xgc:concurrentScavenge, -Dnetworkaddress.cache.ttl=60, -Xshareclasses:nonFatal,silent, -server, -Xmx3072m, -XX:MaxMetaspaceSize=256m, -Xss1m, -Dcom.sun.management.jmxremote, -Dcom.sun.management.jmxremote.port=9001, -Dcom.sun.management.jmxremote.ssl=false, -Dcom.sun.management.jmxremote.authenticate=false, -Dhazelcast.diagnostics.enabled=false, -Dlog_json=true, -Dhazelcast.health.monitoring.level=OFF, -javaagent:/apm/elastic-apm-agent.jar, -Delastic.apm.service_name=svc-accounting-provider-internal, -Delastic.apm.server_urls=http://apm-server.logging.svc.cluster.local:8200, -Delastic.ap.application_packages=lithium, -Delastic.apm.enable_log_correlation=true, -Djava.class.path=service-accounting-provider-internal.jar, -Dsun.java.command=service-accounting-provider-internal.jar, -Dsun.java.launcher=SUN_STANDARD, -Dsun.java.launcher.pid=6]
2021-05-19 14:56:14,604 [main] INFO  co.elastic.apm.agent.impl.ElasticApmTracer - Tracer switched to RUNNING state
2021-05-19 14:56:14,750 [elastic-apm-server-healthcheck] INFO  co.elastic.apm.agent.report.ApmServerHealthChecker - Elastic APM server is available: {  "build_date": "2019-10-22T15:25:16Z",  "build_sha": "26c6b9c55d4753495d8aa571a21b8a97f9693b4b",  "version": "7.4.1"}
2021-05-19 14:57:11,438 [main] INFO  co.elastic.apm.agent.servlet.ServletVersionInstrumentation - Servlet container info = Apache Tomcat/8.5.15
2021-05-19 14:57:19,840 [elastic-apm-sampling-profiler] ERROR co.elastic.apm.agent.profiler.SamplingProfiler - Stopping profiler
        at co.elastic.apm.agent.profiler.asyncprofiler.BufferedFile.get(BufferedFile.java:182) ~[?:?]
        at co.elastic.apm.agent.profiler.asyncprofiler.JfrParser.parse(JfrParser.java:113) ~[?:?]
        at co.elastic.apm.agent.profiler.SamplingProfiler.processTraces(SamplingProfiler.java:496) ~[?:?]
        at co.elastic.apm.agent.profiler.SamplingProfiler.profile(SamplingProfiler.java:411) ~[?:?]
        at co.elastic.apm.agent.profiler.SamplingProfiler.run(SamplingProfiler.java:370) [?:?]

Changes that broke the profiler:

  1. Moving to OpenJ9 version of the same JDK version
  2. Enabling gencon concurrent scavenge GC

Moving back to a non OpenJ9 version fixes the issue.

Welcome to our forum @Kim_Attree!
Seems like an error with the ByteBuffer implementation, but we can't see with this partial stack trace. Can you provide the entire stack trace, including inner causes, and everything else that may be relevant in log?

Hi Eyal, thank you for your support, here is the full stack trace I get:

2021-05-25 12:29:31,397 [elastic-apm-sampling-profiler] ERROR co.elastic.apm.agent.profiler.SamplingProfiler - Stopping profil
er
java.nio.BufferUnderflowException: null
at java.nio.Buffer.nextGetIndex(Unknown Source) ~[?:1.8.0_292]
at java.nio.DirectByteBuffer.get(Unknown Source) ~[?:1.8.0_292]
at co.elastic.apm.agent.profiler.asyncprofiler.BufferedFile.get(BufferedFile.java:182) ~[?:?]
at co.elastic.apm.agent.profiler.asyncprofiler.JfrParser.parse(JfrParser.java:113) ~[?:?]
at co.elastic.apm.agent.profiler.SamplingProfiler.processTraces(SamplingProfiler.java:496) ~[?:?]
at co.elastic.apm.agent.profiler.SamplingProfiler.profile(SamplingProfiler.java:411) ~[?:?]
at co.elastic.apm.agent.profiler.SamplingProfiler.run(SamplingProfiler.java:370) [?:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [?:1.8.0_292]
at java.util.concurrent.FutureTask.run(Unknown Source) [?:1.8.0_292]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source) [?:1.8.0_292]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) [?:1.8.0_292]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:1.8.0_292]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:1.8.0_292]
at java.lang.Thread.run(Unknown Source) [?:1.8.0_292]

This is as far as it goes in my logs, unless there is another way to get a more detailed stack trace ?

Thanks,

K

Hmm, it looks as if even the magic bytes are not available to read from the file...
Please try to set log_level to debug and find the debug message printed just before this error is printed. It should tell us the size of the JFR file.

@felixbarny do you have any idea? Can this be an error with async profiler, or expected and we only need to adjust the co.elastic.apm.agent.profiler.asyncprofiler.BufferedFile#ensureRemaining() check?

Seems like the file is empty and there's a bug in ensureRemaining that it assumes that if the whole file is in the buffer, there's always enough to read. Which is not true if the file is empty.

We should probably short circuit the parsing if the file is empty. But all of that doesn't fix the underlying issue.

The question is why the JFR file is empty. It appears that async-profiler is not compatible with that particular container or configuration. Not sure if it's because of OpenJ9 (JDK8), the alpine-slim container, or the gencon concurrent scavenge GC.

Very well then, I'll add this check and try to followup with the async profiler author to see if this is a known issue

@felixbarny I can confirm we normally use OpenJDK8 Alpine-slim and that works without issue. I also tried other garbage collection techniques such as balanced and optthroughput with openJ9 and found the same behaviour from APM Agent.

I will follow @Eyal_Koren advice and enable debug to get some more information on this.

Thanks for your help this far.

Great discussions, and I have faced the same problem.

I have faced the same problem.

Thanks for reporting @niyazahamed!
Same JVM/OS? If not, please specify which.

I have enabled DEBUG logging and attached the output here: 2021-05-26 15:46:43,363 [main] DEBUG co.elastic.apm.agent.util.ExecutorUtils - A - Pastebin.com

Yep, that verifies the issue - Parsing /tmp/apm-traces-7850495507829281497.jfr (0 bytes)
Apparently OpenJ9 is not supported. We need to document it and avoid this error.
Thanks for the feedback!

I think its because OpenJ9 does not have JFR:

bash-5.0# jcmd 6 JFR.start duration=1s filename=/tmp/test.jfr
Error: Error in command
Command JFR.start,duration=1s,filename=/tmp/test.jfr not recognized

Just got the explanation from the async profiler author: apparently, OpenJ9 didn't have AsyncGetCallTrace until recently. This is the native API that async profiler relies on.
He just saw that it was recently implemented.
In theory, this opens a way for having async-profiler on OpenJ9, too. But there are still other HotSpot-specific dependencies that need to be fixed.

@Eyal_Koren do you think GraalVM would work with APM Agent ?