When using the Elastic APM Java agent, the JVM crashes unexpectedly.
I tried different AWS EC2 instances; it happens on both AMD and Graviton CPUs.
It happens on JVMs with small (4 GB) and large (60 GB) heaps.
Kibana version: 7.6.2
Elasticsearch version: 7.6.2
Java: OpenJDK Runtime Environment Corretto-17.0.5.8.1 64-bit (x86 and AArch64)
APM Agent language and version: Java 1.35.0
OS: Amazon Linux 2 (x86 and AArch64)
I've attached 3 JVM crash files below as replies. I have about 15 more.
Unfortunately, I cannot reproduce the issue on demand; to me it appears to happen randomly.
Thanks for the error reports. Do you see any correlation between the stacks (or the top of the stack) and the CPU architecture? Do you have any more information on when it tends to happen (how far into the application run) and how often it happens (e.g. 10% of runs)? Thanks.
It usually happens between 30 minutes and 3 hours after JVM startup. It appears to happen while the app is in use: it happens during the daytime (not at night), and more often when more users are using the app. It usually happens during an AWS S3 upload or download using the software.amazon.awssdk s3 (version 2.19.2).
It happens on both AArch64 (Graviton 2) and 64-bit x86 (AMD Epyc) CPU architectures.
This issue did not happen with Java APM agent version 1.20.0 on Java 11.0.16.1.
Thanks. The larger set of crash logs includes a couple of crashes that show intercepts of the Elastic methods happening from AWS interceptors. Are you explicitly adding interceptors, or is that something Amazon does automatically (for example, for X-Ray tracking of errors)?
Could you try this 1.35 snapshot, please? It is just the latest build, with the throwables no longer captured for those paths (PR). We suspect this is an interplay with how Corretto specializes Throwable handling on AWS machines. If this stops the crashes here, we'll look at how best to apply it more generically.