SIGSEGV JVM Crash

mafaul · January 10, 2023, 9:27am

When using the Elastic APM Java agent, the JVM crashes unexpectedly.
I tried different AWS EC2 instances; it happens on both AMD and Graviton CPUs.
It happens on JVMs with small (4GB) and large (60GB) heaps.

Kibana version: 7.6.2

Elasticsearch version: 7.6.2

Java: OpenJDK Runtime Environment Corretto-17.0.5.8.1 64-bit (x86 and Aarch)

APM Agent language and version: Java 1.35.0

OS: Amazon Linux 2 (x86 and Aarch)

I've attached 3 JVM crash files below as replies. I have about 15 more.

Unfortunately I cannot reproduce the issue on demand; to me it appears to happen randomly.

It looks a bit like "Segmentation fault when attaching apm-agent-java to adopt jdk 11" (Issue #864, elastic/apm-agent-java on GitHub), but with Java 17.

Jack_Shirazi · January 10, 2023, 11:42am

Thanks for the error reports. Do you have any correlation between stacks (or top of stack) and CPU architecture? Do you have any more info on when it tends to happen (how far into the application run) and how often it happens (e.g. 10% of runs)? Thanks.

Jack_Shirazi · January 10, 2023, 12:22pm

If you can attach all the crash logs, we can do that analysis. This looks like it's not going to be easy to figure out.
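As a rough illustration of the analysis asked for above, the following sketch groups hs_err crash logs by platform and top frame, and also pulls out how long the JVM had been running. It assumes the usual HotSpot hs_err header layout (a "# Java VM:" line ending in a platform tag such as linux-amd64, a "# Problematic frame:" line followed by the frame, and an "elapsed time: N seconds" entry). The function names and the hs_err*.log file pattern are hypothetical choices for this example, not something taken from the thread.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed hs_err layout: "# Java VM: ... (..., linux-amd64)"
PLATFORM_RE = re.compile(r"^# Java VM:.*\b(linux-[a-z0-9_]+)\)", re.MULTILINE)
# Assumed hs_err layout: "# Problematic frame:" followed by "# <frame>"
FRAME_RE = re.compile(r"^# Problematic frame:[ \t]*\n#[ \t]*(.+)$", re.MULTILINE)
# Assumed hs_err layout: "elapsed time: 5400.500000 seconds (...)"
ELAPSED_RE = re.compile(r"elapsed time: ([0-9]+(?:\.[0-9]+)?) seconds")


def summarize_crash(text):
    """Extract platform, problematic frame and JVM uptime from one hs_err log."""
    platform = PLATFORM_RE.search(text)
    frame = FRAME_RE.search(text)
    elapsed = ELAPSED_RE.search(text)
    return {
        "platform": platform.group(1) if platform else None,
        "frame": frame.group(1).strip() if frame else None,
        "elapsed_seconds": float(elapsed.group(1)) if elapsed else None,
    }


def correlate(texts):
    """Count crashes per (platform, problematic frame) pair."""
    counts = Counter()
    for text in texts:
        summary = summarize_crash(text)
        counts[(summary["platform"], summary["frame"])] += 1
    return counts


def correlate_directory(directory):
    """Hypothetical helper: read every hs_err*.log file in a directory."""
    texts = [
        path.read_text(errors="replace")
        for path in sorted(Path(directory).glob("hs_err*.log"))
    ]
    return correlate(texts)
```

Calling correlate_directory on a folder of crash logs returns a Counter whose most_common() output shows whether particular frames cluster on one architecture. Frames include library offsets that differ between builds, so grouping on the symbol name alone may be more useful in practice.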

mafaul · January 10, 2023, 12:24pm

Thank you for having a look.

It usually happens between 30 minutes and 3 hours after JVM startup, and it appears to happen while the app is in use: during the daytime rather than at night, and more often when more users are active. It usually coincides with an AWS S3 upload or download using software.amazon.awssdk s3 (version 2.19.2).

It happens on Aarch (Graviton 2) and x86 64-bit (AMD EPYC) CPU architectures.

This issue did not happen with the Java APM agent version 1.20.0 on Java 11.0.16.1.

Crash logs: hs_err - Google Drive

Jack_Shirazi · January 10, 2023, 12:39pm

Have you also run without the Elastic agent on the config that crashes (Corretto 17) and found it stable?

mafaul · January 10, 2023, 12:46pm

Yes, that is correct. Removing the

-javaagent:/home/username/elastic-apm-agent-1.35.0.jar

JVM startup parameter fixed the issue and we are not experiencing any crashes.
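For anyone wanting to repeat this with-and-without comparison, here is a minimal sketch that drops only the Elastic agent entry from a list of JVM arguments and leaves every other option in place. The function name is hypothetical, and matching on the substring "elastic-apm-agent" is an assumption based on the jar name quoted above.

```python
def strip_elastic_agent(jvm_args):
    """Return jvm_args without -javaagent entries that point at the Elastic APM agent jar.

    Assumption: the agent jar keeps "elastic-apm-agent" in its file name,
    as in the path quoted in this thread.
    """
    return [
        arg
        for arg in jvm_args
        if not (arg.startswith("-javaagent:") and "elastic-apm-agent" in arg)
    ]


baseline_args = strip_elastic_agent(
    ["-Xmx4g", "-javaagent:/home/username/elastic-apm-agent-1.35.0.jar"]
)
print(baseline_args)
```

Other -javaagent entries are kept on purpose, so the only difference between the two runs is the Elastic agent.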

Jack_Shirazi · January 10, 2023, 1:46pm

Thanks, the larger set of crash logs has a couple of crashes that show intercepts of the Elastic methods happening from AWS interceptors. Are you explicitly adding interceptors, or is that something Amazon does automatically (maybe e.g. for X-Ray tracking of errors)?

mafaul · January 10, 2023, 1:53pm

We are not explicitly adding interceptors.

Jack_Shirazi · January 10, 2023, 2:25pm

And no dependencies on aws-xray-* ?

mafaul · January 11, 2023, 6:25am

It might be worth noting that we are running on the Payara 6.2022.2 app server.
As far as I can see, they don't include the AWS X-Ray dependency either.

Jack_Shirazi · January 11, 2023, 1:06pm

Could you try with this 1.35 snapshot please? It is just the latest build with the throwables no longer captured for those paths (PR). We suspect this is an interplay with how Corretto specializes Throwable handling on AWS machines. If this stops the crashes here, we'll look at how best to apply it more generically.

mafaul · January 11, 2023, 2:26pm

Sure, will give it a go and report back tomorrow.

mafaul · January 12, 2023, 1:01pm

It works

Jack_Shirazi · January 12, 2023, 3:54pm

Thanks! We'll produce a more complete PR and include that in an upcoming release.

mafaul · January 13, 2023, 7:45am

Thank you. I appreciate it.

Jack_Shirazi · January 13, 2023, 2:47pm

Just for completeness, please test the updated snapshot, which looks specifically for a Corretto JVM and avoids the capture only in that case.
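The thread does not show how the agent performs this check. Purely to illustrate the idea of gating a behaviour on the JVM distribution, here is a sketch that inspects a runtime description string like the one in the original post. Both function names are hypothetical, and matching on the word "Corretto" is an assumption, not the agent's actual logic.

```python
def is_corretto(runtime_description):
    """Guess whether a JVM runtime description refers to Amazon Corretto.

    Assumption: Corretto builds carry "Corretto" in the runtime string, as in
    "OpenJDK Runtime Environment Corretto-17.0.5.8.1" from the original post.
    """
    return "corretto" in runtime_description.lower()


def should_capture_throwable(runtime_description):
    """Illustrative gate: skip the throwable capture only on Corretto."""
    return not is_corretto(runtime_description)
```

With this gate, all other JVM distributions keep the existing behaviour and only Corretto takes the workaround path.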

mafaul · January 31, 2023, 8:52am

Will do, apologies for the delay. It will be tested tomorrow

Jack_Shirazi · January 31, 2023, 9:21am

Actually, we're about to release the new version with the workaround, so please just test that when it's out rather than the snapshot.

mafaul · February 4, 2023, 5:01am

v1.36.0 works perfectly, thank you.

system · February 25, 2023, 1:02am

This topic was automatically closed 20 days after the last reply. New replies are no longer allowed.
