Java APM Agent causes CPU spikes

Hi, I set up the Java APM agent with Tomcat in production, but the agent caused CPU spikes. During the spikes, memory usage was normal but GC activity dropped to almost zero. We don't have a heap dump, but the GC log shows that during the CPU spikes there were many safepoints with the reason ICBufferFull. Does anyone have an idea what might cause this?

OS version : CentOS Linux, Version 7(core)
Java Version : openjdk version "11.0.9.1"
APM java agent version : 1.19.0

APM java agent configuration
The APM agent is attached to the Tomcat process with the -javaagent flag:

/java/1.8.0_275/bin/java -javaagent:elastic-apm-agent-1.19.0.jar -Delastic.apm

GC log:

log/gc.log.4:[2021-01-14T09:45:58.880-0800][1610646358880ms][info ][safepoint         ] Application time: 0.0213354 seconds
log/gc.log.4:[2021-01-14T09:45:58.881-0800][1610646358881ms][info ][safepoint         ] Entering safepoint region: RevokeBias
log/gc.log.4:[2021-01-14T09:45:58.882-0800][1610646358882ms][info ][safepoint         ] Leaving safepoint region
log/gc.log.4:[2021-01-14T09:45:58.882-0800][1610646358882ms][info ][safepoint         ] Total time for which application threads were stopped: 0.0025448 seconds, Stopping threads took: 0.0010856 seconds
log/gc.log.4:[2021-01-14T09:45:59.028-0800][1610646359028ms][info ][safepoint         ] Application time: 0.1453296 seconds
log/gc.log.4:[2021-01-14T09:45:59.030-0800][1610646359030ms][info ][safepoint         ] Entering safepoint region: ICBufferFull
log/gc.log.4:[2021-01-14T09:45:59.031-0800][1610646359031ms][info ][safepoint         ] Leaving safepoint region
log/gc.log.4:[2021-01-14T09:45:59.031-0800][1610646359031ms][info ][safepoint         ] Total time for which application threads were stopped: 0.0036981 seconds, Stopping threads took: 0.0021553 seconds

APM agent config:

recording=true
transaction_sample_rate=0.5
max_queue_size=512
use_path_as_transaction_name=true
ignore_exceptions=java.net.SocketException
transaction_ignore_urls=*.js,*.css,*.jpg,*.jpeg,*.png,*.gif,*.svg
capture_headers=false
capture_body=off
metrics_interval=15s
transaction_max_spans=800
span_min_duration=10ms
profiling_inferred_spans_min_duration=10ms
profiling_inferred_spans_sampling_interval=5ms
profiling_inferred_spans_enabled=true
disable_instrumentations=kafka
async_profiler_safe_mode=16
log_file_size=50mb

Server Metrics:

Hi @bbking,

So far, we haven't had any reports of the safepoint behavior that you describe. Did you check that it does not happen when the agent is disabled? For example, how many occurrences of ICBufferFull do you get in a day with and without the agent?
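To make that comparison concrete, here is a minimal sketch (not an official tool) that counts safepoint entries per day and per reason. It assumes the log lines look like the ones posted above, i.e. the timestamp is the first bracketed field and the reason follows "Entering safepoint region:". The names `count_safepoints` and `SAMPLE` are made up for illustration.

```python
import re
from collections import Counter

# Captures the date from the first bracketed timestamp and the safepoint
# reason, e.g. "[2021-01-14T09:45:59.030-0800]...region: ICBufferFull".
ENTER_RE = re.compile(
    r"\[(\d{4}-\d{2}-\d{2})T[^\]]*\].*Entering safepoint region: (\w+)"
)


def count_safepoints(lines):
    """Return a Counter keyed by (date, safepoint reason)."""
    counts = Counter()
    for line in lines:
        match = ENTER_RE.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


# Two of the lines from the log excerpt in the original post.
SAMPLE = [
    "log/gc.log.4:[2021-01-14T09:45:58.881-0800][1610646358881ms][info ]"
    "[safepoint         ] Entering safepoint region: RevokeBias",
    "log/gc.log.4:[2021-01-14T09:45:59.030-0800][1610646359030ms][info ]"
    "[safepoint         ] Entering safepoint region: ICBufferFull",
]

for (day, reason), n in sorted(count_safepoints(SAMPLE).items()):
    print(day, reason, n)
```

For a real log, pass an open file object instead of `SAMPLE`, run it once on a day with the agent enabled and once on a day without it, and compare the ICBufferFull counts.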

We updated the profiler (async-profiler) in the last release, so I would suggest updating to version 1.20.0 to see whether the problem still appears.

Also, do you have a particular reason to use async_profiler_safe_mode=16? Unless you explicitly had issues in the past, it should not be set, except when debugging or working around known issues.
