After running more performance tests and subjecting our services to real-life traffic load profiles, we noticed that our services experience high CPU spikes with the Java agent enabled, compared with our baseline without the agent. We tested with both Tomcat and Jetty and observed the same behaviour on each.
As an example, a baseline of 20% CPU jumps to 50% with the agent, and a baseline of 2% jumps to around 20%.
Our profiling indicates that the problem may lie in the apm-reporter rather than in the instrumentation itself, and that the bulk of the CPU time is consumed by the LMAX Disruptor in LiteBlockingWaitStrategy.waitFor().
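For context, a "lite" blocking wait strategy parks the consumer thread (here, presumably the reporter thread) on a lock condition, and lets producers skip the lock entirely unless a consumer has flagged that it is waiting. The sketch below is a simplified, hypothetical model of that idea in plain Java; the class and method names are illustrative and are not the Disruptor's API or source. One thing it makes visible: a thread parked in `await()` uses no CPU, so CPU attributed to `waitFor()` would have to come from frequent wake-up and lock cycles, or from a profiler counting wall-clock rather than CPU time. Which of these applies here is an open question, not something this sketch establishes.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified model of a "lite" blocking wait strategy.
// This is NOT the LMAX Disruptor source; all names are illustrative.
// Assumes a single producer that publishes increasing sequences.
class LiteBlockingSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition published = lock.newCondition();
    // Set by a waiting consumer so that producers only take the lock when needed.
    private final AtomicBoolean signalNeeded = new AtomicBoolean(false);
    // Highest published sequence; -1 means nothing has been published yet.
    private final AtomicLong cursor = new AtomicLong(-1);

    /** Consumer side: block (without spinning) until the cursor reaches sequence. */
    long waitFor(long sequence) throws InterruptedException {
        if (cursor.get() < sequence) {
            lock.lock();
            try {
                while (cursor.get() < sequence) {
                    // Ask the next publish to signal us, then re-check before sleeping
                    // so that a publish racing with this call is not missed.
                    signalNeeded.set(true);
                    if (cursor.get() >= sequence) {
                        break;
                    }
                    published.await(); // parked: no CPU used until signalled
                }
            } finally {
                lock.unlock();
            }
        }
        return cursor.get();
    }

    /** Producer side: publish, and only lock and signal if a consumer asked for it. */
    void publish(long sequence) {
        cursor.set(sequence);
        if (signalNeeded.getAndSet(false)) {
            lock.lock();
            try {
                published.signalAll();
            } finally {
                lock.unlock();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LiteBlockingSketch strategy = new LiteBlockingSketch();
        Thread producer = new Thread(() -> {
            for (long seq = 0; seq < 3; seq++) {
                strategy.publish(seq);
            }
        });
        producer.start();
        long available = strategy.waitFor(2); // returns once sequence 2 is published
        producer.join();
        System.out.println("available up to " + available);
    }
}
```

In this model, every publish that finds a sleeping consumer pays for a lock acquisition and a wake-up, so a steady trickle of events keeps waking the consumer for small amounts of work.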
I've attached a screenshot of a sample profile.
Elastic APM Stack 6.5.4
Java agent 1.30
We tried both Jetty and Tomcat, with the same results.
Is there something we can do to fix this, or is this expected?
We came across PhasedBackoffWaitStrategy, but we are not sure whether it is related to our issue or what our best course of action would be: https://github.com/LMAX-Exchange/disruptor/blob/master/src/main/java/com/lmax/disruptor/PhasedBackoffWaitStrategy.java
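The linked PhasedBackoffWaitStrategy spins, then yields, then waits using a configured fallback wait strategy. The sketch below is a hypothetical, stdlib-only illustration of that phased idea (spin, then yield, then park with a timeout); the class name, constructor parameters and timings are assumptions for illustration and are not the Disruptor's API. It uses `Thread.onSpinWait()`, so it assumes Java 9 or later.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

// Hypothetical sketch of a phased-backoff wait: spin, then yield, then park.
// Modeled loosely on the idea behind Disruptor's PhasedBackoffWaitStrategy,
// but this is NOT its source or API; names and timings are illustrative.
class PhasedBackoffSketch {
    private final long spinNanos;  // phase 1 budget: busy-spin (lowest latency, burns a core)
    private final long yieldNanos; // phase 2 lasts until spinNanos + yieldNanos: Thread.yield()
    private final long parkNanos;  // phase 3: park this long between checks (low CPU)

    PhasedBackoffSketch(long spin, long yield, long park, TimeUnit unit) {
        this.spinNanos = unit.toNanos(spin);
        this.yieldNanos = unit.toNanos(yield);
        this.parkNanos = unit.toNanos(park);
    }

    /** Waits until cursor >= sequence and returns the observed cursor value. */
    long waitFor(long sequence, AtomicLong cursor) throws InterruptedException {
        final long start = System.nanoTime();
        long available;
        while ((available = cursor.get()) < sequence) {
            long elapsed = System.nanoTime() - start;
            if (elapsed < spinNanos) {
                Thread.onSpinWait();              // Java 9+ spin hint
            } else if (elapsed < spinNanos + yieldNanos) {
                Thread.yield();                   // give up the core but stay runnable
            } else {
                LockSupport.parkNanos(parkNanos); // sleep; returns early on interrupt
                if (Thread.interrupted()) {
                    throw new InterruptedException();
                }
            }
        }
        return available;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicLong cursor = new AtomicLong(-1);
        // Spin for 10 us, yield until 110 us, then park in 1 ms steps.
        PhasedBackoffSketch strategy =
                new PhasedBackoffSketch(10, 100, 1000, TimeUnit.MICROSECONDS);
        Thread producer = new Thread(() -> {
            LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(20)); // simulate an idle gap
            cursor.set(7);
        });
        producer.start();
        System.out.println("available up to " + strategy.waitFor(7, cursor));
        producer.join();
    }
}
```

Each phase trades latency for CPU: the spin and yield phases consume CPU themselves during short gaps between events, and only the parking phase is cheap. Whether a strategy like this would lower the reporter's CPU therefore depends on how long it typically waits between events, which the profile above does not settle.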