Elastic APM shows 100% CPU usage but the cause is unclear – need help!

I am experiencing an issue with Elastic APM where the system.cpu.total.norm.pct metric for a specific service is always 1, indicating that CPU usage is constantly at 100%.

However, when I check inside the container, I do not see the same behavior.

I've configured the Java Agent in the container as follows:

java -DXms1024M -Xmx1024M                  
                 -XX:+UseParallelGC -Xlog:gc:logs/gc.log
                 -Djava.rmi.server.hostname=127.0.0.1 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=999 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
                 -javaagent:/opt/app/bin/elastic-apm-agent-1.47.1.jar
                 -Delastic.apm.service_name={backend-service}
                 -Delastic.apm.secret_token={secret_token}
                 -Delastic.apm.server_url=https://localhost:8200
                 -Delastic.apm.transaction_sample_rate=0.1
                 -Delastic.apm.cpu_count=2

Does anyone know what could be causing this issue?

Hi!

There is probably an issue with how the metric is captured by the agent here.

Before we dig any further, there are also a few unexpected things in your configuration that would be nice to fix:

  • -Delastic.apm.cpu_count=2: I don't know this config option. Do you know where it comes from?
  • -Delastic.apm.server_url=https://localhost:8200 will make the agent send data to localhost, which means that unless an apm-server is running within the container, the data is unlikely to reach anything.

Yes, I couldn't find the parameter -Delastic.apm.cpu_count=2 in the Elastic documentation either. My guess was that the Elastic Java Agent misinterprets the number of cores on my host machine, causing the CPU percentage to always appear high. So I asked ChatGPT how to configure it :D, and it gave me that parameter. Most likely, it just made it up.

Besides that, regarding your second point, I have correctly configured the data to be sent to my Elastic APM server, as you can see in the screenshot below:

The system.cpu.total.norm.pct metric is captured from the JMX interface OperatingSystemMXBean.getSystemCpuLoad. Since you have already configured remote JMX access, can you check the actual value of this MBean attribute with a tool like jconsole or visualvm?

If the value read through JMX is consistent with what the agent reports, then the problem is in the JVM. What version are you using? There have been quite a few bugs in this area, in particular when running in containers.
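
If attaching jconsole is inconvenient, the same attribute can also be read from inside the JVM. The following is only a minimal sketch, assuming a HotSpot/OpenJDK runtime that exposes com.sun.management.OperatingSystemMXBean; the class name CpuLoadProbe and the helper describe are made up for illustration and are not part of the agent. It prints the JVM version, the number of processors the JVM sees, and a few samples of getSystemCpuLoad (deprecated in newer JDKs in favor of getCpuLoad, but still available):

```java
import java.lang.management.ManagementFactory;
import java.util.Locale;

import com.sun.management.OperatingSystemMXBean;

// Hypothetical helper for illustration; not part of the Elastic APM agent.
class CpuLoadProbe {

    // Formats a load in [0.0, 1.0] as a percentage; the MXBean returns a
    // negative value when the load is not available.
    static String describe(double load) {
        if (load < 0) {
            return "unavailable";
        }
        return String.format(Locale.ROOT, "%.1f%%", load * 100.0);
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumption: the platform provides the com.sun.management extension.
        OperatingSystemMXBean os =
                ManagementFactory.getPlatformMXBean(OperatingSystemMXBean.class);

        System.out.println("java.version        = " + System.getProperty("java.version"));
        System.out.println("availableProcessors = " + Runtime.getRuntime().availableProcessors());

        // Take a few samples, one second apart, since a single reading
        // (especially the first one) may not be representative.
        for (int i = 0; i < 5; i++) {
            System.out.println("systemCpuLoad  = " + describe(os.getSystemCpuLoad()));
            System.out.println("processCpuLoad = " + describe(os.getProcessCpuLoad()));
            Thread.sleep(1000);
        }
    }
}
```

Running this inside the same container, with the same JVM as the service, gives values that can be compared directly with what jconsole shows and with what appears in Elastic APM.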

I checked the CPU usage through JConsole, and it appears to be at a normal level, which is different from what Elastic shows (always 100% CPU usage). What could be causing this issue?

Hi,

Maybe you could try updating to the latest agent version to see if that makes a difference here. If it is still an issue after that, we will need to investigate further.