I am experiencing an issue with Elastic APM where the system.cpu.total.norm.pct metric for a specific service is always 1, indicating that CPU usage is constantly at 100%.
However, when I check inside the container, I do not see the same behavior.
I've configured the Java Agent in the container as follows:
java -Xms1024M -Xmx1024M \
  -XX:+UseParallelGC -Xlog:gc:logs/gc.log \
  -Djava.rmi.server.hostname=127.0.0.1 \
  -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=999 \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false \
  -javaagent:/opt/app/bin/elastic-apm-agent-1.47.1.jar \
  -Delastic.apm.service_name={backend-service} \
  -Delastic.apm.secret_token={secret_token} \
  -Delastic.apm.server_url=https://localhost:8200 \
  -Delastic.apm.transaction_sample_rate=0.1 \
  -Delastic.apm.cpu_count=2
Does anyone know what could be causing this issue?
Hi!
There is probably an issue with how the metric is captured by the agent here.
Before we dig any further, there are also a few unexpected things in your configuration that would be nice to fix:
-Delastic.apm.cpu_count=2: I don't know this config option, do you know where it comes from?
-Delastic.apm.server_url=https://localhost:8200 will make the agent send data to localhost, which means that unless there is an apm-server running within the container, it's quite unlikely to send data anywhere.
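For example, you would usually point it at a host where the APM Server is actually reachable (the hostname below is just a placeholder, replace it with your own):

-Delastic.apm.server_url=https://apm-server.example.com:8200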
Yes, I couldn't find the parameter -Delastic.apm.cpu_count=2 in the Elastic documentation either. I think the issue is that the Elastic Java Agent has misinterpreted the number of cores on my host machine, causing the CPU percentage to always appear high. So I asked ChatGPT how to configure it :D, and it provided me with that parameter; most likely it just made it up.
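To double-check how many cores the JVM inside the container actually sees, I can run a quick sketch like this inside the container (class name is just for illustration):

import java.lang.management.ManagementFactory;

public class CpuCountCheck {
    public static void main(String[] args) {
        // Number of CPUs the JVM believes it can use (cgroup-aware on recent JDKs)
        System.out.println("Runtime.availableProcessors = "
                + Runtime.getRuntime().availableProcessors());
        // Same value exposed through the OperatingSystem MXBean
        System.out.println("OperatingSystemMXBean.availableProcessors = "
                + ManagementFactory.getOperatingSystemMXBean().getAvailableProcessors());
    }
}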
Besides that, regarding your second point, I have correctly configured the agent to send data to my Elastic APM server, as you can see in the screenshot below:
The system.cpu.total.norm.pct metric is captured from the JMX interface OperatingSystemMXBean.getSystemCpuLoad. Since I see that you have configured the remote JMX interface, can you check what the actual value of this MBean attribute is with a tool like jconsole or visualvm?
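If you prefer to script it, here is a minimal sketch that reads that attribute over the remote JMX port 999 you already expose (host, port, and class name are assumptions, adjust them to your setup):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SystemCpuLoadCheck {
    public static void main(String[] args) throws Exception {
        // Connects to the JMX port exposed by the flags in your java command
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            ObjectName os = new ObjectName("java.lang:type=OperatingSystem");
            // SystemCpuLoad is on a 0.0-1.0 scale, the same scale used by
            // system.cpu.total.norm.pct
            System.out.println("SystemCpuLoad = "
                    + connection.getAttribute(os, "SystemCpuLoad"));
            System.out.println("AvailableProcessors = "
                    + connection.getAttribute(os, "AvailableProcessors"));
        }
    }
}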
If the value read through JMX is consistent with what the agent reports, then it means the problem is in the JVM. What version are you using? There have been quite a few bugs in this area, in particular when running in containers.
I checked the CPU usage through JConsole, and it appears to be at a normal level, which is different from what Elastic shows (always 100% CPU usage). What could be causing this issue?
Hi,
Maybe you could try updating to the latest agent version to see if that makes a difference here. If that's still an issue, we will need to investigate further.