Java APM agent not pushing CPU & Memory metric - IBM Java on WebSphere Liberty

Kibana version: 7.6.2

Elasticsearch version: 7.6.2

APM Server version: 7.6.2

APM Agent language and version: 1.15.0

Browser version: Chrome

Original install method (e.g. download page, yum, deb, from source, etc.) and version: Downloaded official agent from maven repository

Fresh install or upgraded from other version? Fresh APM, Upgraded Elasticsearch and Kibana

Is there anything special in your setup? For example, are you using the Logstash or Kafka outputs? Are you using a load balancer in front of the APM Servers? Have you changed index pattern, generated custom templates, changed agent configuration etc.
NO

Description of the problem including expected versus actual behavior. Please include screenshots (if relevant):

APM agent starts fine and is pushing all metrics apart from CPU and System Memory (system memory is getting something but incorrect data (negative values)

Steps to reproduce:

  1. Start Liberty server with APM agent

Provide logs and/or server output (if relevant):

2020-04-15 08:38:03.694 [elastic-apm-server-healthcheck] INFO co.elastic.apm.agent.report.ApmServerHealthChecker - Elastic APM server is available: { "build_date": "2020-03-26T04:23:50Z", "build_sha": "8827b43d24eaf42391ea815bb9fa12bdef5d6bce", "version": "7.6.2"}


Launching defaultServer (WebSphere Application Server 20.0.0.3/wlp-1.0.38.cl200320200305-1433) on IBM J9 VM, version 8.0.6.7 - pxa6480sr6fp7-20200312_01(SR6 FP7) (en_US)


{"metricset":{"timestamp":1586941201732000,"samples":{"jvm.memory.heap.used":{"value":220827248.0},"jvm.memory.non_heap.used":{"value":146652100.0},"jvm.memory.heap.max":{"value":805306368.0},"system.process.memory.size":{"value":6338056192.0},"jvm.memory.non_heap.committed":{"value":429576716.0},"jvm.memory.heap.committed":{"value":245432320.0},"jvm.memory.non_heap.max":{"value":-1.0},"system.memory.actual.free":{"value":45879250944.0},"system.memory.total":{"value":1342177280.0},"jvm.thread.count":{"value":57.0}}}}

15/04/2020 10:00:05 {"metricset":{"timestamp":1586941201732000,"tags":{"name":"scavenge"},"samples":{"jvm.gc.time":{"value":2662.0},"jvm.gc.count":{"value":294.0}}}}

15/04/2020 10:00:05 {"metricset":{"timestamp":1586941201732000,"tags":{"name":"global"},"samples":{"jvm.gc.time":{"value":346.0},"jvm.gc.count":{"value":17.0}}}}

Hi and welcome to the forum :wave:

Which version of WebSphere are you using?
Could you please share your full debug logs?

We have some fallbacks for J9 but for some reason they don't seem to work for you: https://github.com/elastic/apm-agent-java/blob/master/apm-agent-core/src/main/java/co/elastic/apm/agent/metrics/builtin/SystemMetrics.java

Hi Felix

Version of Liberty is 20.0.0.3

I get the metrics for cpu and system memory if using openJDK with same version of liberty so I doubt version of liberty is the issue and more the IBM JDK

logs

2020-04-16 14:20:05.550 [elastic-apm-server-healthcheck] DEBUG co.elastic.apm.agent.report.ApmServerHealthChecker - Starting healthcheck to https://apm/
2020-04-16 14:20:05.641 [main] DEBUG co.elastic.apm.agent.impl.payload.SystemInfo - Found Kubernetes pod UID: d7503009-8746-407d-b586-d80bb57d29cf
2020-04-16 14:20:05.647 [main] DEBUG co.elastic.apm.agent.impl.payload.SystemInfo - container ID is 3bdf356666bfb855393d18dc1f0e2e6d260c8ca9e394d0758cc2b38873cc4231
2020-04-16 14:20:05.772 [main] INFO co.elastic.apm.agent.util.JmxUtils - Found JVM-specific OperatingSystemMXBean interface: com.ibm.lang.management.OperatingSystemMXBean
2020-04-16 14:20:06.075 [main] INFO co.elastic.apm.agent.configuration.StartupInfo - Starting Elastic APM 1.15.0 as wlp-hello on Java 1.8.0_241 (IBM Corporation) Linux 3.10.0-1062.18.1.el7.x86_64
2020-04-16 14:20:06.079 [main] DEBUG co.elastic.apm.agent.configuration.StartupInfo - service_name: 'wlp-hello' (source: Java System Properties)
2020-04-16 14:20:06.079 [main] DEBUG co.elastic.apm.agent.configuration.StartupInfo - secret_token: 'XXXX' (source: Java System Properties)
2020-04-16 14:20:06.079 [elastic-apm-remote-config-poller] DEBUG co.elastic.apm.agent.configuration.ApmServerConfigurationSource - Reloading configuration from APM Server https://apm/config/v1/agents
2020-04-16 14:20:06.079 [main] DEBUG co.elastic.apm.agent.configuration.StartupInfo - server_urls: 'https://apm' (source: Java System Properties)
2020-04-16 14:20:06.080 [main] DEBUG co.elastic.apm.agent.configuration.StartupInfo - application_packages: 'com' (source: Java System Properties)

2020-04-16 14:20:06.210 [main] DEBUG co.elastic.apm.agent.impl.circuitbreaker.GCStressMonitor - Registering a heap memory pool (tenured-SOA) for stress monitoring
2020-04-16 14:20:06.210 [main] DEBUG co.elastic.apm.agent.impl.circuitbreaker.GCStressMonitor - Registering a heap memory pool (tenured-LOA) for stress monitoring
2020-04-16 14:20:06.210 [main] DEBUG co.elastic.apm.agent.impl.circuitbreaker.GCStressMonitor - Registering a heap memory pool (nursery-allocate) for stress monitoring
2020-04-16 14:20:06.218 [main] DEBUG co.elastic.apm.agent.impl.circuitbreaker.SystemCpuStressMonitor - Successfully obtained reference to the getSystemCpuLoad method of this JVM's OperatingSystemMXBean implementation
2020-04-16 14:20:06.220 [main] DEBUG co.elastic.apm.agent.jmx.JmxMetricTracker - Deferring initialization of JMX metric tracking until capture_jmx_metrics is set.
2020-04-16 14:20:06.221 [main] INFO co.elastic.apm.agent.impl.ElasticApmTracer - Tracer switched to RUNNING state

Launching defaultServer (WebSphere Application Server 20.0.0.3/wlp-1.0.38.cl200320200305-1433) on IBM J9 VM, version 8.0.6.7 - pxa6480sr6fp7-20200312_01(SR6 FP7) (en_US)




[4/16/20 14:26:26:295 GMT] 00000016 SystemOut O 2020-04-16 14:26:26.295 [elastic-apm-server-reporter] DEBUG co.elastic.apm.agent.impl.transaction.AbstractSpan - decrement references to 'HealthCheckServlet#doGet' 00-620844a27df6a54a68ef4e7b27ace546-456c92fa3c3fa1d9-01 (ebbb470f) (0)
[4/16/20 14:26:29:331 GMT] 00000115 SystemOut O 2020-04-16 14:26:29.331 [Default Executor-thread-157] DEBUG co.elastic.apm.agent.impl.transaction.AbstractSpan - increment references to '' 00-f4c54bc128f6a73b52dfc49f852f6ae3-0d86430d2c716921-01 (ebbb470f) (1)
[4/16/20 14:26:29:332 GMT] 00000115 SystemOut O 2020-04-16 14:26:29.332 [Default Executor-thread-157] DEBUG co.elastic.apm.agent.impl.ElasticApmTracer - startTransaction '' 00-f4c54bc128f6a73b52dfc49f852f6ae3-0d86430d2c716921-01 (ebbb470f) {
[4/16/20 14:26:29:332 GMT] 00000115 SystemOut O 2020-04-16 14:26:29.332 [Default Executor-thread-157] DEBUG co.elastic.apm.agent.impl.transaction.AbstractSpan - increment references to '' 00-f4c54bc128f6a73b52dfc49f852f6ae3-0d86430d2c716921-01 (ebbb470f) (2)
[4/16/20 14:26:29:332 GMT] 00000115 SystemOut O 2020-04-16 14:26:29.332 [Default Executor-thread-157] DEBUG co.elastic.apm.agent.impl.ElasticApmTracer - Activating '' 00-f4c54bc128f6a73b52dfc49f852f6ae3-0d86430d2c716921-01 (ebbb470f) on thread 277
[4/16/20 14:26:29:335 GMT] 00000115 SystemOut O 2020-04-16 14:26:29.335 [Default Executor-thread-157] DEBUG co.elastic.apm.agent.impl.ElasticApmTracer - Deactivating 'HealthCheckServlet#doGet' 00-f4c54bc128f6a73b52dfc49f852f6ae3-0d86430d2c716921-01 (ebbb470f) on thread 277
[4/16/20 14:26:29:336 GMT] 00000115 SystemOut O 2020-04-16 14:26:29.336 [Default Executor-thread-157] DEBUG co.elastic.apm.agent.impl.transaction.AbstractSpan - decrement references to 'HealthCheckServlet#doGet' 00-f4c54bc128f6a73b52dfc49f852f6ae3-0d86430d2c716921-01 (ebbb470f) (1)
[4/16/20 14:26:29:336 GMT] 00000115 SystemOut O 2020-04-16 14:26:29.336 [Default Executor-thread-157] DEBUG co.elastic.apm.agent.impl.ElasticApmTracer - endTransaction 'HealthCheckServlet#doGet' 00-f4c54bc128f6a73b52dfc49f852f6ae3-0d86430d2c716921-01 (ebbb470f)
[4/16/20 14:26:29:347 GMT] 00000016 SystemOut O 2020-04-16 14:26:29.346 [elastic-apm-server-reporter] DEBUG co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Receiving TRANSACTION event (sequence 67)
[4/16/20 14:26:29:348 GMT] 00000016 SystemOut O 2020-04-16 14:26:29.348 [elastic-apm-server-reporter] DEBUG co.elastic.apm.agent.impl.transaction.AbstractSpan - decrement references to 'HealthCheckServlet#doGet' 00-f4c54bc128f6a73b52dfc49f852f6ae3-0d86430d2c716921-01 (ebbb470f) (0)
[4/16/20 14:26:29:360 GMT] 00000015 SystemOut O 2020-04-16 14:26:29.360 [elastic-apm-request-timeout-timer] DEBUG co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Request flush because the request timeout occurred
[4/16/20 14:26:29:362 GMT] 00000016 SystemOut O 2020-04-16 14:26:29.362 [elastic-apm-server-reporter] DEBUG co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Receiving FLUSH event (sequence 68)
[4/16/20 14:26:29:367 GMT] 00000016 SystemOut O 2020-04-16 14:26:29.367 [elastic-apm-server-reporter] DEBUG co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Flushing 2975 uncompressed 948 compressed bytes
[4/16/20 14:26:34:451 GMT] 00000013 SystemOut O 2020-04-16 14:26:34.451 [elastic-apm-configuration-reloader] DEBUG co.elastic.apm.agent.impl.ElasticApmTracerBuilder - Beginning scheduled configuration reload (interval is 30 sec)...
[4/16/20 14:26:34:453 GMT] 00000013 SystemOut O 2020-04-16 14:26:34.453 [elastic-apm-configuration-reloader] DEBUG co.elastic.apm.agent.impl.ElasticApmTracerBuilder - Finished scheduled configuration reload
[4/16/20 14:26:35:769 GMT] 00000016 SystemOut O 2020-04-16 14:26:35.769 [elastic-apm-server-reporter] DEBUG co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Receiving METRICS event (sequence 69)
[4/16/20 14:26:35:770 GMT] 00000016 SystemOut O 2020-04-16 14:26:35.770 [elastic-apm-server-reporter] DEBUG co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Starting new request to https://apm/intake/v2/events
[4/16/20 14:26:35:812 GMT] 00000016 SystemOut O 2020-04-16 14:26:35.812 [elastic-apm-server-reporter] DEBUG co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Scheduling request timeout in 10s

I think I know why capturing the CPU metrics doesn't work. The IBM-specific MBean behaves slightly differently than the Sun-specific one:

https://www.ibm.com/support/knowledgecenter/SSYKE2_8.0.0/com.ibm.java.api.80.doc/com.ibm.lang.management/com/ibm/lang/management/OperatingSystemMXBean.html#getSystemCpuLoad--
https://www.ibm.com/support/knowledgecenter/SSYKE2_8.0.0/com.ibm.java.api.80.doc/com.ibm.lang.management/com/ibm/lang/management/OperatingSystemMXBean.html#getSystemCpuLoad--

The first call to the method always returns CpuLoadCalculationConstants .ERROR_VALUE (-1.0), which marks the starting point.

However, we treat -1 a not available on this platform and don't register the metric. That should be an easy fix here:

Are you interested in providing a PR for this?

Regarding the Memory graph, there seems to be an inconsistency of what's returned from getTotalPhysicalMemorySize and what's in /proc/meminfo. Could you send me the output of cat /proc/meminfo?

Looks like that could be the issue
If you guys could do the PR that would be great :slight_smile:

MemTotal:       65785860 kB
MemFree:        24504168 kB
MemAvailable:   49915868 kB
Buffers:         1287584 kB
Cached:         19252556 kB
SwapCached:         5816 kB
Active:         16966724 kB
Inactive:       13519576 kB
Active(anon):    6621500 kB
Inactive(anon):  3651740 kB
Active(file):   10345224 kB
Inactive(file):  9867836 kB
Unevictable:      835904 kB
Mlocked:          835956 kB
SwapTotal:       2096444 kB
SwapFree:        2052924 kB
Dirty:             21076 kB
Writeback:             0 kB
AnonPages:      10776940 kB
Mapped:          2018692 kB
Shmem:            294120 kB
Slab:            8252756 kB
SReclaimable:    5727348 kB
SUnreclaim:      2525408 kB
KernelStack:       70448 kB
PageTables:        65092 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    34989372 kB
Committed_AS:   33122168 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      743732 kB
VmallocChunk:   34358801408 kB
HardwareCorrupted:     0 kB
AnonHugePages:   1198080 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:     9537536 kB
DirectMap2M:    57567232 kB
DirectMap1G:           0 kB

Could you try out the snapshot from this PR to see if it fixes your problem?

https://apm-ci.elastic.co/job/apm-agent-java/job/apm-agent-java-mbp/job/PR-1148/1/artifact/src/github.com/elastic/apm-agent-java/elastic-apm-agent/target/elastic-apm-agent-1.15.1-SNAPSHOT.jar

Hi Felix

That looks to work for me :smiley:

Thanks a lot

1 Like

This topic was automatically closed 20 days after the last reply. New replies are no longer allowed.