"No JVMs were found" in JVM tab after APM upgrade from 1.38 to 1.43

Kibana version:
7.16.3

APM Java agent version:
1.43

Hi,
Following my previous ticket, we upgraded our APM agent from 1.38 to 1.43, and only that.

We now get the transactions correctly, but we lost the JVM tab.

The JVM metrics are gone as well.

Is there any known issue about that?
I found old tickets about this, but they ask to upgrade to versions older than the one I'm already running.

Regards

Additional information I forgot to mention: along with the agent upgrade, we also upgraded the JVM
from
Temurin-17.0.7+7
to
Temurin-17.0.9+9

Hi!

I just tested your configuration with a recent agent version (close to 1.43) and the same stack version, and did not manage to reproduce the issue.

Can you query the documents in Discover that match processor.event:"metric"?
This should normally return the metric documents captured by the agent.

Yes we have metrics

I think the problem here is that your metrics are stored with the wrong names: they should use dots (.) rather than underscores (_) as separators, which explains why the UI can't display them.

Do you have any custom ingest pipeline that would change the shape of the metric documents?

No, sorry, I don't know why it showed "." on my screenshot, but it's actually "_"

We did some tests with other APM agent versions, for example 1.42.0 (same JVM), and we still have the issue with the JVM metrics

But when going back to 1.38 we get the JVM metrics again, but then we lose the transaction metrics from Spring Boot

There is probably some additional setting, introduced after 1.38, that is required to get those JVM metrics in the tab

There is an option in the agent that was introduced in an earlier version, dedot_custom_metrics, but it only impacts the "custom" metrics that are recorded through the Micrometer or OpenTelemetry APIs. It is not used for the built-in JVM metrics, so it is not related to your issue here.
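To illustrate what "custom" means here, below is a minimal sketch of a metric registered through the Micrometer API, which is the kind of metric dedot_custom_metrics applies to (the class, metric name and tags are made up for illustration, not taken from your application):

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;

public class CheckoutMetrics {

    private final Counter checkoutCounter;

    public CheckoutMetrics(MeterRegistry registry) {
        // A "custom" metric in the sense of dedot_custom_metrics: it is registered by
        // application code through the Micrometer API. With dedotting enabled, the agent
        // would report it as "checkout_completed_total" instead of "checkout.completed.total".
        this.checkoutCounter = Counter.builder("checkout.completed.total")
                .tag("channel", "web")
                .register(registry);
    }

    public void onCheckoutCompleted() {
        checkoutCounter.increment();
    }
}

The built-in JVM metrics used by the JVM tab are produced by the agent itself, not through that API.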

Could you also provide your agent's effective configuration (which is logged within the first lines of the agent log output)?

I think this might be a case where it would help to capture the data sent by the agent to the APM server.
Could you follow the "capture agent debug logs" procedure with log_level=trace?

Since the agent starts sending metrics quite quickly after startup, you only need to capture those verbose logs for a few minutes. You should then be able to inspect those logs and grep them with a regex like jvm.memory.used; because the dots in the regex match any character, it will match both the dot and the underscore variants.
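As a quick illustration (plain Java, nothing agent-specific), this is why the same pattern matches both spellings:

import java.util.regex.Pattern;

public class MetricNameGrep {
    public static void main(String[] args) {
        // In a regex, "." matches any single character, so the same pattern
        // matches both the dot and the underscore spellings of the metric name.
        Pattern pattern = Pattern.compile("jvm.memory.used");
        System.out.println(pattern.matcher("jvm.memory.used").find());  // true
        System.out.println(pattern.matcher("jvm_memory_used").find());  // true
    }
}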

With that, we should be able to tell whether the agent captures the metrics as expected, and whether there is any difference from a previous version.

2023-11-10 13:27:51,823 [main] DEBUG co.elastic.apm.agent.configuration.StartupInfo - VM Arguments: [-Djava.awt.headless=true, -Djava.security.egd=file:/dev/./urandom, -Dcom.sun.management.jmxremote=true, -Dcom.sun.management.jmxremote.ssl=false, -Dcom.sun.management.jmxremote.authenticate=false, -Dcom.sun.management.jmxremote.port=5000, -Djava.rmi.server.hostname=127.0.0.1, -Dcom.sun.management.jmxremote.host=127.0.0.1, -Dcom.sun.management.jmxremote.rmi.port=5000, -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=127.0.0.1:8000, -Xms2048m, -Xmx10595m, -XX:+AlwaysPreTouch, -XX:+UseStringDeduplication, -Dlogging.config=/opt/xxx/logback-spring.xml, -XX:+AlwaysPreTouch, -javaagent:/opt/apm/elastic-apm-agent-1.43.0.jar]
2023-11-10 13:27:51,824 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - service_name: 'load-test2' (source: Environment Variables)
2023-11-10 13:27:51,824 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - service_node_name: 'xxx.stg.xxx.xxx.company.com' (source: Environment Variables)
2023-11-10 13:27:51,824 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - environment: 'staging' (source: Environment Variables)
2023-11-10 13:27:51,825 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - transaction_sample_rate: '0.5' (source: Environment Variables)
2023-11-10 13:27:51,825 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - transaction_max_spans: '250' (source: Environment Variables)
2023-11-10 13:27:51,825 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - capture_body: 'ERRORS' (source: Environment Variables)
2023-11-10 13:27:51,826 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - trace_methods_duration_threshold: '100ms' (source: Environment Variables)
2023-11-10 13:27:51,826 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - central_config: 'false' (source: Environment Variables)
2023-11-10 13:27:51,826 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - server_urls: 'http://server:8200,http://server:8200,http://server:8200,http://server:8200' (source: Environment Variables)
2023-11-10 13:27:51,827 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - max_queue_size: '4096' (source: Environment Variables)
2023-11-10 13:27:51,827 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - api_request_size: '1mb' (source: Environment Variables)
2023-11-10 13:27:51,827 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - application_packages: 'com.xxx,com.xxx' (source: Environment Variables)
2023-11-10 13:27:51,828 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - span_frames_min_duration: '25ms' (source: Environment Variables)
2023-11-10 13:27:51,828 [main] WARN  co.elastic.apm.agent.configuration.StartupInfo - Detected usage of deprecated configuration option 'span_frames_min_duration'. This option might be removed in the future. Please refer to the documentation about alternatives.
2023-11-10 13:27:51,828 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - log_level: 'TRACE' (source: Environment Variables)
2023-11-10 13:27:51,829 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - log_file: '/var/log/instance/xxx/elastic-apm.log' (source: Environment Variables)
{"metricset":{"timestamp":1699614245380000,"tags":{"area":"heap","id":"G1 Survivor Space"},"samples":{"jvm_memory_used":{"value":38749096.0},"jvm_memory_committed":{"value":41943040.0},"jvm_memory_max":{"value":-1.0}}}}
{"metricset":{"timestamp":1699614245380000,"tags":{"area":"nonheap","id":"Compressed Class Space"},"samples":{"jvm_memory_max":{"value":1073741824.0},"jvm_memory_used":{"value":28004416.0},"jvm_memory_committed":{"value":29032448.0}}}}
{"metricset":{"timestamp":1699614245380000,"tags":{"area":"nonheap","id":"CodeHeap 'profiled nmethods'"},"samples":{"jvm_memory_used":{"value":39452800.0},"jvm_memory_committed":{"value":39518208.0},"jvm_memory_max":{"value":122912768.0}}}}
{"metricset":{"timestamp":1699614245380000,"tags":{"area":"nonheap","id":"Metaspace"},"samples":{"jvm_memory_used":{"value":227871520.0},"jvm_memory_committed":{"value":229703680.0},"jvm_memory_max":{"value":-1.0}}}}
{"metricset":{"timestamp":1699614245380000,"tags":{"area":"nonheap","id":"CodeHeap 'non-nmethods'"},"samples":{"jvm_memory_max":{"value":5828608.0},"jvm_memory_used":{"value":2755968.0},"jvm_memory_committed":{"value":2883584.0}}}}
{"metricset":{"timestamp":1699614245380000,"tags":{"area":"heap","id":"G1 Eden Space"},"samples":{"jvm_memory_max":{"value":-1.0},"jvm_memory_committed":{"value":553648128.0},"jvm_memory_used":{"value":260046848.0}}}}
{"metricset":{"timestamp":1699614245380000,"tags":{"area":"heap","id":"G1 Old Gen"},"samples":{"jvm_memory_used":{"value":1245626368.0},"jvm_memory_committed":{"value":5016387584.0},"jvm_memory_max":{"value":11114905600.0}}}}
{"metricset":{"timestamp":1699614245380000,"tags":{"area":"nonheap","id":"CodeHeap 'non-profiled nmethods'"},"samples":{"jvm_memory_committed":{"value":17694720.0},"jvm_memory_used":{"value":17262848.0},"jvm_memory_max":{"value":122916864.0}}}}
{"metricset":{"timestamp":1699614275380000,"tags":{"area":"heap","id":"G1 Survivor Space"},"samples":{"jvm_memory_used":{"value":18618280.0},"jvm_memory_committed":{"value":25165824.0},"jvm_memory_max":{"value":-1.0}}}}
{"metricset":{"timestamp":1699614275380000,"tags":{"area":"nonheap","id":"Compressed Class Space"},"samples":{"jvm_memory_max":{"value":1073741824.0},"jvm_memory_used":{"value":29586880.0},"jvm_memory_committed":{"value":30605312.0}}}}
{"metricset":{"timestamp":1699614275380000,"tags":{"area":"nonheap","id":"CodeHeap 'profiled nmethods'"},"samples":{"jvm_memory_used":{"value":41390976.0},"jvm_memory_committed":{"value":41418752.0},"jvm_memory_max":{"value":122912768.0}}}}
{"metricset":{"timestamp":1699614275380000,"tags":{"area":"nonheap","id":"Metaspace"},"samples":{"jvm_memory_used":{"value":239819704.0},"jvm_memory_committed":{"value":241696768.0},"jvm_memory_max":{"value":-1.0}}}}
{"metricset":{"timestamp":1699614275380000,"tags":{"area":"nonheap","id":"CodeHeap 'non-nmethods'"},"samples":{"jvm_memory_max":{"value":5828608.0},"jvm_memory_used":{"value":2810752.0},"jvm_memory_committed":{"value":2949120.0}}}}
{"metricset":{"timestamp":1699614275380000,"tags":{"area":"heap","id":"G1 Eden Space"},"samples":{"jvm_memory_max":{"value":-1.0},"jvm_memory_committed":{"value":1493172224.0},"jvm_memory_used":{"value":1015021568.0}}}}
{"metricset":{"timestamp":1699614275380000,"tags":{"area":"heap","id":"G1 Old Gen"},"samples":{"jvm_memory_used":{"value":1279709696.0},"jvm_memory_committed":{"value":4093640704.0},"jvm_memory_max":{"value":11114905600.0}}}}
{"metricset":{"timestamp":1699614275380000,"tags":{"area":"nonheap","id":"CodeHeap 'non-profiled nmethods'"},"samples":{"jvm_memory_committed":{"value":18939904.0},"jvm_memory_used":{"value":18903808.0},"jvm_memory_max":{"value":122916864.0}}}}

Please find what you asked for above. Thanks

Thanks, that confirms that the problem is on the agent side.
I need to investigate this further on my side to understand how this could happen.

There is nothing obviously wrong in your agent configuration, which means it could be a bug in the agent.

:frowning: thanks for the first response

In your configuration there is something quite odd, though probably not related to the issue: the value for server_urls is duplicated (set through the ELASTIC_APM_SERVER_URLS environment variable in your case).

Do you use the OpenTelemetry or Micrometer APIs to capture metrics in your application?

Also, is capture_jmx_metrics configured? From the provided agent logs it does not seem to be set, but it's worth double-checking anyway.

Hello,
For ELASTIC_APM_SERVER_URLS, it's just our obfuscation: we masked the different URLs but used the same placeholder, server, instead of different placeholders like server1, server2, and so on.

We don't use OpenTelemetry for the moment; we are just starting a POC on it.

To our knowledge, we never set 'capture_jmx_metrics' in our override.apm.conf, unless it's a default value.

Actually, looking more closely at the list of metrics you've sent, those might not all be captured by the agent auto-instrumentation, especially if your application is using Micrometer.
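For reference, here is a minimal sketch (not your code, just an illustration) of how Micrometer's built-in JVM binder registers metrics named jvm.memory.used / jvm.memory.committed / jvm.memory.max with "area" and "id" tags, the same shape as the metricsets you captured. Spring Boot Actuator typically registers these binders automatically:

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.binder.jvm.JvmMemoryMetrics;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class JvmMemoryMetricsSketch {
    public static void main(String[] args) {
        MeterRegistry registry = new SimpleMeterRegistry();
        // Registers gauges named jvm.memory.used, jvm.memory.committed and jvm.memory.max,
        // tagged with area=heap|nonheap and id=<memory pool name>. If the agent's Micrometer
        // instrumentation reports these and dedots their names, they would show up as
        // jvm_memory_used and so on, which matches your captured metricsets.
        new JvmMemoryMetrics().bindTo(registry);

        registry.getMeters().forEach(meter -> System.out.println(meter.getId()));
    }
}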

Could you try to disable the Micrometer instrumentation by setting disable_instrumentations=micrometer (doc)?

With that, the only metrics you should have left are the ones captured by default by the agent, which are the ones with dots (.) in their names, like jvm.fd.used.

If that solves the issue, we will still need to investigate how Micrometer could interfere with (or overwrite) the default agent metrics, but at least it should help us make some progress on this problem.

Also, would it be possible to double-check whether some of the JVM metrics have a dot in their name? For example, searching for jvm.fd.used would be relevant.

Hi
Please find our answers:

We added ELASTIC_APM_DISABLE_INSTRUMENTATIONS=micrometer to our configuration.

In the logs we have:

2023-11-16 10:57:48,278 [main] INFO co.elastic.apm.agent.configuration.StartupInfo - disable_instrumentations: 'micrometer' (source: Environment Variables)

But it didn't fix the issue: the JVMs are still not displayed.

For the second part (JVM metrics with a dot in their name):

{"metricset":{"timestamp":1700133015061000,"samples":{"system.process.cpu.total.norm.pct":{"value":0.9686509921627481},"jvm.memory.heap.used":{"value":221369568.0},"jvm.memory.non_heap.used":{"value":128880536.0},"jvm.memory.heap.max":{"value":11114905600.0},"jvm.fd.used":{"value":159.0},"jvm.gc.alloc":{"value":4706838512.0},"jvm.memory.non_heap.committed":{"value":130613248.0},"system.cpu.total.norm.pct":{"value":0.991829247957312},"system.process.cgroup.memory.mem.usage.bytes":{"value":2474967040.0},"system.process.memory.size":{"value":14271287296.0},"jvm.memory.heap.committed":{"value":2147483648.0},"system.memory.actual.free":{"value":12541489152.0},"jvm.fd.max":{"value":524288.0},"system.memory.total":{"value":16479371264.0},"jvm.thread.count":{"value":22.0}}}}

Thanks

OK, then it means that, unlike what I initially thought, the JVM metrics are very probably captured correctly by the agent. I was misled by the metrics with _ in their names, which are actually created by Micrometer.

Given that you are only missing the metrics part, I think it could also be related to missing metadata attributes, and in order to investigate that I'll need a complete JSON document for one of those metrics, for example by filtering with jvm.memory.heap.used > 0 in Discover. You can alter the values for obfuscation, but please do not remove anything from the document structure.

jvm.memory.heap.used is empty with this agent version

Does this query return anything with the previous agent version?

If yes, then please provide:

  • one document for the old agent that works
  • one document for the new agent that does not work (just filtering on the agent version should be enough).