Prometheus fields no longer available after enabling APM on Java process

We are using:
Kibana 7.10.1
APM Agent 1.24
Browser: Chrome

Steps to reproduce:

  1. We have enabled APM on one of our Java processes

-Delastic.apm.service_name="Name of Service"
-Delastic.apm.server_urls=https://{{ apm_server }}:8200

  2. With the above enabled, the APM metrics are sent to the APM server and are available in Kibana in the APM dashboard.

  3. The APM dashboards show metrics correctly for the service in question.

The problem is that some of the Prometheus fields/metrics are no longer available to be scraped to Grafana. The Prometheus exporter that makes the metrics available to be scraped did not change in any way.

Once we disabled APM by removing the reference to the jar, the Prometheus fields/metrics were available again.

How could the APM jar be changing the metrics made available by the exporter?

Hi @momalley ,

The agent should not interfere with other ways to capture metrics, if it does then it's definitely an issue we need to investigate and fix.

Could you clarify your context a bit:

  • how are Prometheus metrics usually collected in your application ? What component/library is in charge of that ?
  • What are the impacted metrics ? How are they usually captured ? Does it rely on JMX ?
  • is there any debug/error message that would indicate a different behavior of your prometheus metrics capture process ?

It does seem strange to us as well that the APM agent would interfere with how we capture the metrics.

We run our Java application inside a Docker container. Spring Boot is used, and the metrics exporter is Micrometer (Micrometer Application Monitoring, Spring Metrics).

MeterRegistry is used for the Metrics
import io.micrometer.core.instrument.MeterRegistry

The above exporter makes the metrics available on the node so that they can be scraped by Grafana.

Some Prometheus fields that were available prior to enabling APM are no longer there. We have not seen any errors in the application; the Prometheus fields are simply missing.

Thanks in advance

@Sylvain_Juge update above

Please provide some additional input, so we have a better starting point for analyzing this:

  • So some are disappearing and some not? If yes:
    • Can you find a common factor to the ones not affected vs. the ones affected? For example - Meter names? Meter types? Meter tags?
    • Please provide a couple of examples of affected and non-affected meters (metricsets)
    • What happens if you enable the agent and disable the collection of an affected metric through the disable_metrics config? Does it restore this metric in Prometheus?
  • What happens when using the agent but turning off Micrometer instrumentation entirely by setting disable_instrumentations=micrometer?
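One way to gather the examples asked for above is to save the exporter's text output once with the agent and once without, then diff the metric names. The following is only a minimal sketch using the Java standard library; the class name `ScrapeDiff` and the sample metric lines are made up for illustration and are not taken from the affected service.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: compares two Prometheus text-format scrapes by metric name.
final class ScrapeDiff {

    // Collects metric names, skipping blank lines and "#" comment lines (HELP/TYPE).
    static Set<String> metricNames(String scrape) {
        Set<String> names = new TreeSet<>();
        for (String line : scrape.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            // The name ends at the first '{' (start of labels) or whitespace (start of value).
            int end = 0;
            while (end < trimmed.length()
                    && trimmed.charAt(end) != '{'
                    && !Character.isWhitespace(trimmed.charAt(end))) {
                end++;
            }
            names.add(trimmed.substring(0, end));
        }
        return names;
    }

    // Names present in the first scrape but absent from the second.
    static Set<String> missingAfter(String withoutAgent, String withAgent) {
        Set<String> missing = metricNames(withoutAgent);
        missing.removeAll(metricNames(withAgent));
        return missing;
    }

    public static void main(String[] args) {
        // Made-up sample data; in practice these would be the two saved scrapes.
        String withoutAgent = "# TYPE sample_heap_bytes gauge\n"
                + "sample_heap_bytes{area=\"heap\"} 1.0\n"
                + "sample_records_total 5.0\n";
        String withAgent = "# TYPE sample_heap_bytes gauge\n"
                + "sample_heap_bytes{area=\"heap\"} 2.0\n";
        System.out.println("Missing with agent: " + missingAfter(withoutAgent, withAgent));
    }
}
```

The resulting list of names can then be checked for a shared prefix, meter type, or tag, and individual entries can be tried with disable_metrics as suggested above.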

@Eyal_Koren @Sylvain_Juge

We have narrowed down the issue a little. We have a scenario with one Java service where the issue does not occur and another scenario where it does occur.

For the scenario where the issue happens we see that the service jar file is called using the "-cp" parameters. Below is an example. In particular look at "-cp /etc/hbase/conf:/etc/hadoop/conf:software-ingester.jar"

exec java $(memory_options)$java_trust_store$kmsw_jaas_path -Dhttps.proxyHost=######## -Dhttps.proxyPort=8080 -cp /etc/hbase/conf:/etc/hadoop/conf:software-ingester.jar -Dhttps.proxyHost=$https_proxyHost -Dhttps.proxyPort=$https_proxyPort org.springframework.boot.loader.JarLauncher

For another scenario where there are no issues with missing prometheus fields the "-cp" parameters are not used. Should the APM.jar behave any different when the "-cp" parameters are used in the Java call?

Should the APM.jar behave any different when the "-cp" parameters are used in the Java call?

It shouldn't, as long as you don't attach the agent jar through -cp, which is not allowed.
How do you attach the agent? What configurations do you apply for it?

Please answer my questions above, so we can properly assist.
Providing a debug level log (see log_level) may be useful as well, just make sure it includes the entire startup of your application. You can share it through gist.
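If it is unclear how the agent ends up attached inside the container, a small diagnostic run in the same JVM can list the relevant startup flags. This is just a sketch using the standard `java.lang.management` API; the class name `AgentArgs` is hypothetical, and it only sees flags passed on the command line, so configuration supplied by other means would not appear.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic: shows how the agent was attached and configured via JVM flags.
final class AgentArgs {

    // Keeps only -javaagent flags and -Delastic.apm.* system properties.
    static List<String> agentRelated(List<String> jvmArgs) {
        List<String> result = new ArrayList<>();
        for (String arg : jvmArgs) {
            if (arg.startsWith("-javaagent:") || arg.startsWith("-Delastic.apm.")) {
                result.add(arg);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // JVM input arguments exclude the main class and its program arguments.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        for (String arg : agentRelated(jvmArgs)) {
            System.out.println(arg);
        }
        // The effective class path, useful for confirming the agent jar is not on it.
        System.out.println("java.class.path=" + System.getProperty("java.class.path"));
    }
}
```

Any secrets in the printed flags should be redacted before sharing the output.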

The fix for this issue will soon be merged and will be available in the next version, 1.28.0.