Request for Detailed Guide on Implementing Custom Metrics in Python APM

Hello Elastic Team,

I hope this message finds you well. I’m currently exploring the implementation of custom metrics using the Elastic APM Python agent, as outlined in the documentation here.

While the existing documentation provides a good overview, I would appreciate a more detailed guide or additional examples on how to effectively implement and use custom metrics in Python applications. Specifically, insights into which metric type is best for different use cases—such as when to use gauge metrics versus counters, histograms, etc.—and whether specific metric types need to be plotted in certain ways would be extremely helpful. Additionally, more detailed documentation on the methods available for each metric type would greatly enhance my understanding and implementation.

Thank you for your assistance, and I look forward to your guidance!

Thank you so much for reaching out, @Divyanshu_Sharma. This sounds like it could be really great content for us to create in the future. I found this video a helpful resource in the past. I'd like to hear a bit more from you about what problems you faced and if there were any errors along the way.

Thanks, @jessgarson, for the quick response.

I reviewed the video, and while it is quite useful, it unfortunately doesn't cover the topic of implementing custom metrics.

What I want to achieve with the Elastic Python APM Agent is to send a few custom metrics beyond those the agent already collects on its own (such as CPU and memory) and to visualize them on a dashboard. Specifically, I’m interested in custom metrics like:

  1. Processing time for a request, which would be a float value (e.g., 60.0s).
  2. HTTP status code for a request, represented as an integer (e.g., 2XX, 5XX).
  3. A value of "1" emitted for each request, which can be summed over a time range to show the total number of requests processed in that period.
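The third item can be sketched without any APM dependency: each request contributes a 1, and the total for a time range is the sum of those values. The function name `requests_per_window` and the 30-second window are illustrative assumptions, not part of the Elastic agent.

```python
from collections import defaultdict


def requests_per_window(timestamps, window_seconds=30):
    """Sum a value of 1 per request into fixed-size time windows.

    `timestamps` are request arrival times in seconds. The result maps
    each window's start time to the number of requests in that window.
    """
    totals = defaultdict(int)
    for ts in timestamps:
        window_start = int(ts // window_seconds) * window_seconds
        totals[window_start] += 1  # the "1" emitted for this request
    return dict(totals)


# Five requests: three in the first 30 s window, two in the second.
counts = requests_per_window([1.0, 12.5, 29.9, 30.0, 45.2])
```

In a dashboard, the same result would normally come from a sum aggregation over a date histogram rather than from application code.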

There are additional scenario-specific metrics I’d like to track as well.

In the Elastic documentation, it mentions using the Prometheus client or MetricSet for this implementation. I tried using MetricSet, but I'm having difficulty understanding how to utilize its various data structures, such as gauge, counter, timer, and histogram. Could you provide a code example for one of the custom metrics I listed above?

Additionally, an API reference for the Elastic metrics API and perhaps an article demonstrating custom metric implementation with Elastic APM would also be very helpful.

Thanks for your follow-up and helpful feedback, @Divyanshu_Sharma. I might have confused it with another video I watched. Sorry about that. If I find the video I'm thinking of, I'll add it here. Have you seen our API reference?

To capture processing times, could you manually start and end transactions to capture custom processing times?

For HTTP status code, would something like this work:

elastic_apm.label(http_status_code=200)

For the number of requests in a period, could you use this method to create a counter?

@jessgarson Yes I have been through the API reference. But I want to use Custom Metrics as defined here.
I don't want transactions and spans, but metrics that I can plot on a graph. Currently I achieve this by sending extra parameters in log messages, indexing them, and then plotting them: logger.info("Example message!", extra={"processing_time": 30.0})
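The logging workaround described above could look roughly like the following. This is a minimal sketch using only the standard library; the helper name `timed_call` and the logger name are hypothetical, and how the field ends up indexed depends on the log formatter and shipper in use.

```python
import logging
import time

logger = logging.getLogger("request_metrics")


def timed_call(func, *args, **kwargs):
    """Run func, then log its wall-clock duration as a structured field."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    # Keys passed via `extra` become attributes on the LogRecord, which a
    # structured formatter or log shipper can then index.
    logger.info("Request processed", extra={"processing_time": elapsed})
    return result
```

Each call produces one log record per request, so every individual processing time is preserved, at the cost of one log line per observation.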

Please find the code below as an example of what I am trying to do. I want to use histogram from MetricSet to send the value 10 one hundred times whenever that API endpoint is hit.

from fastapi import FastAPI
from elasticapm.contrib.starlette import make_apm_client, ElasticAPM
from elasticapm.metrics.base_metrics import MetricSet

apm = make_apm_client({
    'SERVICE_NAME': 'pw-ds-test',
    'SERVER_URL': 'http://localhost:8200',
})

app = FastAPI()
app.add_middleware(ElasticAPM, client=apm)

# Register a custom MetricSet with the agent's metrics registry
metricset = apm.metrics.register(MetricSet)

@app.get("/health")
async def health_check():
    # Record the value 10 one hundred times per request
    for _ in range(100):
        metricset.histogram("test_histogram").update(10)
    return {"status": "ok"}

But as you can see from the screenshot attached below, it shows a value of 8.5 with a count of 100, as opposed to a value of 10 with a count of 100.

Here's an example of the kind of graphs I want to plot:

Thanks so much for this feedback, @Divyanshu_Sharma. I'm checking in with a few coworkers on this issue and will reply shortly.

Sure, @jessgarson. Thanks.

Thanks for your patience, @Divyanshu_Sharma. I've played with custom metrics but am still new to APM. After chatting with a coworker about this issue, I have a follow-up question. Are you using Linux? If not, you need to install psutil for metric set.

Thank you for your efforts and insights on this, @jessgarson! To clarify, I’m using macOS. As per the documentation, though, psutil seems to be required only for the CPU/Memory MetricSet when not using Linux, not for a custom MetricSet. I am also able to send data points using the MetricSet data structures defined in base_metrics.py, but as the code snippet and screenshot above show, the Elastic UI does not display the correct value: 8.5 instead of the 10 that I sent. What I want to understand is how best to use these data structures, ideally through code examples and supporting documentation, and why the values differ.

Thanks for all your follow up, @Divyanshu_Sharma. I shared this post in an internal channel, and I'm doing some further testing here. I'll be back in touch shortly.

Thanks again for all your patience, @Divyanshu_Sharma. I chatted with another coworker about this issue, and they suggested using counter or gauge instead of histogram for metrics.

@Divyanshu_Sharma To provide more context here, you will want to use counter if the value only goes up (to calculate as a rate) and gauge if it goes up and down (to track the current value).
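The distinction can be illustrated with two tiny stand-in classes. `SketchCounter` and `SketchGauge` are hypothetical names written for this example and are not the agent's classes; the 30-second interval is also an assumption.

```python
class SketchCounter:
    """Running total; most useful once converted to a rate."""

    def __init__(self):
        self.val = 0

    def inc(self, delta=1):
        self.val += delta


class SketchGauge:
    """Point-in-time reading; each write replaces the previous one."""

    def __init__(self):
        self.val = None


requests_total = SketchCounter()
in_flight = SketchGauge()

# First collection interval: 4 requests arrive, 2 are still running.
for _ in range(4):
    requests_total.inc()
in_flight.val = 2
first_total = requests_total.val

# Second interval: 6 more requests, all finished by collection time.
for _ in range(6):
    requests_total.inc()
in_flight.val = 0

interval_seconds = 30
# Rate = change in the counter between two collections / elapsed time.
rate = (requests_total.val - first_total) / interval_seconds
```

The counter is read as a difference between collections (here 6 requests over 30 s), while the gauge is read as-is and only reflects the state at collection time.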

Hey @jessgarson, thanks for all the help, and apologies for the delayed response.

I tried gauge and counter; both work and reflect the correct values in the UI. For my use case, though, I can't use either of them. A counter keeps a single value, the current count, which can be incremented or decremented, so I would not be able to keep multiple values or entries, such as HTTP codes or processing times.
The same goes for gauge: it is set to a certain value, and each update replaces that value with a new one. A gauge is useful for a metric like health status (e.g., Good or Bad), and a counter for showing the current count of requests processed or similar.
Let me illustrate with an example for HTTP codes (processing time would be similar):

import random

@app.get("/health")
async def health_check():
    choices = [400, 200, 500, 512]
    random_value = random.choice(choices)
    # Overwrite the metric's value with the latest status code
    metricset.counter("test_gauge").val = random_value
    return {"status": "ok"}

Now, if this API endpoint is hit 5 times in 30s, the generated random_value sequence is [200,500,200,400,512], and metrics_interval (how often metrics are collected and sent) is set to 30s, then only the last value, 512, is sent to the server.
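The effect described above, and one possible way around it, can be simulated with plain Python. This sketch does not use the agent; keeping one count per status code and clearing it at collection time is an assumption about how such data might be modeled, and `collect` is a hypothetical helper.

```python
from collections import Counter

# Status codes seen within one 30 s collection interval.
observed = [200, 500, 200, 400, 512]

# A single value holder keeps only the final assignment.
last_value = None
for code in observed:
    last_value = code

# One count per status code preserves the distribution for the interval.
per_code = Counter(observed)


def collect(counts):
    """Return a snapshot and clear the counts, like a reset-on-collect metric."""
    snapshot = dict(counts)
    counts.clear()
    return snapshot


sent = collect(per_code)
```

With a single value, four of the five observations are lost; with per-code counts, the interval is summarized as how many times each code occurred, although the order of the requests is still not retained.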

class Gauge(BaseMetric):
    __slots__ = BaseMetric.__slots__ + ("_val",)

    def __init__(self, name, reset_on_collect=False, unit=None) -> None:
        """
        Creates a new gauge
        :param name: label of the gauge
        :param unit of the observed gauge. Unused for gauges
        """
        self._val = None
        super(Gauge, self).__init__(name, reset_on_collect=reset_on_collect)

    @property
    def val(self):
        return self._val

    @val.setter
    def val(self, value) -> None:
        self._val = value

    def reset(self) -> None:
        self._val = 0

class Counter(BaseMetric):
    __slots__ = BaseMetric.__slots__ + ("_lock", "_initial_value", "_val")

    def __init__(self, name, initial_value=0, reset_on_collect=False, unit=None) -> None:
        """
        Creates a new counter
        :param name: name of the counter
        :param initial_value: initial value of the counter, defaults to 0
        :param unit: unit of the observed counter. Unused for counters
        """
        self._lock = threading.Lock()
        self._val = self._initial_value = initial_value
        super(Counter, self).__init__(name, reset_on_collect=reset_on_collect)

    def inc(self, delta=1):
        """
        Increments the counter. If no delta is provided, it is incremented by one
        :param delta: the amount to increment the counter by
        :returns the counter itself
        """
        with self._lock:
            self._val += delta
        return self

    def dec(self, delta=1):
        """
        Decrements the counter. If no delta is provided, it is decremented by one
        :param delta: the amount to decrement the counter by
        :returns the counter itself
        """
        with self._lock:
            self._val -= delta
        return self

    def reset(self):
        """
        Reset the counter to the initial value
        :returns the counter itself
        """
        with self._lock:
            self._val = self._initial_value
        return self

    @property
    def val(self):
        """Returns the current value of the counter"""
        return self._val

    @val.setter
    def val(self, value) -> None:
        with self._lock:
            self._val = value

Checking the source code of gauge and counter shows that both store a single variable, probably an int or float (the type is not declared).
After looking at the source code of all available metrics, I think none of them can store and send list-like values (a time series or multiple values, such as the [200,500,200,400,512] in the example above). histogram has self._counts, which is a list, but it stores frequencies rather than the actual values.
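The point about histograms storing frequencies can be shown with a small bucketed histogram. The bucket bounds below are hypothetical and may differ from the agent's defaults, and reporting a bucket by the midpoint of its bounds is an assumption; the sketch therefore does not reproduce the exact 8.5 seen in the UI, only the general mechanism by which a raw 10 can come back as a different number.

```python
import bisect

# Hypothetical upper bounds; the agent's real defaults may differ.
BUCKETS = [2.5, 5.0, 7.5, 10.0, float("inf")]


def bucket_index(value):
    """Index of the first bucket whose upper bound is >= value."""
    return bisect.bisect_left(BUCKETS, value)


counts = [0] * len(BUCKETS)
for _ in range(100):
    counts[bucket_index(10)] += 1  # only the bucket's frequency is kept

# If a bucket is reported by the midpoint of its bounds (an assumption),
# the raw 10 comes back as a value between 7.5 and 10.
idx = bucket_index(10)
midpoint = (BUCKETS[idx - 1] + BUCKETS[idx]) / 2
```

The count of 100 survives intact, but the value itself is replaced by a representative of the bucket it fell into.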

I was hoping to extend the BaseMetric class to create a metric that stores values in a list- or dict-like data structure so it can hold time-series-like data. Would that be possible? If this would be helpful to the community, I can also open a PR in apm-agent-python to add a new metric type. If you have access, would you be able to ask someone from the apm-agent-python team internally about this?
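As a rough illustration of the idea, a metric that keeps every observation until the next collection might look like this. `ValueListMetric` is a hypothetical class written for this sketch; it does not subclass the agent's BaseMetric, and whether the agent or APM Server could transport such a payload is not addressed here.

```python
import threading


class ValueListMetric:
    """Hypothetical metric that keeps every observation until collected."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()
        self._values = []

    def update(self, value):
        """Record one observation."""
        with self._lock:
            self._values.append(value)

    def collect(self):
        """Return all values seen since the last collection and reset."""
        with self._lock:
            values, self._values = self._values, []
        return values


metric = ValueListMetric("http_status_code")
for code in [200, 500, 200, 400, 512]:
    metric.update(code)
batch = metric.collect()
```

One trade-off of this design is that memory use grows with the number of observations per interval, which is what bucketed histograms avoid.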

Looking forward to your response.

Thanks @Divyanshu_Sharma, for explaining this in more detail. I think this might be more of a general feature request; you can report this here.

@jessgarson Thanks again for all your help & guidance. I have opened a feature request here.