Request for Detailed Guide on Implementing Custom Metrics in Python APM

Divyanshu_Sharma · September 20, 2024, 7:02am

Hello Elastic Team,

I hope this message finds you well. I’m currently exploring the implementation of custom metrics using the Elastic APM Python agent, as outlined in the documentation here.

While the existing documentation provides a good overview, I would appreciate a more detailed guide or additional examples on how to effectively implement and use custom metrics in Python applications. Specifically, insights into which metric type is best for different use cases—such as when to use gauge metrics versus counters, histograms, etc.—and whether specific metric types need to be plotted in certain ways would be extremely helpful. Additionally, more detailed documentation on the methods available for each metric type would greatly enhance my understanding and implementation.

Thank you for your assistance, and I look forward to your guidance!

jessgarson · September 20, 2024, 4:36pm

Thank you so much for reaching out, @Divyanshu_Sharma. This sounds like it could be really great content for us to create in the future. I found this video a helpful resource in the past. I'd like to hear a bit more from you about what problems you faced and if there were any errors along the way.

Divyanshu_Sharma · September 24, 2024, 9:07am

Thanks, @jessgarson, for the quick response.

I reviewed the video, and while it is quite useful, it unfortunately doesn't cover the topic of implementing custom metrics.

What I want to achieve with the Elastic Python APM Agent is to send a few custom metrics, beyond the ones Elastic collects on its own (such as CPU and memory), and have them visualized graphically on the dashboard. Specifically, I’m interested in custom metrics like:

Processing time for a request, which would be a float value (e.g., 60.0s).
HTTP status code for a request, represented as an integer (e.g., 2XX, 5XX).
A value of "1" emitted for each request, which can be summed over a time range to show the total number of requests processed in that period.

There are additional scenario-specific metrics I’d like to track as well.

The Elastic documentation mentions using the Prometheus client or MetricSet for this implementation. I tried using MetricSet, but I'm having difficulty understanding how to use its various data structures, such as gauge, counter, timer, and histogram. Could you provide a code example for one of the custom metrics I listed above?

Additionally, an API reference for the Elastic metrics API and perhaps an article demonstrating custom metric implementation with Elastic APM would also be very helpful.

jessgarson · September 24, 2024, 6:35pm

Thanks for your follow-up and helpful feedback, @Divyanshu_Sharma. I might have confused it with another video I watched. Sorry about that. If I find the video I'm thinking of, I'll add it here. Have you seen our API reference?

To capture processing times, could you manually start and end transactions to capture custom processing times?

For HTTP status code, would something like this work:

elasticapm.label(http_status_code=200)

For the number of requests in a period, could you use this method to create a counter?

Divyanshu_Sharma · September 25, 2024, 2:36pm

@jessgarson Yes, I have been through the API reference, but I want to use custom metrics as defined here.
I don't want transactions and spans; I want metrics that I can plot on a graph. Currently I achieve this by sending extra parameters in a log line, indexing them, and then plotting them: logger.info("Example message!", extra={"processing_time": 30.0})
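The logging workaround mentioned above can be sketched with the standard library alone. This is only an illustration of how a field passed via `extra` ends up as an attribute on the log record; the logger name and the in-memory `ListHandler` are hypothetical stand-ins for whatever handler ships the logs to Elasticsearch.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that keeps records in memory for inspection."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


logger = logging.getLogger("metrics_via_logs")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the example output out of the root logger
handler = ListHandler()
logger.addHandler(handler)

# Keys in `extra` become attributes on the LogRecord
logger.info("Example message!", extra={"processing_time": 30.0})

record = handler.records[0]
processing_time = record.processing_time
```

A structured-logging formatter can then serialize `processing_time` as its own field, which is what makes it indexable and plottable.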

Please find the code below as an example of what I am trying to do. I want to use the histogram from MetricSet to send the value 10 one hundred times whenever that API endpoint is hit.

from fastapi import FastAPI
from elasticapm.contrib.starlette import make_apm_client, ElasticAPM
from elasticapm.metrics.base_metrics import MetricSet

apm = make_apm_client({
    'SERVICE_NAME': 'pw-ds-test',
    'SERVER_URL': 'http://localhost:8200',
})

app = FastAPI()
app.add_middleware(ElasticAPM, client=apm)

# Register a MetricSet with the agent's metrics registry
metricset = apm.metrics.register(MetricSet)

@app.get("/health")
async def health_check():
    # Record the value 10 one hundred times in the same histogram
    for _ in range(100):
        metricset.histogram("test_histogram").update(10)
    return {"status": "ok"}

But as the screenshot attached below shows, the UI displays a value of 8.5 with a count of 100, rather than a value of 10 with a count of 100.

Divyanshu_Sharma · September 25, 2024, 2:38pm

Here's an example of the kind of graphs I want to plot:

jessgarson · September 25, 2024, 7:22pm

Thanks so much for this feedback, @Divyanshu_Sharma. I'm checking in with a few coworkers on this issue and will reply shortly.

Divyanshu_Sharma · September 26, 2024, 5:25am

Sure, @jessgarson. Thanks.

jessgarson · September 26, 2024, 4:27pm

Thanks for your patience, @Divyanshu_Sharma. I've played with custom metrics but am still new to APM. After chatting with a coworker about this issue, I have a follow-up question. Are you using Linux? If not, you need to install psutil for metric set.

Divyanshu_Sharma · September 27, 2024, 6:08am

Thank you for your efforts and insights on this, @jessgarson! To clarify, I’m using macOS. But per the documentation, psutil seems to be required only for the CPU/memory MetricSet when not on Linux, not for a custom MetricSet. Also, I am able to send data points using the MetricSet data structures defined in base_metrics.py, but as the code snippet and screenshot above show, the Elastic UI does not display the value I sent: it shows 8.5 instead of 10. What I want to understand is how best to use these data structures, ideally through code examples and supporting documentation, and why the values differ.

jessgarson · September 27, 2024, 4:28pm

Thanks for all your follow up, @Divyanshu_Sharma. I shared this post in an internal channel, and I'm doing some further testing here. I'll be back in touch shortly.

jessgarson · September 27, 2024, 5:07pm

Thanks again for all your patience, @Divyanshu_Sharma. I chatted with another coworker about this issue, and they suggested using counter or gauge instead of histogram for metrics.

jessgarson · September 27, 2024, 9:36pm

@Divyanshu_Sharma To provide more context here, you will want to use counter if the value only goes up (to calculate as a rate) and gauge if it goes up and down (to track the current value).
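To make the counter/gauge distinction concrete, here is a minimal sketch using stand-in classes rather than the agent's own. `SketchCounter` and `SketchGauge` are hypothetical names written for this illustration, and the sample values are made up; the point is only that a counter accumulates while a gauge keeps the latest reading.

```python
import threading


class SketchCounter:
    """Stand-in counter: a running total that only goes up in this sketch."""

    def __init__(self):
        self._lock = threading.Lock()
        self.val = 0

    def inc(self, delta=1):
        with self._lock:
            self.val += delta
        return self


class SketchGauge:
    """Stand-in gauge: holds only the most recently set value."""

    def __init__(self):
        self.val = None


requests_total = SketchCounter()  # hypothetical metric: requests served
queue_depth = SketchGauge()       # hypothetical metric: current queue size

# Three requests arrive; the queue depth is sampled on each one
for depth in [3, 7, 2]:
    requests_total.inc()
    queue_depth.val = depth
```

After the loop, `requests_total.val` is the number of requests seen, while `queue_depth.val` reflects only the last sample.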

Divyanshu_Sharma · October 9, 2024, 6:37am

Hey @jessgarson, thanks for all the help, and apologies for the delayed response.

I tried the gauge and the counter; both work and show the correct values in the UI. For my use case, though, I can't use either of them. A counter keeps a single value, the current count, which can be incremented or decremented, so I can't keep multiple values or entries such as HTTP codes or processing times.
The same goes for a gauge, which is set to a value and overwritten whenever it is updated. A gauge can be useful for a metric like health status (namely Good and Bad), and a counter for showing the current count of requests processed or similar.
Let me illustrate with an example for HTTP codes (processing time would be similar):

import random

@app.get("/health")
async def health_check():
    choices = [400, 200, 500, 512]
    random_value = random.choice(choices)
    # Overwrite the metric's single stored value with this request's status code
    metricset.counter("test_gauge").val = random_value
    return {"status": "ok"}

Suppose this endpoint is hit 5 times in 30s and the generated random_value values arrive in the order [200,500,200,400,512]. If metrics_interval (how often metrics are collected and sent) is set to 30s, only the last value, 512, is sent to the server.
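The effect described above can be reproduced without the agent. The sketch below contrasts a single overwritten value with one count per status code, which is one possible way to keep more information within a counter-style model. It is not something proposed in this thread, and the `http_status_*` names and the `flush` helper are hypothetical.

```python
from collections import Counter

observed = [200, 500, 200, 400, 512]  # status codes from the example above

last_value = None       # gauge-style: each write replaces the previous one
per_status = Counter()  # counter-style: one running count per status code

for status in observed:
    last_value = status
    per_status[f"http_status_{status}"] += 1


def flush(counts):
    """Return what would be sent for this interval, then reset the counts."""
    snapshot = dict(counts)
    counts.clear()
    return snapshot


sent = flush(per_status)
```

Here `last_value` ends up holding only 512, while `sent` still records how often each status occurred during the interval. The order of the individual requests is lost in both cases.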

class Gauge(BaseMetric):
    __slots__ = BaseMetric.__slots__ + ("_val",)

    def __init__(self, name, reset_on_collect=False, unit=None) -> None:
        """
        Creates a new gauge
        :param name: label of the gauge
        :param unit of the observed gauge. Unused for gauges
        """
        self._val = None
        super(Gauge, self).__init__(name, reset_on_collect=reset_on_collect)

    @property
    def val(self):
        return self._val

    @val.setter
    def val(self, value) -> None:
        self._val = value

    def reset(self) -> None:
        self._val = 0

class Counter(BaseMetric):
    __slots__ = BaseMetric.__slots__ + ("_lock", "_initial_value", "_val")

    def __init__(self, name, initial_value=0, reset_on_collect=False, unit=None) -> None:
        """
        Creates a new counter
        :param name: name of the counter
        :param initial_value: initial value of the counter, defaults to 0
        :param unit: unit of the observed counter. Unused for counters
        """
        self._lock = threading.Lock()
        self._val = self._initial_value = initial_value
        super(Counter, self).__init__(name, reset_on_collect=reset_on_collect)

    def inc(self, delta=1):
        """
        Increments the counter. If no delta is provided, it is incremented by one
        :param delta: the amount to increment the counter by
        :returns the counter itself
        """
        with self._lock:
            self._val += delta
        return self

    def dec(self, delta=1):
        """
        Decrements the counter. If no delta is provided, it is decremented by one
        :param delta: the amount to decrement the counter by
        :returns the counter itself
        """
        with self._lock:
            self._val -= delta
        return self

    def reset(self):
        """
        Reset the counter to the initial value
        :returns the counter itself
        """
        with self._lock:
            self._val = self._initial_value
        return self

    @property
    def val(self):
        """Returns the current value of the counter"""
        return self._val

    @val.setter
    def val(self, value) -> None:
        with self._lock:
            self._val = value

Checking the source code of gauge and counter shows me that both store a single variable, probably an int or a float ("probably" because the type is not declared).
After looking at the source code of all the available metrics, I think none of them can store and send list-like values (time series or multiple values, such as the [200,500,200,400,512] in the example above). histogram has self._counts, which is a list, but it stores frequencies rather than the actual values.
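The point about a histogram storing frequencies rather than values can be shown with a small stand-in. The bucket bounds and the `SketchHistogram` class below are hypothetical and are not taken from the agent. The midpoint calculation is likewise an assumption about how a bucket might be summarized, so this does not claim to explain the 8.5 reported earlier in the thread.

```python
import bisect

# Hypothetical upper bounds; each bucket covers (previous bound, this bound]
BUCKETS = [1, 2.5, 5, 7.5, 10, float("inf")]


class SketchHistogram:
    """Stand-in bucketed histogram: counts observations per bucket."""

    def __init__(self, buckets):
        self.buckets = list(buckets)
        self.counts = [0] * len(self.buckets)

    def update(self, value, count=1):
        # Index of the first bucket whose upper bound is >= value
        pos = bisect.bisect_left(self.buckets, value)
        self.counts[pos] += count


histogram = SketchHistogram(BUCKETS)
for _ in range(100):
    histogram.update(10)

# The raw value 10 is gone; only "100 observations in (7.5, 10]" remains.
# A consumer that summarizes that bucket by its midpoint would report:
midpoint = (BUCKETS[3] + BUCKETS[4]) / 2
```

With these assumed bounds the midpoint is 8.75, not 10, which illustrates how a displayed value can differ from the value that was sent.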

I was hoping to extend the BaseMetric class to create a metric that stores values in a list- or dict-like data structure, so that it can hold time-series-like data. Would that be possible? If this would be helpful to the community, I can also open a PR in apm-agent-python to add a new metric type. Could you ask someone on the apm-agent-python team internally, if you have access, about this?
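As a rough idea of what such a metric type might look like, here is a minimal sketch. It deliberately does not subclass BaseMetric, and `SketchValueList`, `add`, and `collect` are hypothetical names. Whether the agent or the APM server would accept samples of this shape is not established in this thread.

```python
import threading


class SketchValueList:
    """Hypothetical metric that keeps every observed value until collected."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()
        self._values = []

    def add(self, value):
        """Buffer one observation."""
        with self._lock:
            self._values.append(value)
        return self

    def collect(self):
        """Return the buffered values and start a fresh buffer."""
        with self._lock:
            values, self._values = self._values, []
        return values


status_codes = SketchValueList("http_status_code")
for code in [200, 500, 200, 400, 512]:
    status_codes.add(code)

first = status_codes.collect()   # everything observed in this interval
second = status_codes.collect()  # nothing new since the last collection
```

One design concern with this approach is memory: the buffer grows with request volume between collections, whereas counters, gauges, and bucketed histograms stay a fixed size.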

Looking forward to your response.

jessgarson · October 9, 2024, 4:50pm

Thanks @Divyanshu_Sharma, for explaining this in more detail. I think this might be more of a general feature request; you can report this here.

Divyanshu_Sharma · October 10, 2024, 10:10pm

@jessgarson Thanks again for all your help & guidance. I have opened a feature request here.

jessgarson · October 20, 2024, 3:44pm

Thanks, @Divyanshu_Sharma.
