Visualize OTEL http.server.request.duration metric

I’m currently using the elastic-otel-java agent to instrument Java-based applications and have enabled the OTEL_INSTRUMENTATION_HTTP_SERVER_EMIT_EXPERIMENTAL_TELEMETRY option, which outputs a new metric called http.server.request.duration: a histogram of response durations and counts.

This is an example:

{"values":[0.375,0.875,1.75],"counts":[1,2,3]}
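To make the shape of this data concrete: each entry in "values" is a bucket midpoint and the matching entry in "counts" is how many observations fell into that bucket, so an average must be count-weighted (this mirrors how Elasticsearch treats its histogram field type in metric aggregations). A minimal sketch using the sample above:

```python
# Count-weighted average over a histogram of {values, counts},
# where "values" are bucket midpoints and "counts" are observation counts.
hist = {"values": [0.375, 0.875, 1.75], "counts": [1, 2, 3]}

total = sum(hist["counts"])  # total number of observations (6 here)
weighted_sum = sum(v * c for v, c in zip(hist["values"], hist["counts"]))
avg = weighted_sum / total

print(f"count={total}, avg={avg:.4f}s")
```

A plain (unweighted) mean of "values" would be wrong here, which is why the field needs to be recognized as a histogram rather than an array of numbers.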

Is there an effective way to visualize this using Kibana Lens, or some other way to do so? I’d like to be able to graph the duration values and counts over time, and also break them down by the labels numeric_labels.http_response_status_code and labels.http_request_method.

We can ensure the histogram buckets are consistent by setting OTEL_EXPORTER_OTLP_METRICS_DEFAULT_HISTOGRAM_AGGREGATION to explicit_bucket_histogram (see Metrics Exporter - OTLP | OpenTelemetry).
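For reference, both of the environment variables mentioned above are standard OpenTelemetry SDK configuration; a minimal launch sketch might look like this (the agent jar path and application jar name are placeholders for your own deployment):

```shell
# Emit the experimental http.server.request.duration metric
export OTEL_INSTRUMENTATION_HTTP_SERVER_EMIT_EXPERIMENTAL_TELEMETRY=true

# Use fixed explicit buckets so bucket boundaries stay consistent across exports
export OTEL_EXPORTER_OTLP_METRICS_DEFAULT_HISTOGRAM_AGGREGATION=explicit_bucket_histogram

# Paths below are illustrative, not from the original post
java -javaagent:/path/to/elastic-otel-javaagent.jar -jar my-app.jar
```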


Hi @david.michael, Welcome to the Community.

What version of the stack are you on?

On recent versions, Lens should support histograms.

Did you try that? Something like this?

From one of the documents:

   "metrics": {
      "transaction.duration.histogram": {
        "values": [
          4008.327388554257,
          4586.599435716125,
          4668.358175930145,
          4876.177108215123
        ],
        "counts": [
          1,
          1,
          1,
          1
        ]
      }
    }


We are using Kibana 9.1.2. I do see the option to chart it. So does Kibana automatically detect that this is a histogram, and if I use a function like Average or Count, will it automatically know how to process the histogram to produce a correct visualization?

What I’m seeing in this data doesn’t appear to match the pattern I’d expect from other APM trace data. For example, this should show average response time by HTTP method for the last 4 hours, but the latency APM graph shows a large spike of about 30 seconds at a later, different time. I see something similar where the count of http.server.request.duration doesn’t match the throughput APM metric.

Example of APM data latency:

Yes it should.
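For background on why it should work: when the field is mapped as a histogram type, an Average in Lens corresponds to an Elasticsearch avg aggregation, which weights each histogram value by its count rather than treating the arrays as plain numbers. A rough sketch of the equivalent query (the index pattern and field name are assumptions; adjust to your data stream and mapping):

```shell
# Assumed index pattern and field name, not taken from the original post
curl -s -X POST "localhost:9200/metrics-*/_search" \
  -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "avg_duration": {
      "avg": { "field": "http.server.request.duration" }
    }
  }
}'
```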

I am not sure what you are comparing that to on the bottom.

Many times when I see things like this, it’s because the graph is not properly filtered and/or broken down (by service, environment, transaction, etc.), so the results are different datasets being erroneously aggregated together.

Are all the exact same filters applied?

There are different ways to calculate the latency... see here

It is not clear to me that you are comparing like for like.

I think I gave examples here