How to calculate quantile from histogram bucket metrics

Hello! I am using Spring Boot and Micrometer for application metrics. My Spring Boot application pushes latency metrics to Elasticsearch.

I am using Micrometer's percentile histograms: Micrometer accumulates values in an underlying histogram and ships a predetermined set of buckets to the monitoring system.

https://micrometer.io/docs/concepts#_histograms_and_percentiles

From the histogram buckets, is there any way in Elasticsearch or Kibana to calculate the 50th, 95th, and 99th percentile latency?

Prometheus supports this through the histogram_quantile function:
https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_quantile
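For context, what histogram_quantile does is conceptually simple: it finds the bucket whose cumulative count contains the target rank and interpolates linearly within that bucket. A minimal sketch of that logic in Python (a simplification of the real function, which also handles the +Inf bucket and rate-adjusted counts):

```python
def histogram_quantile(q, buckets):
    """Prometheus-style quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by bound.
    q: quantile in [0, 1], e.g. 0.95 for the 95th percentile.
    """
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Linear interpolation within the bucket containing the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return buckets[-1][0]

# Example: cumulative counts for bounds 0.1s, 0.5s, 1.0s.
print(histogram_quantile(0.95, [(0.1, 30), (0.5, 80), (1.0, 100)]))
```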

Is there a way to achieve a similar result in Elasticsearch or Kibana?

Thank you.

You can use the Visualize application in Kibana to calculate and display percentiles for histogram buckets.

The screenshot below shows an example configuration for a visualization that displays percentiles for histogram buckets.

Under the covers, Kibana just uses Elasticsearch's _search endpoint with a histogram bucket aggregation and a percentiles metric aggregation:

POST /kibana_sample_data_logs/_search
{
  "aggs": {
    "2": {
      "histogram": {
        "script": {
          "source": "doc['timestamp'].value.getHour()",
          "lang": "painless"
        },
        "interval": 1,
        "min_doc_count": 1
      },
      "aggs": {
        "1": {
          "percentiles": {
            "field": "bytes",
            "percents": [
              50,
              95,
              99
            ],
            "keyed": false
          }
        }
      }
    }
  },
  "size": 0,
  "_source": {
    "excludes": []
  },
  "stored_fields": [
    "*"
  ],
  "script_fields": {
    "hour_of_day": {
      "script": {
        "source": "doc['timestamp'].value.getHour()",
        "lang": "painless"
      }
    }
  },
  "docvalue_fields": [
    {
      "field": "@timestamp",
      "format": "date_time"
    },
    {
      "field": "timestamp",
      "format": "date_time"
    },
    {
      "field": "utc_time",
      "format": "date_time"
    }
  ],
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "match_all": {}
        },
        {
          "range": {
            "timestamp": {
              "format": "strict_date_optional_time",
              "gte": "2019-09-20T14:22:14.133Z",
              "lte": "2019-09-27T14:22:14.133Z"
            }
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}

Thanks for the input. I cannot see the bucket split-row section in the screenshot. Could you please provide that as well?

@Nathan_Reese This works fine if the bytes are not already aggregated, but in my case the input is already bucketed.

In my case, le holds the bucket upper bounds (10, 25, 50, 100, 500, 1000, 5000 ms) and the value field holds the aggregated count for each bucket.

Prometheus supports percentile calculation on already-bucketed data using the histogram_quantile function:
https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_quantile

Can you provide a few sample documents? I am not sure I understand the data set or the problem.

I am measuring the latency of the "/employee/{id}" URI in five histogram buckets (10 ms, 50 ms, 100 ms, 500 ms, 1000 ms) using the Java library Micrometer.

Below is an example of three scrapes at an interval of 15 seconds. Within each 15-second interval, the latencies of the "/employee/{id}" API calls are grouped into the five buckets above, and the output metrics look like the following:

{"@timestamp": "2019-10-04T17:27:50.228Z", "name": "http_server_requests_histogram","method": "GET","outcome": "SUCCESS","status": "200","uri": "/employee/{id}","le": "10", "value": 1}
{"@timestamp": "2019-10-04T17:27:50.228Z","name": "http_server_requests_histogram","method": "GET","outcome": "SUCCESS","status": "200","uri": "/employee/{id}","le": "50", "value": 2}
{"@timestamp": "2019-10-04T17:27:50.228Z", "name": "http_server_requests_histogram","method": "GET","outcome": "SUCCESS","status": "200","uri": "/employee/{id}","le": "100", "value": 7}
{"@timestamp": "2019-10-04T17:27:50.228Z","name": "http_server_requests_histogram","method": "GET","outcome": "SUCCESS","status": "200","uri": "/employee/{id}","le": "500", "value": 125}
{"@timestamp": "2019-10-04T17:27:50.228Z","name": "http_server_requests_histogram","method": "GET","outcome": "SUCCESS","status": "200","uri": "/employee/{id}","le": "1000", "value": 1}

{"@timestamp": "2019-10-04T17:28:05.228Z", "name": "http_server_requests_histogram","method": "GET","outcome": "SUCCESS","status": "200","uri": "/employee/{id}","le": "10", "value": 0}
{"@timestamp": "2019-10-04T17:28:05.228Z","name": "http_server_requests_histogram","method": "GET","outcome": "SUCCESS","status": "200","uri": "/employee/{id}","le": "50", "value": 1}
{"@timestamp": "2019-10-04T17:28:05.228Z", "name": "http_server_requests_histogram","method": "GET","outcome": "SUCCESS","status": "200","uri": "/employee/{id}","le": "100", "value": 3}
{"@timestamp": "2019-10-04T17:28:05.228Z","name": "http_server_requests_histogram","method": "GET","outcome": "SUCCESS","status": "200","uri": "/employee/{id}","le": "500", "value": 134}
{"@timestamp": "2019-10-04T17:28:05.228Z","name": "http_server_requests_histogram","method": "GET","outcome": "SUCCESS","status": "200","uri": "/employee/{id}","le": "1000", "value": 2}

{"@timestamp": "2019-10-04T17:28:20.228Z", "name": "http_server_requests_histogram","method": "GET","outcome": "SUCCESS","status": "200","uri": "/employee/{id}","le": "10", "value": 2}
{"@timestamp": "2019-10-04T17:28:20.228Z","name": "http_server_requests_histogram","method": "GET","outcome": "SUCCESS","status": "200","uri": "/employee/{id}","le": "50", "value": 6}
{"@timestamp": "2019-10-04T17:28:20.228Z", "name": "http_server_requests_histogram","method": "GET","outcome": "SUCCESS","status": "200","uri": "/employee/{id}","le": "100", "value": 5}
{"@timestamp": "2019-10-04T17:28:20.228Z","name": "http_server_requests_histogram","method": "GET","outcome": "SUCCESS","status": "200","uri": "/employee/{id}","le": "500", "value": 214}
{"@timestamp": "2019-10-04T17:28:20.228Z","name": "http_server_requests_histogram","method": "GET","outcome": "SUCCESS","status": "200","uri": "/employee/{id}","le": "1000", "value": 10}

With this data, I would like to calculate the 50th, 95th, and 99th percentile latency. Just to explain the use case: this can be achieved in Prometheus using the histogram_quantile function, https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_quantile
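As far as I know there is no built-in equivalent of histogram_quantile in Elasticsearch, so one workaround is to pull the bucket documents and compute the quantile client-side. A sketch in Python, using the three scrapes above (note that the value fields here are per-bucket counts, not cumulative counts like Prometheus le buckets, so they must first be summed across scrapes and converted to cumulative form):

```python
from collections import defaultdict

# Per-bucket (non-cumulative) counts taken from the three scrapes above.
docs = [
    ("10", 1), ("50", 2), ("100", 7), ("500", 125), ("1000", 1),
    ("10", 0), ("50", 1), ("100", 3), ("500", 134), ("1000", 2),
    ("10", 2), ("50", 6), ("100", 5), ("500", 214), ("1000", 10),
]

# Sum counts per bucket bound, then convert to cumulative form.
totals = defaultdict(int)
for le, value in docs:
    totals[float(le)] += value

cumulative, running = [], 0
for bound in sorted(totals):
    running += totals[bound]
    cumulative.append((bound, running))

def histogram_quantile(q, buckets):
    """Linear interpolation within the bucket containing the target rank,
    in the style of Prometheus's histogram_quantile."""
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return buckets[-1][0]

for q in (0.50, 0.95, 0.99):
    print(q, round(histogram_quantile(q, cumulative), 1))
```

On the sample data this gives roughly p50 ≈ 294 ms, p95 ≈ 489 ms, p99 ≈ 803 ms. The same accuracy caveat as in Prometheus applies: the result is only as precise as the bucket boundaries, since the true values inside a bucket are unknown.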

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.