How to calculate quantile from histogram bucket metrics

Hello! I am using Spring Boot and Micrometer for application metrics. My Spring Boot application pushes latency metrics to Elasticsearch.

I am using Micrometer's percentile histograms: Micrometer accumulates values in an underlying histogram and ships a predetermined set of buckets to the monitoring system.

https://micrometer.io/docs/concepts#_histograms_and_percentiles

From the histogram buckets, is there any way in Elasticsearch or Kibana to calculate the 50th, 95th, and 99th percentile latency?

Prometheus supports this through the histogram_quantile function:
https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_quantile
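For context, what histogram_quantile does is conceptually simple: it finds the bucket whose cumulative count contains the target rank and interpolates linearly within that bucket. A minimal sketch of that logic in Python (a simplification of the real function, which also handles the +Inf bucket and rate-adjusted counts):

```python
def histogram_quantile(q, buckets):
    """Prometheus-style quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by bound.
    q: quantile in [0, 1], e.g. 0.95 for the 95th percentile.
    """
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Linear interpolation within the bucket containing the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return buckets[-1][0]

# Example: cumulative counts for bounds 0.1s, 0.5s, 1.0s.
print(histogram_quantile(0.95, [(0.1, 30), (0.5, 80), (1.0, 100)]))
```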

Is there a way to achieve a similar result in Elasticsearch or Kibana?

Thank you.

You can use the Visualize application in Kibana to calculate and display percentiles for histogram buckets.

The screenshot below shows an example configuration for a visualization that displays percentiles for histogram buckets.

Under the covers, Kibana just uses Elasticsearch's _search endpoint with a histogram bucket aggregation and a percentiles metric aggregation:

POST /kibana_sample_data_logs/_search
{
  "aggs": {
    "2": {
      "histogram": {
        "script": {
          "source": "doc['timestamp'].value.getHour()",
          "lang": "painless"
        },
        "interval": 1,
        "min_doc_count": 1
      },
      "aggs": {
        "1": {
          "percentiles": {
            "field": "bytes",
            "percents": [
              50,
              95,
              99
            ],
            "keyed": false
          }
        }
      }
    }
  },
  "size": 0,
  "_source": {
    "excludes": []
  },
  "stored_fields": [
    "*"
  ],
  "script_fields": {
    "hour_of_day": {
      "script": {
        "source": "doc['timestamp'].value.getHour()",
        "lang": "painless"
      }
    }
  },
  "docvalue_fields": [
    {
      "field": "@timestamp",
      "format": "date_time"
    },
    {
      "field": "timestamp",
      "format": "date_time"
    },
    {
      "field": "utc_time",
      "format": "date_time"
    }
  ],
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "match_all": {}
        },
        {
          "range": {
            "timestamp": {
              "format": "strict_date_optional_time",
              "gte": "2019-09-20T14:22:14.133Z",
              "lte": "2019-09-27T14:22:14.133Z"
            }
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}

Thanks for the input. I cannot see the bucket split-row section in the screenshot. Could you please provide that as well?

@Nathan_Reese This works fine if the bytes are not already aggregated, but in my case the input is already bucketed.

In my case, le holds the bucket upper bounds (10, 25, 50, 100, 500, 1000, 5000 ms) and the value field holds the aggregated count for each bucket.

Prometheus supports percentile calculation on already-bucketed data using the histogram_quantile function:
https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_quantile

Can you provide a few sample documents? I am not sure I understand the data set or the problem.

I am measuring the latency of the "/employee/{id}" URI in five histogram buckets (10 ms, 50 ms, 100 ms, 500 ms, 1000 ms) using the Java library Micrometer.

Below is an example of three scrapes at an interval of 15 seconds. Within each 15-second interval, the latencies of the "/employee/{id}" API calls are grouped into the five buckets above, and the output metrics look like the following:

{"@timestamp": "2019-10-04T17:27:50.228Z", "name": "http_server_requests_histogram","method": "GET","outcome": "SUCCESS","status": "200","uri": "/employee/{id}","le": "10", "value": 1}
{"@timestamp": "2019-10-04T17:27:50.228Z","name": "http_server_requests_histogram","method": "GET","outcome": "SUCCESS","status": "200","uri": "/employee/{id}","le": "50", "value": 2}
{"@timestamp": "2019-10-04T17:27:50.228Z", "name": "http_server_requests_histogram","method": "GET","outcome": "SUCCESS","status": "200","uri": "/employee/{id}","le": "100", "value": 7}
{"@timestamp": "2019-10-04T17:27:50.228Z","name": "http_server_requests_histogram","method": "GET","outcome": "SUCCESS","status": "200","uri": "/employee/{id}","le": "500", "value": 125}
{"@timestamp": "2019-10-04T17:27:50.228Z","name": "http_server_requests_histogram","method": "GET","outcome": "SUCCESS","status": "200","uri": "/employee/{id}","le": "1000", "value": 1}

{"@timestamp": "2019-10-04T17:28:05.228Z", "name": "http_server_requests_histogram","method": "GET","outcome": "SUCCESS","status": "200","uri": "/employee/{id}","le": "10", "value": 0}
{"@timestamp": "2019-10-04T17:28:05.228Z","name": "http_server_requests_histogram","method": "GET","outcome": "SUCCESS","status": "200","uri": "/employee/{id}","le": "50", "value": 1}
{"@timestamp": "2019-10-04T17:28:05.228Z", "name": "http_server_requests_histogram","method": "GET","outcome": "SUCCESS","status": "200","uri": "/employee/{id}","le": "100", "value": 3}
{"@timestamp": "2019-10-04T17:28:05.228Z","name": "http_server_requests_histogram","method": "GET","outcome": "SUCCESS","status": "200","uri": "/employee/{id}","le": "500", "value": 134}
{"@timestamp": "2019-10-04T17:28:05.228Z","name": "http_server_requests_histogram","method": "GET","outcome": "SUCCESS","status": "200","uri": "/employee/{id}","le": "1000", "value": 2}

{"@timestamp": "2019-10-04T17:28:20.228Z", "name": "http_server_requests_histogram","method": "GET","outcome": "SUCCESS","status": "200","uri": "/employee/{id}","le": "10", "value": 2}
{"@timestamp": "2019-10-04T17:28:20.228Z","name": "http_server_requests_histogram","method": "GET","outcome": "SUCCESS","status": "200","uri": "/employee/{id}","le": "50", "value": 6}
{"@timestamp": "2019-10-04T17:28:20.228Z", "name": "http_server_requests_histogram","method": "GET","outcome": "SUCCESS","status": "200","uri": "/employee/{id}","le": "100", "value": 5}
{"@timestamp": "2019-10-04T17:28:20.228Z","name": "http_server_requests_histogram","method": "GET","outcome": "SUCCESS","status": "200","uri": "/employee/{id}","le": "500", "value": 214}
{"@timestamp": "2019-10-04T17:28:20.228Z","name": "http_server_requests_histogram","method": "GET","outcome": "SUCCESS","status": "200","uri": "/employee/{id}","le": "1000", "value": 10}

With this data, I would like to calculate the 50th, 95th, and 99th percentile latency. Just to explain the use case: this can be achieved in Prometheus using the histogram_quantile function, https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_quantile
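As far as I know there is no built-in equivalent of histogram_quantile in Elasticsearch, so one workaround is to pull the bucket documents and compute the quantile client-side. A sketch in Python, using the three scrapes above (note that the value fields here are per-bucket counts, not cumulative counts like Prometheus le buckets, so they must first be summed across scrapes and converted to cumulative form):

```python
from collections import defaultdict

# Per-bucket (non-cumulative) counts taken from the three scrapes above.
docs = [
    ("10", 1), ("50", 2), ("100", 7), ("500", 125), ("1000", 1),
    ("10", 0), ("50", 1), ("100", 3), ("500", 134), ("1000", 2),
    ("10", 2), ("50", 6), ("100", 5), ("500", 214), ("1000", 10),
]

# Sum counts per bucket bound, then convert to cumulative form.
totals = defaultdict(int)
for le, value in docs:
    totals[float(le)] += value

cumulative, running = [], 0
for bound in sorted(totals):
    running += totals[bound]
    cumulative.append((bound, running))

def histogram_quantile(q, buckets):
    """Linear interpolation within the bucket containing the target rank,
    in the style of Prometheus's histogram_quantile."""
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return buckets[-1][0]

for q in (0.50, 0.95, 0.99):
    print(q, round(histogram_quantile(q, cumulative), 1))
```

On the sample data this gives roughly p50 ≈ 294 ms, p95 ≈ 489 ms, p99 ≈ 803 ms. The same accuracy caveat as in Prometheus applies: the result is only as precise as the bucket boundaries, since the true values inside a bucket are unknown.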

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.