95th percentile aggregation for time-series-like documents

Hi there,

I am struggling to get Elasticsearch (and ultimately Grafana) to display the 95th percentile value for a given time period.

Consider the following setup:

I have an index called traffic:

POST /traffic/_search

This index stores regular time series data at exactly 5-minute intervals.

Each document contains the following fields (others left out for brevity):

    {
        "@timestamp": "2020-04-02T00:00:00Z",
        ....
        "bytesInPerSecond": 1237832,
        "bytesOutPerSecond": 1232922,
        "interface": "eth0",
        "server": "my-db-server",
        ....
    },
    ....
    {
        "@timestamp": "2020-04-02T00:05:00Z",
        ....
        "bytesInPerSecond": 898239,
        "bytesOutPerSecond": 892,
        "interface": "eth1",
        ....
        "server": "my-db-server"
    }

I would like Elasticsearch to give me the 95th percentile for a given month (April in my example). This is the query I am using:

    {
      "size": 0,
      "query": {
        "bool": {
          "filter": [
            {
              "range": {
                "@timestamp": {
                  "gte": 1585699200000,
                  "lte": 1588291120000,
                  "format": "epoch_millis"
                }
              }
            },
            {
              "query_string": {
                "analyze_wildcard": true,
                "query": "server:my-db-server"
              }
            }
          ]
        }
      },
      "aggs": {
        "prepare_the_data_aggregation": {
          "date_histogram": {
            "interval": "5m",
            "field": "@timestamp",
            "min_doc_count": 0,
            "extended_bounds": {
              "min": 1585699200000,
              "max": 1588291120000
            },
            "format": "epoch_millis"
          },
          "aggs": {
            "in": {
              "sum": {
                "field": "bytesInPerSecond"
              }
            },
            "out": {
              "sum": {
                "field": "bytesOutPerSecond"
              }
            }
          }
        },
        "95th_in": {
          "percentiles_bucket" : {
            "buckets_path": "prepare_the_data_aggregation>in",
            "percents": [95]
          }
        },
        "95th_out": {
          "percentiles_bucket" : {
            "buckets_path": "prepare_the_data_aggregation>out",
            "percents": [95]
          }
        }
      }
    }

The above query works, but it returns the data for all three aggregations: prepare_the_data_aggregation, 95th_in and 95th_out.

The result of the first aggregation, prepare_the_data_aggregation, is especially large because it contains every 5-minute data point for the entire month.

The only information I need is the result of 95th_in and 95th_out. Is there a way to tell Elasticsearch that I only want those, and not the results of prepare_the_data_aggregation?

Since this relies on the Percentiles Bucket Aggregation, which is a form of Pipeline Aggregation, do you know if this kind of query is also supported via Grafana?

Thanks a lot for this amazing product.

Hi @nroccolsw,

Do you just want to visualize this in some tool, or do you need to query the 95th percentile from Elasticsearch to use it somewhere else?

If you just need to see it, a Timelion visualization in Kibana could probably show this from the raw data.

I know nothing about Pipeline Aggregations, so I can't comment on that :grimacing:

Hi A_B,

I primarily need this via the HTTP API so that I can use it from our application code. So it is not ‘just visualizing’.

Ok, then I know nothing that might help :slight_smile:

Is there maybe someone else here that can help me with my question?
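
For anyone landing here with the same question: one thing that may be worth trying is Elasticsearch's response filtering. The `filter_path` query parameter limits which parts of the response body are returned, so a request like `POST /traffic/_search?filter_path=aggregations.95th_in,aggregations.95th_out` should leave out the prepare_the_data_aggregation buckets. Note that this only trims the response; the date histogram is still computed on the server. Below is a minimal Python sketch of how application code might build that URL and read the values back. The helper names and the `http://localhost:9200` base URL are made up for illustration, and the sketch assumes the default keyed response format of percentiles_bucket, where the result sits under `values` with a key such as `"95.0"`.

```python
from urllib.parse import urlencode


def build_search_url(base_url, index, agg_names):
    """Build a _search URL whose response is limited to the given aggregations.

    base_url and index are placeholders; agg_names are the names of the
    aggregations to keep in the response.
    """
    paths = ",".join(f"aggregations.{name}" for name in agg_names)
    # Keep commas unescaped so the URL stays readable.
    query = urlencode({"filter_path": paths}, safe=",")
    return f"{base_url}/{index}/_search?{query}"


def extract_percentile(response, agg_name, percent="95.0"):
    """Read one percentile from a parsed (JSON-decoded) search response.

    Assumes the default keyed output of percentiles_bucket:
    {"aggregations": {"<name>": {"values": {"95.0": <number>}}}}
    """
    return response["aggregations"][agg_name]["values"][percent]


url = build_search_url(
    "http://localhost:9200",  # hypothetical cluster address
    "traffic",
    ["95th_in", "95th_out"],
)
print(url)
```

The request body stays exactly the same as the query posted above; only the URL changes. I can't say whether Grafana exposes this, so that part of the question is still open.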
