[Kibana] Visualize number of documents having a field inferior to the 99th percentile of this field

Hello,

Is there a way, in a Kibana visualization (Lens, TSVB...), to display on a chart (line, bar, etc.), for each time bucket, the number of documents having a field, let's says "response_time", inferior to the 99th percentile of "response_time" in this time bucket?

Thank you,
Jerome

I think you are referring to the Percentile Rank aggregation available in Elasticsearch and exposed in Lens.

This is an example showing three layers on a Lens line chart showing 99%, 50%, and 25% percentile ranks for the eCommerce dataset products.base_price field:

Hi Jorge,

Thanks for your reply. I don't think that it is what I'm looking for.

As far is I understand from the documentation, "percentile rank aggregation" gives you the percentile rank, in your data, of certain values. Your example does not show the 99th, 50th and 25th percentile ranks, but the percentile ranks associated to the values 99, 50 and 25 for products.base_price field for each time bucket. For exampe, the blue line shows, in each time bucket, the percetile rank of a products.base_price equal to 50 (dollars I guess), and this rank is around 80%.

What I would like to have is the number of documents in my data that match a percentile rank.

I'll try to give a simple example:
Assume that, for a certain bucket, we have 20 documents. Each document has a field "response_time" expressed in milliseconds.

  • 11 documents have a response_time equal to 50
  • 5 have 100
  • 2 have 200
  • 2 have 500

If I'm not wrong:

  • 90th percentile is 500 and 18 documents has a response_time below this value
  • 80th percentile is 200 and 16 documents has a response_time below this value
  • 55th percentile is 100 and 11 documents has a response_time below this value

What I would like is a metric that give me, for a specific percentile rank, the number of documents below the value associated to this percentile rank.
So if I specify "90" as parameter of this metric, it would give me "18" for this bucket.

I hope it's more clear.

Yeah, you are right, and the low values of the prices in my example did not help me to understand that aggregation.

I don't think there's a way to do this in Elasticsearch in a single query, but I may be wrong, so I will share here at least the data preparation as per your details, and I'll transfer this to the Elasticsearch forum.

If there's a way to run this in ES, then maybe you can configure a Vega visualization that runs that custom query to render your data.

Sorry for not being of much help here :confused:

# Create the index
PUT discuss-344485
{
  "mappings": {
    "properties": {
      "ts": {"type": "date"},
      "metric": {"type": "integer" }
    }
  }
}

# Add data for two days with different values
POST discuss-344485/_bulk
{ "index": {}}
{ "ts": "2023-10-10", "metric": 50 }
{ "index": {}}
{ "ts": "2023-10-10", "metric": 50 }
{ "index": {}}
{ "ts": "2023-10-10", "metric": 50 }
{ "index": {}}
{ "ts": "2023-10-10", "metric": 50 }
{ "index": {}}
{ "ts": "2023-10-10", "metric": 50 }
{ "index": {}}
{ "ts": "2023-10-10", "metric": 50 }
{ "index": {}}
{ "ts": "2023-10-10", "metric": 50 }
{ "index": {}}
{ "ts": "2023-10-10", "metric": 50 }
{ "index": {}}
{ "ts": "2023-10-10", "metric": 50 }
{ "index": {}}
{ "ts": "2023-10-10", "metric": 50 }
{ "index": {}}
{ "ts": "2023-10-10", "metric": 50 }
{ "index": {}}
{ "ts": "2023-10-10", "metric": 100 }
{ "index": {}}
{ "ts": "2023-10-10", "metric": 100 }
{ "index": {}}
{ "ts": "2023-10-10", "metric": 100 }
{ "index": {}}
{ "ts": "2023-10-10", "metric": 100 }
{ "index": {}}
{ "ts": "2023-10-10", "metric": 100 }
{ "index": {}}
{ "ts": "2023-10-10", "metric": 200 }
{ "index": {}}
{ "ts": "2023-10-10", "metric": 200 }
{ "index": {}}
{ "ts": "2023-10-10", "metric": 500 }
{ "index": {}}
{ "ts": "2023-10-10", "metric": 500 }
{ "index": {}}
{ "ts": "2023-10-11", "metric": 50 }
{ "index": {}}
{ "ts": "2023-10-11", "metric": 50 }
{ "index": {}}
{ "ts": "2023-10-11", "metric": 50 }
{ "index": {}}
{ "ts": "2023-10-11", "metric": 100 }
{ "index": {}}
{ "ts": "2023-10-11", "metric": 500 }

With a percentiles aggregation, you only have the values but not the counts. i don't see how to take those values in a subsequent aggregation to retrieve that data.

# Get the 50/80/90 percentiles
GET discuss-344485/_search
{
  "size": 0,
  "aggs": {
    "by_date": {
      "date_histogram": {
        "field": "ts",
        "calendar_interval": "day"
      },
      "aggs": {
        "by_percentile": {
          "percentiles": {
            "field": "metric",
            "percents": [ 50, 80, 90 ]
          }
        }
      }
    }
  }
}

# Returns
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 25,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "by_date": {
      "buckets": [
        {
          "key_as_string": "2023-10-10T00:00:00.000Z",
          "key": 1696896000000,
          "doc_count": 20,
          "by_percentile": {
            "values": {
              "50.0": 50,
              "80.0": 120.00000000000011,
              "90.0": 230.00000000000043
            }
          }
        },
        {
          "key_as_string": "2023-10-11T00:00:00.000Z",
          "key": 1696982400000,
          "doc_count": 5,
          "by_percentile": {
            "values": {
              "50.0": 50,
              "80.0": 180.00000000000006,
              "90.0": 340
            }
          }
        }
      ]
    }
  }
}

Thank you Jorge for having posting this on the Elasticsearch forum, indeed if you have a solution with an ES request this could help. I am not familiar with ES requests, as in our company we only have access to Kibana (the admin has blocked the console access), so a pure Kibana solution would be great, but if there a way to do this with Vega, I will try to use it.

Thank again.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.