Hello,
I'm trying to calculate the percentile distribution of the number of requests per user per hour, but it seems impossible to define a buckets_path that traverses into a nested bucket aggregation.
The aggregation for the raw data looks like this:
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        { "exists": { "field": "remote_user" } }
      ]
    }
  },
  "aggs": {
    "users": {
      "terms": { "field": "remote_user" },
      "aggs": {
        "time": {
          "date_histogram": {
            "field": "@timestamp",
            "interval": "1h",
            "time_zone": "Europe/Berlin",
            "min_doc_count": 0
          }
        }
      }
    }
  }
}
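To make the goal concrete, here is a minimal client-side sketch of the computation I'm after, assuming the search response has already been parsed into a Python dict. The function names, the sample response, and the nearest-rank percentile definition are illustrative assumptions of mine, not anything Elasticsearch provides.

```python
import math


def hourly_counts(response):
    """Collect the doc_count of every (user, hour) bucket into one flat list."""
    counts = []
    for user_bucket in response["aggregations"]["users"]["buckets"]:
        for hour_bucket in user_bucket["time"]["buckets"]:
            counts.append(hour_bucket["doc_count"])
    return counts


def percentile(values, p):
    """Nearest-rank percentile (0 < p <= 100) of a non-empty list.

    This is one of several common percentile definitions, chosen here
    only because it is simple; it is an assumption, not what
    Elasticsearch uses internally.
    """
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical, hand-written response fragment for illustration only.
sample_response = {
    "aggregations": {"users": {"buckets": [
        {"key": "alice", "time": {"buckets": [
            {"doc_count": 3}, {"doc_count": 0}, {"doc_count": 7}]}},
        {"key": "bob", "time": {"buckets": [
            {"doc_count": 1}, {"doc_count": 12}]}},
    ]}}
}

counts = hourly_counts(sample_response)
p50 = percentile(counts, 50)
p95 = percentile(counts, 95)
print(counts, p50, p95)
```

This only covers the users that the terms aggregation actually returns, so it is a statement of intent rather than a replacement for doing the work inside the aggregation.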
Since I do not care about any particular user, the aggregation order could also be inverted. Nevertheless, there doesn't seem to be a way to build percentiles from the doc_counts of the inner buckets. Other posts here suggest building another pipeline aggregation at the inner level and then aggregating over its result. However, percentiles are not associative (a percentile of per-bucket percentiles is generally not the percentile of the combined data), so they cannot be computed this way.
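A small sketch of why the two-step approach does not work, using the median (the 50th percentile) and made-up hourly counts for two hypothetical users:

```python
from statistics import median

# Hypothetical hourly request counts for two users (made-up numbers).
per_user = {
    "alice": [1, 1, 1, 1, 100],
    "bob": [50, 60],
}

# Step-wise approach: median per user, then the median of those results.
per_user_medians = [median(values) for values in per_user.values()]
median_of_medians = median(per_user_medians)

# What I actually want: the median over all (user, hour) counts pooled together.
pooled = [count for values in per_user.values() for count in values]
pooled_median = median(pooled)

print(median_of_medians, pooled_median)
```

With these numbers the two results differ widely, which is why aggregating percentiles level by level is not an option for me.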
I tried adding the following pipeline aggregation:
"reqs_per_hour": {
  "percentiles_bucket": {
    "buckets_path": "users>time"
  }
}
This results in:
buckets_path must reference either a number value or a single value numeric metric aggregation, got: org.elasticsearch.search.aggregations.bucket.histogram.InternalDateHistogram