Good day,
I have a (simplified) mapping that stores throughput metrics every second:
{ "sample": {
"properties": {
"timestamp": { "type": "date" },
"throughput": { "type": "long" }
}
} }
I would like to calculate average throughput in megabytes per second over 1-minute buckets.
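Concretely, the per-bucket number I'm after is just the sum of the 60 per-second samples converted to MB/s (throughput here is a byte count per second). A toy sketch of the arithmetic, in Python:

# Toy illustration of the per-bucket calculation I want: 60 one-second
# samples (bytes/sec) averaged over a minute and converted to MB/s.
samples = [1200000] * 60                            # one minute of fake per-second samples, in bytes
avg_mb_per_sec = sum(samples) / 60.0 / 1024 / 1024  # average bytes/sec -> MB/s
print(round(avg_mb_per_sec, 2))                     # -> 1.14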
So far I found two ways of doing this:
1. Sum aggregation with post-processing:
{ "aggs": {
"date_agg": {
"date_range": {
"field": "timestamp",
"ranges": [
{ "from": "2015-05-25T14:50:00.000Z", "to": "2015-05-25T14:51:00.000Z" }
.... <several hundred more buckets>
]
},
"aggs": {
"total_throughput": {
"sum": { "field": "timestamp" }
} } } } }
And then divide the value of each bucket by (60 * 1024 * 1024) on the client side after fetching the data from ES (a rough sketch of this step is shown in Python after the second option below).
2. With a scripted metric:
{ "aggs": {
"date_agg": {
"date_range": {
"field": "timestamp",
"ranges": [
{ "from": "2015-05-25T14:50:00.000Z", "to": "2015-05-25T14:51:00.000Z" }
.... <several hundred more buckets>
]
},
"aggs": {
"thoughput_per_sec": {
"scripted_metric": {
"init_script": "_agg['tp'] = 0",
"map_script": "_agg.tp += doc['throughput'].value",
"reduce_script": "tps = 0; for (a in _aggs) { tps += a['tp'] }; return Math.round(tps/60/1024/1024 * 100)/100"
} } } } } }
The scripted metric works great, except that it's about 4 times slower than just doing the sum and dividing on the client side.
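For reference, the client-side step in option 1 is nothing fancy; roughly this (the Python client is assumed, and the index name and request variable are placeholders for my real ones):

# Rough sketch of the client-side division from option 1.
# "metrics" and sum_query are placeholders for my real index and the
# sum-aggregation request shown above.
from elasticsearch import Elasticsearch

es = Elasticsearch()
resp = es.search(index="metrics", body=sum_query)

mb_per_sec = {}
for bucket in resp["aggregations"]["date_agg"]["buckets"]:
    total_bytes = bucket["total_throughput"]["value"]   # sum of bytes over the minute
    mb_per_sec[bucket["from_as_string"]] = round(total_bytes / 60.0 / 1024 / 1024, 2)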
I'm wondering if there is a way to define a script that runs on the results of an aggregation. That way I could still use the fast built-in sum aggregation and then do the final calculations in a script (which would only run over a few hundred buckets, so it's no big deal).
P.S. No, I can't just use avg, because I have slightly more complex things to calculate that have no built-in aggregation I can use.