Suggestions for summarizing running total values

ES version 7.6

I am finding it difficult to summarize or run transforms on running total values such as network traffic metrics.

For example, I want to get the max bucket on inbound bytes/s. But before that, it looks like I have to calculate the rate (from the running total) first.

A simple derivative aggregation can easily hit the "too many buckets" error, especially when the interval setting in date_histogram is short (e.g. 1 minute) or when there is an additional terms aggregation.

When I tried transforms instead, it complained "Unsupported aggregation type [date_histogram]".

Any suggestions? Thanks in advance.

Can you provide an example of what you are trying to do? You can also post the aggregations and/or the transform you tried.

As for the transform: date_histogram is supported as part of group_by, so I wonder why you are trying to use it in the aggregation part.

The reason I put date_histogram in the aggs part was that I got the following error when I moved the date_histogram to the group_by part:

"reason" : "derivative aggregation [diff] must have a histogram, date_histogram or auto_date_histogram as parent"

Here is my pivot directive:
"pivot": {
  "group_by": {
    "host.keyword": {
      "terms": {
        "field": "host.keyword"
      }
    },
    "@timestamp": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "5m"
      }
    }
  },
  "aggregations": {
    "diff": {
      "derivative": {
        "buckets_path": "read"
      }
    },
    "read": {
      "max": {
        "field": "read"
      }
    }
  },
  "max_page_search_size": 2000
},

Here are my derivative aggs. I also tried using partitions in the terms aggregation, but even when I cut it into small pieces, the number of buckets exceeds the limit over a time range of a couple of months:
"aggs": {
  "host": {
    "terms": {
      "field": "host.keyword",
      "size": 9999
    },
    "aggs": {
      "date": {
        "date_histogram": {
          "field": "@timestamp",
          "fixed_interval": "5m"
        },
        "aggs": {
          "inbound": {
            "max": {
              "field": "rx"
            }
          },
          "rate": {
            "derivative": {
              "buckets_path": "inbound"
            }
          }
        }
      }
    }
  }
}

I see it now. The problem with the transform is that "derivative" isn't supported either. This is a bigger technical limitation and can't be easily fixed.

The other problem I see with your use case is the amount of data you have. With plain aggregations you will always run into size problems for use cases like this. You need some sort of chunking/paging, which either transform or composite aggs can provide (transform uses composite aggs, but again, composite aggs don't support pipeline aggregations).

The only solution I see at the moment is to use a transform with everything you have except the derivative aggregation. To add the derivative, I suggest writing some custom code that runs queries on the transform destination index, computes the derivative, and writes the result back.
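To illustrate the paging idea, here is a minimal Python sketch of walking a composite aggregation page by page with `after_key`. It mirrors the host/5m/max(rx) aggregation posted above. The names `composite_body`, `iter_buckets`, `by_host_time`, `host` and `ts` are made up for this example, and `search` is a hypothetical callable that sends a request body to the index and returns the response as a dict; it is not a specific client API.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def composite_body(after: Optional[Dict[str, Any]] = None,
                   page_size: int = 1000) -> Dict[str, Any]:
    """Build one page of a composite aggregation request (hypothetical helper)."""
    composite: Dict[str, Any] = {
        "size": page_size,
        "sources": [
            {"host": {"terms": {"field": "host.keyword"}}},
            {"ts": {"date_histogram": {"field": "@timestamp",
                                       "fixed_interval": "5m"}}},
        ],
    }
    if after is not None:
        # Resume right after the last bucket of the previous page.
        composite["after"] = after
    return {
        "size": 0,
        "aggs": {
            "by_host_time": {
                "composite": composite,
                # Metric sub-aggregations are fine here; pipeline aggs are not.
                "aggs": {"inbound": {"max": {"field": "rx"}}},
            }
        },
    }


def iter_buckets(search: Callable[[Dict[str, Any]], Dict[str, Any]],
                 page_size: int = 1000) -> Iterator[Dict[str, Any]]:
    """Yield all composite buckets, one page at a time.

    `search` is an assumption: any function that takes a request body
    and returns the parsed search response.
    """
    after: Optional[Dict[str, Any]] = None
    while True:
        resp = search(composite_body(after, page_size))
        agg = resp["aggregations"]["by_host_time"]
        buckets = agg["buckets"]
        if not buckets:
            return
        yield from buckets
        after = agg.get("after_key")
        if after is None:
            return
```

Each page stays below the bucket limit because only `page_size` buckets are requested at a time; the derivative still has to be computed by the caller.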
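As a rough sketch of that custom step, the Python below computes a per-host rate from rows already read from the transform destination index. It assumes each row has been flattened to `host`, `timestamp` (epoch milliseconds) and `read` (the max of the running total per 5m bucket); those field names and the `add_rate` helper are assumptions for illustration. Treating a negative delta as a counter reset and leaving the rate empty is also an assumption, not something discussed above. Reading from and writing back to the index is left out.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def add_rate(rows: Iterable[Dict[str, Any]],
             value_field: str = "read") -> List[Dict[str, Any]]:
    """Return copies of rows with a per-second `rate` derived from a running total.

    The first bucket of each host has no predecessor, so its rate is None.
    """
    by_host: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        by_host[row["host"]].append(row)

    out: List[Dict[str, Any]] = []
    for series in by_host.values():
        # The derivative only makes sense on time-ordered buckets.
        series.sort(key=lambda r: r["timestamp"])
        prev = None
        for row in series:
            rate = None
            if prev is not None:
                delta = row[value_field] - prev[value_field]
                seconds = (row["timestamp"] - prev["timestamp"]) / 1000.0
                # Assumption: a negative delta means the counter was reset.
                if delta >= 0 and seconds > 0:
                    rate = delta / seconds
            out.append({**row, "rate": rate})
            prev = row
    return out
```

With the rate stored per bucket, the original goal (the maximum inbound bytes/s per host) becomes a plain max over the `rate` field.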

So I'd have to compute the derivative outside ES.

Not that elegant, but confirming that I haven't missed any cool feature in ES is good enough.

Really appreciate it, @Hendrik_Muhs.

Feel free to open an enhancement request, it sounds like an interesting use case. It would make sense to add something to transform to enable such things, similar to what pipeline aggregations offer.

Good feedback!

#59684 filed
