Suggestion on summarizing running total value

Jehutywong · July 15, 2020, 5:20pm

ES version 7.6

I am finding it difficult to summarize or run transforms on running-total values such as network traffic metrics.

For example, I want to get the max bucket on inbound bytes/s. But before that, it looks like I have to calculate the rate (from the running total) first.

A simple derivative aggregation can easily hit the too-many-buckets error, especially when the interval in date_histogram is short (e.g. 1 minute) or there is an additional terms aggregation.

When I tried transforms instead, it complained "Unsupported aggregation type [date_histogram]".

Any suggestions? Thanks in advance.

Hendrik_Muhs · July 15, 2020, 6:48pm

Can you provide an example of what you are trying to do? You can also post the aggregations and/or the transform you tried.

As for transform: date_histogram is supported as part of group_by, so I wonder why you tried to use it in the aggregation part.

Jehutywong · July 16, 2020, 3:22am

The reason I put date_histogram in the aggs part is that I got the following error when I moved the date_histogram to the group_by part:

"reason" : "derivative aggregation [diff] must have a histogram, date_histogram or auto_date_histogram as parent"

Here is my pivot directive:
"pivot": {
  "group_by": {
    "host.keyword": {
      "terms": {
        "field": "host.keyword"
      }
    },
    "@timestamp": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "5m"
      }
    }
  },
  "aggregations": {
    "diff": {
      "derivative": {
        "buckets_path": "read"
      }
    },
    "read": {
      "max": {
        "field": "read"
      }
    }
  },
  "max_page_search_size": 2000
},

Here are my derivative aggs (I also tried using partitions in the terms agg, but even if I cut it into small pieces, the number of buckets exceeds the limit over a time range of a couple of months):
"aggs": {
  "host": {
    "terms": {
      "field": "host.keyword",
      "size": 9999
    },
    "aggs": {
      "date": {
        "date_histogram": {
          "field": "@timestamp",
          "fixed_interval": "5m"
        },
        "aggs": {
          "inbound": {
            "max": {
              "field": "rx"
            }
          },
          "rate": {
            "derivative": {
              "buckets_path": "inbound"
            }
          }
        }
      }
    }
  }
}
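
To put rough numbers on the bucket problem, here is a small back-of-the-envelope sketch in Python. It only counts the date_histogram buckets nested under the terms buckets; the function name and the example host count are made up for illustration, and the comparison value of 10,000 is an assumption about the `search.max_buckets` default in 7.6 that should be checked against the cluster's actual settings.

```python
import math
from datetime import timedelta


def estimate_buckets(hosts: int, time_range: timedelta, interval: timedelta) -> int:
    """Rough upper bound on date_histogram buckets nested under one terms bucket per host."""
    per_host = math.ceil(time_range / interval)  # intervals needed to cover the range
    return hosts * per_host


# Hypothetical numbers: 100 hosts, 60 days of data, 5-minute interval.
ASSUMED_MAX_BUCKETS = 10_000  # assumption, verify against your cluster settings
total = estimate_buckets(100, timedelta(days=60), timedelta(minutes=5))
print(total, total > ASSUMED_MAX_BUCKETS)
```

Even a single host over 60 days needs 17,280 five-minute buckets, so partitioning the terms aggregation alone does not bring the request under such a limit; the time range has to be chunked as well.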

Hendrik_Muhs · July 16, 2020, 5:15am

I see it now. The problem with the transform: "derivative" isn't supported either. This is a bigger technical limitation and can't be easily fixed.

The other problem I see with your use case is the amount of data you have. With aggregations you will always run into size problems for use cases like this. You need some sort of chunking/paging, which either transform or composite aggs can provide (transform uses composite aggs, but again, composite aggs don't support pipeline aggregations).

The only solution I see at the moment is using a transform with everything you have except the derivative aggregation. To add the derivative, I suggest writing some custom code that runs queries on the transform destination index, injects the derivative, and writes the result back.
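
The paging that composite aggregations provide can be sketched roughly as below, in Python. This is only an illustration: `search` is a hypothetical callable that takes a request body and returns the parsed search response (for example a thin wrapper around whatever client is in use), and the aggregation name is whatever the caller chose. The only API detail relied on is that a composite aggregation returns `buckets` plus an `after_key`, which is sent back as `after` to fetch the next page.

```python
import copy


def iter_composite_buckets(search, body, agg_name):
    """Yield all buckets of a top-level composite aggregation, page by page."""
    body = copy.deepcopy(body)  # do not mutate the caller's request
    while True:
        result = search(body)["aggregations"][agg_name]
        buckets = result["buckets"]
        if not buckets:
            return  # an empty page means we are done
        yield from buckets
        after_key = result.get("after_key")
        if after_key is None:
            return
        # Resume the next page right after the last key we saw.
        body["aggs"][agg_name]["composite"]["after"] = after_key
```

Each request only ever holds one page of buckets (the composite `size`), so the total number of hosts and time intervals no longer has to fit into a single response.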
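
As a minimal sketch of the "inject the derivative" step, assuming the rows have already been read from the transform destination index into plain dictionaries: the key names `host`, `timestamp` (epoch seconds) and `read`, and both function names, are hypothetical and chosen only for this example. Reading from and writing back to the index is left out. Treating a negative difference as a counter reset and skipping it is a design assumption here, not something the derivative aggregation does.

```python
from collections import defaultdict


def add_derivative(rows, value_field="read"):
    """Return copies of rows with 'diff' and 'rate' (per second) added per host."""
    by_host = defaultdict(list)
    for row in rows:
        by_host[row["host"]].append(row)

    enriched_rows = []
    for series in by_host.values():
        series.sort(key=lambda r: r["timestamp"])
        prev = None
        for row in series:
            enriched = dict(row, diff=None, rate=None)
            if prev is not None:
                delta = row[value_field] - prev[value_field]
                seconds = row["timestamp"] - prev["timestamp"]
                # Assumption: a negative delta is a counter reset, so skip it.
                if delta >= 0 and seconds > 0:
                    enriched["diff"] = delta
                    enriched["rate"] = delta / seconds
            prev = row
            enriched_rows.append(enriched)
    return enriched_rows


def max_rate_per_host(enriched_rows):
    """Highest observed rate per host, ignoring rows without a rate."""
    best = {}
    for row in enriched_rows:
        if row["rate"] is not None:
            best[row["host"]] = max(best.get(row["host"], 0.0), row["rate"])
    return best
```

The first bucket of each host has no predecessor, so its `diff` and `rate` stay empty, which matches how a derivative over a histogram has no value for the first bucket.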

Jehutywong · July 16, 2020, 6:27am

So I'd have to compute the derivative outside ES.

Not ideal, but confirmation that I haven't missed any cool feature in ES is good enough.

Really appreciate it, @Hendrik_Muhs.

Hendrik_Muhs · July 16, 2020, 6:47am

Feel free to open an enhancement request; it sounds like an interesting use case. It would make sense to add something to transform to enable such things, similar to what pipeline aggregations offer.

Good feedback!

Jehutywong · July 16, 2020, 7:54am

#59684 filed

system · August 13, 2020, 7:55am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
