Aggregating data when the bucket interval should be "static"

We have data that is consistently sampled every 5 minutes from several sources. When displaying the data split by source as a "raw" value per metric, we simply use a TSVB metric aggregated with max and an interval of >=5m, and this works nicely. However, if I want to aggregate the data across all sources and display it over a large period of time, the aggregated data points grow larger once the number of buckets (which is determined by the interval) passes a certain point and the interval becomes, say, every 10 minutes. What is the best practice for displaying data such as this?

To give an example that helps illustrate this, view the screenshot showing identical metrics graphed. The first metric uses an index pattern override with the interval set to 5m, while the second metric uses the panel options interval set to >=5m.

If you widen the time window, the bucket set to >=5m continues to grow, which is not desired. What is the best practice for visualizing this kind of data, or does the stored data need a new structure?

It's a workaround, but this configuration should give you the right number:

The first agg is just the one you are actually after: Sum of k8s_quota.alloctable.pods.total in your case.
Then you apply a cumulative sum on this agg, which simply sums up the value bucket by bucket.
Then you take the derivative of the cumulative sum, which reverses the cumulative sum and gives you the original agg again, with the difference that you can now specify a "unit" that normalizes the metric, in your case to 5m. This does nothing when the auto interval is already 5 minutes, gives you half of the value if the interval jumps to 10 minutes, and so on.
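TSVB builds these pipeline aggregations for you, but the same chain can be written as a raw search request. A minimal sketch, assuming a hypothetical index pattern `k8s-metrics-*` and a `@timestamp` field (only the `k8s_quota.alloctable.pods.total` field name comes from this thread):

```
POST k8s-metrics-*/_search
{
  "size": 0,
  "aggs": {
    "over_time": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "10m" // stands in for whatever auto interval TSVB picked
      },
      "aggs": {
        "pods_total": {
          "sum": { "field": "k8s_quota.alloctable.pods.total" }
        },
        "pods_cumulative": {
          "cumulative_sum": { "buckets_path": "pods_total" }
        },
        "pods_per_5m": {
          "derivative": {
            "buckets_path": "pods_cumulative",
            "unit": "5m" // emits "normalized_value", scaled to a per-5-minute rate
          }
        }
      }
    }
  }
}
```

Plot the derivative's `normalized_value`: when the histogram interval is 10m it is half of `pods_total`, undoing the doubling, and when the interval is already 5m it matches `pods_total` exactly. (The very first bucket has no preceding bucket, so its derivative comes back empty.)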

Keep in mind though that this might give you non-integer values for your metric in some cases, because it effectively averages the metric (in your case the number of pods) over the auto interval (e.g. if you select a really large time range and the auto interval becomes 1 day, it will give you the average number of pods for that day).

Thanks @flash1293, the workaround does work, however it looks a bit ugly. Should I simply be creating another document to represent the data at the aggregation level I expect, then? Are there any references on how to lay out data for Elastic that could help me with this too?

Thanks again!

I always see this as a trade-off: if you bring data into the right shape before ingesting it into Elasticsearch, you are buying performance and ease of use (for your specific use case), but you are sacrificing the flexibility to change your mind and query your data in a different way later.

The “right” way depends on what’s important in your current situation. Unfortunately I don’t know good resources that summarize various scenarios.

A middle ground on the ease of use <-> flexibility spectrum is probably a data frame transform: https://www.elastic.co/guide/en/elasticsearch/reference/current/put-transform.html
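For instance, a pivot transform that continuously rolls the pod totals up into fixed 5-minute buckets could look roughly like the sketch below (the transform id, index names, and `@timestamp` field are assumptions, not taken from this thread):

```
PUT _transform/pods-5m
{
  "source": { "index": "k8s-metrics-*" },
  "dest": { "index": "pods-5m" },
  "sync": {
    "time": { "field": "@timestamp", "delay": "60s" }
  },
  "pivot": {
    "group_by": {
      "bucket": {
        "date_histogram": { "field": "@timestamp", "fixed_interval": "5m" }
      }
    },
    "aggregations": {
      "pods_total": {
        "sum": { "field": "k8s_quota.alloctable.pods.total" }
      }
    }
  }
}
```

After `POST _transform/pods-5m/_start`, the destination index holds one pre-summed document per 5-minute bucket, which you can visualize directly without the auto interval ever regrouping your data.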
