I'm working on a platform that monitors sensors attached to utility services (gas/water/power etc), tracks usage and warns on pre-set alarms and ES has been a huge help in doing this efficiently. One of the last issues however is that alarm thresholds can be set for 15/30/60 minute periods (where sum within the window is > user set threshold) and date_histogram aggregation with those values set as intervals work well where the period happens to line up "on" the hour. eg: if the total consumption for 0600 till 0700 is > X, then the sum of the bucket is fine. If however, the sum of 0630 till 0730 is > X, but 0600 till 0630 and 0730 till 0800 has no/low values, the alarm will not trigger. While I can use offset to catch this specific case, it's not dynamic. Is there any way to have "rolling" offsets (the data is collected in this instance every 5 minutes) and buckets would loop through for each interval period (60 minutes) with an offset of 00, 05, 10, 15, .... 50, 55.
The term rolling offset seems to make the most sense to explain this and hopefully my explanation above makes enough sense.
Not quite sure I understand what you're asking... would it be possible to create a simple set of values that shows the desired response?
I think what you're wanting is effectively a sliding 60 minute window, which ignores the bucket boundaries so that you don't have to fiddle with offsets manually?
Could you perhaps do a date_histogram with 5- or 10-minute interval, then use a moving_avg agg with a window of 60 minutes? That will give you a value for each 60min window. It's not quite what you want, since it'll be averaging together the values in that window rather than the max, but it may get you closer.
Hi @polyfractal - it seems like the moving_avg is along the lines of what I'm after, but as you mentioned, the sum of each window, not the average.
So yes, I'd like to find the sum of each bucket (in this case 15 minute buckets) where I've got samples every 5 minutes. The attachment below shows an example of the data set (1 and 2 col) highlighted 3 column of the values I'm looking to get and a quick explanation of the logic behind the desired values.
If I'm looking for any 15 minute interval where the sum is > 30, there's a lot of different offsets to handle
Hopefully with the example it makes more sense, but let me know if not.
Gotcha, thanks for the clarification. I've had a half-finished PR sitting on my harddrive for a month or two, which adds a "moving function" agg to ES. It calculates an arbitrary function (min, max, sum, etc) or a user-script on each window similar to moving average.
So hopefully that'll do the trick for you. I'm not sure there's a good way to do it currently, short of requesting all the buckets and doing the summation yourself in your application.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.