Bucket span in Job management

Prabhav · April 9, 2019, 9:21am

Can someone explain what exactly is happening and how the field, aggregation, and bucket span work together?

I can't work out where the numbers on the y-axis come from.
Does the bucket span only look at the last 5 seconds of the data returned in the Elasticsearch response? (Basically, it would be great if someone could explain how the bucket span works with indexed data.)

richcollier · April 9, 2019, 6:37pm

Relevant blog: https://www.elastic.co/blog/explaining-the-bucket-span-in-machine-learning-for-elasticsearch

The y-axis values are the sum of the field V2A7 in 5s increments (in your case). I assume that in your case these values are negative.

Prabhav · April 10, 2019, 8:43am

Can you please elaborate on what the 5s increments refer to?
The blog linked above states:

For example, if you were monitoring the average response time of a system, using a bucket span of 1 hour means that at the end of each hour we would calculate the average (mean) value of the last hour’s worth of data and compute the anomalousness of that average value compared to previous hours.
Can you explain what exactly "compute the anomalousness of that average value compared to previous hours" means?
Thanks a lot for your help!

richcollier · April 10, 2019, 12:43pm

You chose a bucket_span of 5s, which I can see in your screenshot; that is why I mentioned it. I'm not saying this is the correct value to choose, only noting that it is what you set.

The bucket span is the window of time over which your data is aggregated. So, if you say:

sum(V2A7) with a bucket_span of 5s, then all observed values of field V2A7 are summed in 5-second windows over time, and that summed value is modeled by ML.

So for example let's imagine a simplified data set:

time, V2A7
00:00:01, 5
00:00:02, 5
00:00:04, 5
00:00:06, 4
00:00:07, 3
00:00:08, 3
00:00:09, 2
00:00:11, 5
00:00:12, 5
00:00:14, 5
...

With a 5s bucket_span and a sum() aggregation, the above data is summed up into 5s intervals:

00:00:00, 15
00:00:05, 12
00:00:10, 15
...
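The bucketing above can be reproduced with a short Python sketch. This is only an illustration of the arithmetic, not how ML implements it internally; the helper names (`to_seconds`, `to_hhmmss`, `sum_by_bucket`) are made up for this example.

```python
from collections import defaultdict

BUCKET_SPAN_SECONDS = 5  # the 5s bucket_span from this thread

# The simplified data set from the post: (time, V2A7)
samples = [
    ("00:00:01", 5), ("00:00:02", 5), ("00:00:04", 5),
    ("00:00:06", 4), ("00:00:07", 3), ("00:00:08", 3), ("00:00:09", 2),
    ("00:00:11", 5), ("00:00:12", 5), ("00:00:14", 5),
]

def to_seconds(hhmmss):
    """Convert 'HH:MM:SS' to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def to_hhmmss(total_seconds):
    """Convert seconds since midnight back to 'HH:MM:SS'."""
    return (f"{total_seconds // 3600:02d}:"
            f"{total_seconds % 3600 // 60:02d}:"
            f"{total_seconds % 60:02d}")

def sum_by_bucket(rows, span_seconds):
    """Sum values per bucket, keyed by the bucket's start time."""
    buckets = defaultdict(int)
    for timestamp, value in rows:
        # Round the timestamp down to the start of its bucket.
        bucket_start = to_seconds(timestamp) // span_seconds * span_seconds
        buckets[to_hhmmss(bucket_start)] += value
    return dict(sorted(buckets.items()))

bucket_sums = sum_by_bucket(samples, BUCKET_SPAN_SECONDS)
for start, total in bucket_sums.items():
    print(f"{start}, {total}")
```

Each timestamp is rounded down to the start of its 5-second window, so 00:00:06 through 00:00:09 all land in the 00:00:05 bucket.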

ML then learns these values over time (let's say, for example, that values of roughly 12 to 15 are usual and repeatable over time). Then, at some later time, the following values occur:

09:11:00, 15
09:11:05, 240
09:11:10, 15
...

The value of 240 will be seen as unusual since it is very different from the typical sum() values (which are around 12 to 15).
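To make "unusual compared to typical values" concrete, here is a deliberately naive sketch that flags a bucket when it is many standard deviations away from previously seen sums. This is an assumption-laden stand-in: the real ML modeling is more sophisticated than a z-score, and the `history` values and `THRESHOLD` below are hypothetical numbers chosen to match the "12 to 15" range in the example.

```python
import statistics

# Hypothetical previously seen bucket sums in the "usual" 12 to 15 range.
history = [15, 12, 15, 14, 13, 15, 12]

# The later buckets from the example above.
new_buckets = [("09:11:00", 15), ("09:11:05", 240), ("09:11:10", 15)]

THRESHOLD = 3.0  # hypothetical cut-off, in standard deviations

mean = statistics.mean(history)
stdev = statistics.pstdev(history)

def deviation_score(value):
    """How many standard deviations the value is from the historical mean."""
    return abs(value - mean) / stdev

# Keep only the buckets whose sum is far outside the usual range.
flagged = [(start, total) for start, total in new_buckets
           if deviation_score(total) > THRESHOLD]

for start, total in flagged:
    print(f"{start}: sum={total} looks unusual")
```

Only the 09:11:05 bucket is flagged; the two buckets with a sum of 15 sit well inside the usual range.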

You should choose your bucket_span, however, with the tips from the blog. In 99% of the cases in machine data, the value of bucket_span will likely be measured in minutes, not seconds.
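For reference, the detector and bucket span discussed here correspond to a couple of fields in the anomaly detection job configuration. The sketch below builds such a body as a Python dict; the `5m` bucket span and the `timestamp` time field name are assumptions for illustration, not values taken from the original job.

```python
import json

# Sketch of an anomaly detection job body with a sum(V2A7) detector.
job_config = {
    "analysis_config": {
        "bucket_span": "5m",  # assumed value; pick yours using the blog's tips
        "detectors": [
            {"function": "sum", "field_name": "V2A7"},
        ],
    },
    "data_description": {
        "time_field": "timestamp",  # assumed name of the time field
    },
}

print(json.dumps(job_config, indent=2))
```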

Prabhav · April 11, 2019, 2:25am

@richcollier
Thank you so much for clearing up my doubt; the example was really helpful.

system · May 9, 2019, 2:30am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
