Bucket span in Job management


Can someone explain exactly what is happening here, and how the field, aggregation, and bucket span work together?

  1. I can't work out where the numbers on the y-axis come from.
  2. I don't understand whether the bucket span only looks at the last 5 seconds of the data returned in the Elasticsearch response. (Basically, if someone could explain how the bucket span works with indexed data, that would be great.)

Relevant blog: https://www.elastic.co/blog/explaining-the-bucket-span-in-machine-learning-for-elasticsearch

The values on the y-axis are the sum of the field V2A7 in 5s increments (in your case). I assume that in your case these values are negative.

Can you please elaborate on what the 5s increments refer to?
The blog linked above states:

For example, if you were monitoring the average response time of a system, using a bucket span of 1 hour means that at the end of each hour we would calculate the average (mean) value of the last hour’s worth of data and compute the anomalousness of that average value compared to previous hours.
Can you explain what exactly "compute the anomalousness of that average value compared to previous hours" means?
Thanks a lot for your help!

You chose a bucket_span of 5s, which I can see in your screenshot. That is why I mentioned it. I'm not saying that this is the correct value to choose; I only noticed that it is what you set.

The bucket span is the window of time over which your data is aggregated. So, if you say:

sum(V2A7) with a bucket_span of 5s, then all observed values of field V2A7 are summed in consecutive 5-second windows over time, and that summed value is what ML models.

So for example let's imagine a simplified data set:

time, V2A7
00:00:01, 5
00:00:02, 5
00:00:04, 5
00:00:06, 4
00:00:07, 3
00:00:08, 3
00:00:09, 2
00:00:11, 5
00:00:12, 5
00:00:14, 5
...

With a 5s bucket_span and a sum() aggregation, the above data is summed up into 5s intervals:

00:00:00, 15
00:00:05, 12
00:00:10, 15
...
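To make the bucketing step concrete, here is a minimal Python sketch that reproduces the sums above from the sample data. It is only an illustration of the arithmetic, not how Elastic ML is implemented; the function name `sum_per_bucket` and the representation of time as seconds since 00:00:00 are assumptions made for the example.

```python
from collections import defaultdict

BUCKET_SPAN_SECONDS = 5  # the 5s bucket_span from the example

# (seconds since 00:00:00, V2A7) pairs from the simplified data set above
samples = [
    (1, 5), (2, 5), (4, 5),
    (6, 4), (7, 3), (8, 3), (9, 2),
    (11, 5), (12, 5), (14, 5),
]

def sum_per_bucket(samples, bucket_span):
    """Sum values into fixed windows; each key is the window's start time."""
    buckets = defaultdict(int)
    for t, value in samples:
        # Round the timestamp down to the start of its bucket.
        bucket_start = (t // bucket_span) * bucket_span
        buckets[bucket_start] += value
    return dict(sorted(buckets.items()))

bucket_sums = sum_per_bucket(samples, BUCKET_SPAN_SECONDS)
for start, total in bucket_sums.items():
    print(f"00:00:{start:02d}, {total}")
```

Each record falls into exactly one bucket, determined by rounding its timestamp down to a multiple of the bucket span, so every bucket covers its own 5-second window rather than "the last 5 seconds" of the whole data set.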

ML then learns these values over time (let's say, for example, that values in the range of about 12 to 15 are usual and repeatable over time). Then, at some later time, the following values occur:

09:11:00, 15
09:11:05, 240
09:11:10, 15
...

The value of 240 will be seen as unusual, since it is very different from the typical sum() values (which are around 12 to 15).
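As a rough intuition for what "compute the anomalousness compared to previous buckets" means, the sketch below compares a new bucket value against the history of earlier bucket values using a simple z-score. This is a deliberate simplification and an assumption for illustration only: Elastic ML builds a more sophisticated probabilistic model, and the history values, the `is_unusual` helper, and the threshold of 3 are all hypothetical.

```python
import statistics

# Hypothetical history of earlier 5s bucket sums (typical values of about 12 to 15)
history = [15, 12, 15, 14, 13, 15, 12, 15]

def is_unusual(value, history, threshold=3.0):
    """Flag a bucket value that lies far from what was seen before.

    Simplified stand-in for anomaly scoring: counts how many standard
    deviations the value is from the historical mean.
    """
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        # All historical values identical: anything different is unusual.
        return value != mean
    z_score = abs(value - mean) / stdev
    return z_score > threshold

for bucket_value in (15, 240, 15):
    print(bucket_value, is_unusual(bucket_value, history))
```

With this history, the buckets containing 15 are within the normal range, while the bucket containing 240 is flagged.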

However, you should choose your bucket_span using the tips from the blog. In 99% of cases with machine data, the bucket_span will likely be measured in minutes, not seconds.
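For reference, this is roughly where the three pieces (field, aggregation function, and bucket span) sit in an anomaly detection job definition. The sketch just builds the request body as a Python dict and prints it as JSON. The `5m` value and the `@timestamp` time field are assumptions chosen for illustration, not recommendations for your data.

```python
import json

# Sketch of an anomaly detection job body (values are illustrative assumptions)
job_body = {
    "analysis_config": {
        # Window of time over which the data is aggregated
        "bucket_span": "5m",
        "detectors": [
            # Aggregation function and the field it is applied to
            {"function": "sum", "field_name": "V2A7"}
        ],
    },
    "data_description": {
        # Assumed name of the timestamp field in the index
        "time_field": "@timestamp"
    },
}

print(json.dumps(job_body, indent=2))
```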

@richcollier
Thank you so much for clearing up my doubt. The example was really helpful.
