Multi-Bucket Scoring Machine Learning

We're working on a more detailed blog post about the multi-bucket feature, but I'll include some details here. We look at the difference between our predictions and the observed values over an extended period: a sliding window of the past twelve buckets. We learn a distribution for this feature and then compute how anomalous a value is from this distribution, as we do for single-bucket features. We respect sidedness when we compute how unusual this feature value is, so if you've selected high_count the values have to be high on average over the sliding window (this doesn't mean they are high in the most recent bucket).
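To make the sliding window idea concrete, here is a minimal sketch. It is not our actual implementation: it assumes the feature is a plain unweighted mean of the residuals (observed minus predicted), and the names multi_bucket_feature and matches_side are made up for illustration. It skips the step of learning a distribution for the feature altogether.

```python
from collections import deque

WINDOW = 12  # the sliding window of the past twelve buckets


def multi_bucket_feature(predicted, observed, window=WINDOW):
    """Return, per bucket, the mean residual over the trailing window.

    Assumption: an unweighted mean of (observed - predicted). The real
    feature may combine the buckets differently.
    """
    residuals = deque(maxlen=window)  # drops the oldest residual automatically
    features = []
    for p, o in zip(predicted, observed):
        residuals.append(o - p)
        features.append(sum(residuals) / len(residuals))
    return features


def matches_side(feature, side):
    """Sidedness check: a 'high' detector only considers positive averages."""
    if side == "high":
        return feature > 0
    if side == "low":
        return feature < 0
    return feature != 0  # two-sided


# Eleven buckets above the prediction, then one bucket exactly on it.
predicted = [10.0] * 12
observed = [12.0] * 11 + [10.0]
features = multi_bucket_feature(predicted, observed)
```

In this example the last bucket is not high on its own, yet the window average is still positive, so a high-sided detector would still consider the window.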

The impact factor is a sliding scale from "exclusively due to single bucket" when it's -5 to "exclusively due to multi-bucket" when it's 5. When the factor is 0 the contributions are roughly equal. In the UI the cross is displayed, i.e. the multi-bucket effect is high, when this factor is greater than 2.
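As a rough illustration of how to read the scale, the sketch below maps an impact factor to a description and to whether the UI would show the cross. The function name and the wording of the labels are my own for this example; only the -5, 0, 5 and "greater than 2" points come from the description above.

```python
def describe_impact(impact):
    """Return (description, shows_cross) for an impact factor in [-5, 5]."""
    if not -5 <= impact <= 5:
        raise ValueError("impact factor must be between -5 and 5")

    if impact == -5:
        description = "exclusively single bucket"
    elif impact == 5:
        description = "exclusively multi-bucket"
    elif impact < 0:
        description = "mostly single bucket"
    elif impact > 0:
        description = "mostly multi-bucket"
    else:
        description = "roughly equal"

    # The cross is shown only when the factor is strictly greater than 2.
    return description, impact > 2
```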

It is worth mentioning that we have never rate-limited anomaly scores, and repeated anomalies can occur without multi-bucket as well. We have always seen this as a job for the alerting layer on top of the raw results: i.e. don't send an alert if the score is less than or equal to the previous bucket's score, or even if the severity is less than or equal. However, I realise multi-bucket has made it more likely that one will get an extended period of anomalies.
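The kind of rule I mean for the alerting layer could look something like the sketch below. This is a hypothetical helper, not something we ship: it alerts only when the score (or, optionally, the severity) has gone up compared with the previous bucket. The severity bands at 25, 50 and 75 are an assumption for the example.

```python
def severity_band(score):
    """Map a score to a coarse severity level (assumed bands at 25/50/75)."""
    if score >= 75:
        return 3
    if score >= 50:
        return 2
    if score >= 25:
        return 1
    return 0


def buckets_to_alert(scores, key=lambda s: s):
    """Return indices of buckets whose keyed score exceeds the previous one.

    A bucket whose value is less than or equal to the previous bucket's
    value is suppressed. The first bucket is compared against zero.
    """
    alerts = []
    previous = key(0.0)
    for index, score in enumerate(scores):
        current = key(score)
        if current > previous:
            alerts.append(index)
        previous = current
    return alerts


scores = [0.0, 80.0, 80.0, 75.0, 90.0, 0.0, 50.0]
by_score = buckets_to_alert(scores)
by_severity = buckets_to_alert(scores, key=severity_band)
```

Comparing on severity is the stricter of the two: an extended run of anomalies in the same band produces a single alert at the start of the run.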

In our testing we found that including the multi-bucket feature is useful: it let us deal better with misconfigured bucket lengths and also detect important events that would otherwise have been missed altogether. However, we have had some feedback that it is currently rather sensitive. I've made a couple of changes aimed at reducing sensitivity, which will be available in 7.3, see this and this commit.

We intended that the impact factor could be used to filter out the multi-bucket results, as a proxy for disabling the feature, if the user wanted. In any case, we've considered adding more advanced configuration options in the past and this could be a good candidate.
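For example, a client-side filter along these lines would drop the results that are mostly down to multi-bucket. This is a sketch under assumptions: it treats each result as a plain dict, assumes the impact factor is stored under a multi_bucket_impact key, and treats a missing value as purely single bucket. The cut-off of 2 simply mirrors the point at which the UI shows the cross; pick whatever suits you.

```python
def drop_multi_bucket(records, max_impact=2.0, field="multi_bucket_impact"):
    """Keep only records whose impact factor is at most max_impact.

    Assumption: a record without the field is treated as single bucket (-5).
    """
    return [r for r in records if r.get(field, -5.0) <= max_impact]


records = [
    {"record_score": 91.0, "multi_bucket_impact": -5.0},
    {"record_score": 64.0, "multi_bucket_impact": 4.0},
    {"record_score": 40.0, "multi_bucket_impact": 2.0},
    {"record_score": 77.0},  # no impact factor recorded
]
kept = drop_multi_bucket(records)
```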