Multi-Bucket Scoring Machine Learning

Multi-bucket scoring is a bit confusing. I've read through the blog, but questions remain. How is the multi-bucket score calculated? Does the model go backwards and recalculate multi-bucket scores as more data is processed, or does each bucket get one score with no recalculation?

Multi-bucket impacts will alert far too often for us, so I would like to alert only on non-multi-bucket impact buckets. Also, when multi-bucket anomalies are present, zero and low counts will show up in a high count detector (not my favorite feature); a high count detector should only show high counts. If I set my alert to only look for multi-bucket impact scores less than 0, am I guaranteed to get only one alert per multi-bucket anomaly? And how is the Kibana multi-bucket rating of high, medium, and low determined?

Think of multi-bucket as a sliding window (with a width of 12 prior bucket_spans) that determines whether the window as a whole is unusual or not. It is a separate consideration from a bucket-by-bucket anomaly.

I'm curious as to why multi-bucket anomalies alert too often for you - it makes me think that you could make other optimizations to your job config...

But, with that said, yes, you can avoid alerting on multi-bucket anomalies altogether by filtering out results that have a multi_bucket_impact of more than -5. For example:

GET .ml-anomalies-farequote/_search
{
  "query": {
    "bool": {
      "filter": [
        { "range": { "timestamp": { "gte": "now-5y" } } },
        { "term": { "result_type": "record" } },
        { "range": { "record_score": { "gte": 75 } } },
        { "range": { "multi_bucket_impact": { "lt": -4 } } }
      ]
    }
  }
}

Response:

...
        "_source" : {
          "job_id" : "farequote",
          "result_type" : "record",
          "probability" : 1.1847440707278515E-5,
          "multi_bucket_impact" : -5.0,
          "record_score" : 90.67726,
          "initial_record_score" : 85.04854038747868,
          "bucket_span" : 600,
          "detector_index" : 0,
          "is_interim" : false,
          "timestamp" : 1486656600000,
          "function" : "count",
          "function_description" : "count",
          "typical" : [
            127.85619421616816
          ],
          "actual" : [
            277.0
          ]
        }

We are trying to prevent additional support ticket volume after we've detected a major incident for a client, so we would like to know as early as possible if volume is unusual. If I set my buckets to 30m we would get an early alert, but due to multi-bucket anomalies this will alert every 30 minutes for several hours on some days. We are going to change to 60m to prevent alert fatigue, because if we over-alert the alerts will just get ignored.
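Something like this is what we have in mind (a minimal sketch; the job name support-ticket-volume and the time field are placeholders for our real setup):

PUT _ml/anomaly_detectors/support-ticket-volume
{
  "analysis_config": {
    "bucket_span": "60m",
    "detectors": [
      { "function": "high_count" }
    ]
  },
  "data_description": {
    "time_field": "timestamp"
  }
}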

Do you happen to know the answer to my original question? Does a multi-bucket score get recalculated for a past bucket once new data is available, or does the multi-bucket impact score get calculated only once?

Sometimes I notice multi-bucket anomalies don't begin with a score of -5. They will have 0.2 or something. Is it because the model looked back up to 12 buckets (6 or 12 hours before, depending on bucket span) and saw something?

Also, why would I choose to alert only where the multi-bucket impact is -5? Why not 0, or -3, or even 3? I'm concerned some alerts won't go through because we completely ignored multi-bucket impacts. I'm not sure what the best cutoff is here.

Ever consider giving users the option to turn off multi-bucket anomaly detection?

We're in the process of writing a more detailed blog post on the multi-bucket feature, but I'll include some details here. We look at the difference between our predictions and the observed values over an extended period: a sliding window of the past twelve buckets. We learn a distribution for this feature and then compute anomalousness from this distribution, as we do for single-bucket features. We respect sidedness when we compute how unusual this feature value is, so if you've selected high_count the values have to be high on average over the sliding window (this doesn't mean they are high in the most recent bucket).
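As a rough sketch of the idea (equal weights over the window are assumed purely for illustration; the exact feature in the code is more nuanced), with x_t the observed value and \hat{x}_t our prediction for bucket t:

f_t = \frac{1}{12} \sum_{i=0}^{11} \left( x_{t-i} - \hat{x}_{t-i} \right)

The anomalousness of f_t is then computed from the learned distribution for f, just as for a single-bucket feature, and for high_count only large positive values of f_t count as unusual.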

The impact factor is a sliding scale from "exclusively due to single bucket" when it's -5 to "exclusively due to multi-bucket" when it's +5. When the factor is 0 the contributions are roughly equal. In the UI the cross marker is displayed, i.e. the multi-bucket effect is rated high, when this factor is greater than 2.
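So, conversely, if you want to see exactly the records the UI would mark with a cross, you can filter the other way (a sketch against the same farequote example as above):

GET .ml-anomalies-farequote/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "result_type": "record" } },
        { "range": { "multi_bucket_impact": { "gte": 2 } } }
      ]
    }
  }
}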

It is worth mentioning that we have never rate-limited anomaly scores, and repeated anomalies can occur without multi-bucket as well. We have always seen this as a function of the alerting layer on top of the raw results: i.e. don't send an alert if the score is less than or equal to the previous bucket's, or even if the severity is less than or equal. However, I realise multi-bucket has made it more likely that one will get an extended period of anomalies.
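For example, an alerting layer could fetch the two most recent non-interim bucket results and only fire when the latest anomaly_score exceeds the previous one (just a sketch of the retrieval half; the comparison would live in your alerting logic):

GET .ml-anomalies-farequote/_search
{
  "size": 2,
  "sort": [ { "timestamp": { "order": "desc" } } ],
  "query": {
    "bool": {
      "filter": [
        { "term": { "result_type": "bucket" } },
        { "term": { "is_interim": false } }
      ]
    }
  }
}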

In our testing we found that including the multi-bucket feature is useful because it allows us to deal better with misconfigured bucket lengths and also to detect important events which were missed altogether without it. However, we have had some feedback that it is currently rather sensitive. I've made a couple of changes aimed at reducing sensitivity, which will be available in 7.3; see this and this commit.

We intended that the impact factor could essentially be used to filter out the multi-bucket results as a proxy for disabling the feature if the user wanted, but, in any case, we've considered adding more advanced configuration options in the past and this could be a good candidate.

Thank you Tom, that is helpful. I'm still looking for an answer to my primary question: do you recalculate multi-bucket impacts as more data becomes available, or is the multi-bucket impact score calculated once for each bucket and never changed?

The reason this is important is that if I set my alerts to only send on non-multi-bucket impacts, and the score gets recalculated, then I don't actually know how many alerts are really going to go out based on historical anomalies.


Sorry, I somehow missed this question. The answer is no: we never update the multi-bucket impact after the result is first written. The feature is derived entirely from historical buckets and isn't forward-looking. This applies even if renormalisation decides to change the anomaly severity.
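You can see the distinction in the results themselves: record_score can drift away from initial_record_score as renormalisation runs, but multi_bucket_impact keeps its originally written value. A quick way to eyeball this (reusing the farequote example from earlier):

GET .ml-anomalies-farequote/_search
{
  "_source": [ "timestamp", "initial_record_score", "record_score", "multi_bucket_impact" ],
  "query": {
    "term": { "result_type": "record" }
  }
}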

Sounds good. I appreciate the feedback here. That gives me enough information to move forward more confidently.

