ML alerts triggering on interim result

After upgrading from 6.3 to 6.5.0 I've been getting Watcher alerts from ML jobs that are false positives. If I click the alert link fairly quickly, I see 0 hits when there should be some large number, and the result is marked "Interim result". If I wait a few minutes and refresh, the anomaly score drops from 99 to <1 once the job has found more than 0 hits.

Is this an indication of a performance issue on my ES cluster, or of some changed behavior in the ES / ML / Watcher interaction in 6.5?

Hello - I don't believe there were any relevant changes to interim results or to the interaction between ML and Alerting between v6.3 and v6.5.

In general, we shouldn't be creating interim results when the actual value is lower than expected - we only do that when the actual value is higher than what the model has been expecting.
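
If you want to double-check which detector functions your job uses (a low-side function such as low_count looks for unusually low values, unlike count), you can pull the job configuration - this assumes your job is called yourjobname, as in the query below:

GET _xpack/ml/anomaly_detectors/yourjobname

The detectors list under analysis_config in the response shows the function for each detector.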

Can you possibly run the following in Dev Tools Console:

GET .ml-anomalies-*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "range" : { "timestamp" : { "gte": "now-3m" }}},
        { "term"  : { "job_id" : "yourjobname" }},
        { "term"  : { "result_type" : "bucket" }},
        { "term"  : { "is_interim" : "true" }}
      ]
    }
  }
}

(Adjusting the job_id, of course.)

Try to catch the situation while it's occurring - I'd like to see the output, so please paste it here.

Thanks, Rich. Here's the output:

{
  "took" : 20,
  "timed_out" : false,
  "_shards" : {
    "total" : 70,
    "successful" : 70,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : ".ml-anomalies-shared",
        "_type" : "doc",
        "_id" : "fusion-prem-volume_bucket_1543527120000_60",
        "_score" : 0.0,
        "_source" : {
          "job_id" : "fusion-prem-volume",
          "timestamp" : 1543527120000,
          "anomaly_score" : 74.31915433005682,
          "bucket_span" : 60,
          "initial_anomaly_score" : 74.31915433005682,
          "event_count" : 0,
          "is_interim" : true,
          "bucket_influencers" : [
            {
              "job_id" : "fusion-prem-volume",
              "result_type" : "bucket_influencer",
              "influencer_field_name" : "bucket_time",
              "initial_anomaly_score" : 74.31915433005682,
              "anomaly_score" : 74.31915433005682,
              "raw_anomaly_score" : 25.807843120201568,
              "probability" : 5.140488806107606E-28,
              "timestamp" : 1543527120000,
              "bucket_span" : 60,
              "is_interim" : true
            }
          ],
          "processing_time_ms" : 1,
          "result_type" : "bucket"
        }
      }
    ]
  }
}

Great, and now run this:

GET .ml-anomalies-*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term"  : { "timestamp" : 1543527120000 }},
        { "term"  : { "job_id" : "fusion-prem-volume" }},
        { "term"  : { "result_type" : "bucket" }}
      ]
    }
  }
}

By the way, I can't seem to replicate your situation at the moment. Until we figure out what's going on, you could work around it by modifying the Watch to ignore interim results. Just add:

                  { "term"  : { "is_interim" : "false"}}

to the query the Watch is making.
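
For reference, here's roughly where that clause would sit. Your watch presumably queries the bucket results in .ml-anomalies-* with a bool filter much like the diagnostic searches above; the anomaly_score threshold and time range below are just placeholders, so keep whatever your watch already has and simply append the is_interim clause:

GET .ml-anomalies-*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term"  : { "job_id" : "fusion-prem-volume" }},
        { "term"  : { "result_type" : "bucket" }},
        { "range" : { "anomaly_score" : { "gte": 75 }}},
        { "range" : { "timestamp" : { "gte": "now-2m" }}},
        { "term"  : { "is_interim" : "false" }}
      ]
    }
  }
}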

Here's the result from the last query. Apologies if this was time-sensitive - I didn't see your reply right away.

{
  "took" : 14,
  "timed_out" : false,
  "_shards" : {
    "total" : 70,
    "successful" : 70,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : ".ml-anomalies-shared",
        "_type" : "doc",
        "_id" : "fusion-prem-volume_bucket_1543527120000_60",
        "_score" : 0.0,
        "_source" : {
          "job_id" : "fusion-prem-volume",
          "timestamp" : 1543527120000,
          "anomaly_score" : 0.0,
          "bucket_span" : 60,
          "initial_anomaly_score" : 0.0,
          "event_count" : 44522,
          "is_interim" : false,
          "bucket_influencers" : [ ],
          "processing_time_ms" : 0,
          "result_type" : "bucket"
        }
      }
    ]
  }
}

For now this isn't too painful, so I'd like to keep working through it to find the root cause - but thank you for the suggested workaround.

I think I've reproduced your situation - will post an update soon. Hang tight.

We have indeed reproduced the situation and have opened the following issue to address it:

Thanks a bunch for reporting!
