ML alerts triggering on interim result

alerting
machine-learning

(Ben Polzin) #1

After upgrading from 6.3 to 6.5.0, I've been getting Watcher alerts from ML jobs that are false positives. If I click the alert link quickly enough, I see 0 hits where there should be some large number, and the result is marked "Interim result". If I wait a few minutes and refresh, the anomaly score drops from 99 to <1 once the job has found more than 0 hits.

Is this an indication of a performance issue on my ES cluster, or of some changed behavior in the ES / ML / Watcher interaction in 6.5?


(rich collier) #2

Hello - I don't believe there were any relevant changes with respect to interim_result or interactions between ML and Alerting between v6.3 and v6.5.

In general, we shouldn't be creating "interim results" when the actual value is less than the expected value; we only do that when the actual is more than what we've been expecting.

Can you possibly run the following in the Dev Tools Console:

GET .ml-anomalies-*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "range": { "timestamp": { "gte": "now-3m" } } },
        { "term": { "job_id": "yourjobname" } },
        { "term": { "result_type": "bucket" } },
        { "term": { "is_interim": "true" } }
      ]
    }
  }
}

(Adjust the job_id, of course.)

Try to catch the situation while it's happening. I'd like to see the output, so please paste it here.


(Ben Polzin) #3

Thanks, Rich. Here's the output:

{
  "took" : 20,
  "timed_out" : false,
  "_shards" : {
    "total" : 70,
    "successful" : 70,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : ".ml-anomalies-shared",
        "_type" : "doc",
        "_id" : "fusion-prem-volume_bucket_1543527120000_60",
        "_score" : 0.0,
        "_source" : {
          "job_id" : "fusion-prem-volume",
          "timestamp" : 1543527120000,
          "anomaly_score" : 74.31915433005682,
          "bucket_span" : 60,
          "initial_anomaly_score" : 74.31915433005682,
          "event_count" : 0,
          "is_interim" : true,
          "bucket_influencers" : [
            {
              "job_id" : "fusion-prem-volume",
              "result_type" : "bucket_influencer",
              "influencer_field_name" : "bucket_time",
              "initial_anomaly_score" : 74.31915433005682,
              "anomaly_score" : 74.31915433005682,
              "raw_anomaly_score" : 25.807843120201568,
              "probability" : 5.140488806107606E-28,
              "timestamp" : 1543527120000,
              "bucket_span" : 60,
              "is_interim" : true
            }
          ],
          "processing_time_ms" : 1,
          "result_type" : "bucket"
        }
      }
    ]
  }
}

(rich collier) #4

Great, and now run this:

GET .ml-anomalies-*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "timestamp": 1543527120000 } },
        { "term": { "job_id": "fusion-prem-volume" } },
        { "term": { "result_type": "bucket" } }
      ]
    }
  }
}

(rich collier) #5

By the way, I cannot seem to replicate your situation at the moment. Until we figure out what's going on, you could work around it by modifying the Watch to ignore interim results. Just add:

        { "term": { "is_interim": "false" } }

to the query the Watch is making.
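
For reference, here's a rough sketch of what that could look like once the filter is in place. This mirrors the diagnostic query above rather than your actual Watch body, and the job_id, anomaly_score threshold, and time range are placeholders you'd adjust to match whatever your Watch currently uses:

# Sketch only: job_id, score threshold, and time range are placeholders
GET .ml-anomalies-*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "range": { "timestamp": { "gte": "now-5m" } } },
        { "term": { "job_id": "yourjobname" } },
        { "term": { "result_type": "bucket" } },
        { "range": { "anomaly_score": { "gte": 75 } } },
        { "term": { "is_interim": "false" } }
      ]
    }
  }
}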


(Ben Polzin) #6

Here's the result from the last query. Apologies if this was time-sensitive; I didn't see your reply right away.

{
  "took" : 14,
  "timed_out" : false,
  "_shards" : {
    "total" : 70,
    "successful" : 70,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : ".ml-anomalies-shared",
        "_type" : "doc",
        "_id" : "fusion-prem-volume_bucket_1543527120000_60",
        "_score" : 0.0,
        "_source" : {
          "job_id" : "fusion-prem-volume",
          "timestamp" : 1543527120000,
          "anomaly_score" : 0.0,
          "bucket_span" : 60,
          "initial_anomaly_score" : 0.0,
          "event_count" : 44522,
          "is_interim" : false,
          "bucket_influencers" : [ ],
          "processing_time_ms" : 0,
          "result_type" : "bucket"
        }
      }
    ]
  }
}

For now this isn't too painful, so I'd like to keep working through it to find the root cause, but thank you for the suggested workaround.


(rich collier) #7

I think I've reproduced your situation and will post an update soon. Hang tight.


(rich collier) #8

We have indeed reproduced the situation and have opened the following issue to address it:

Thanks a bunch for reporting!