Xpack_machinelearning_watch

Hi All,

We are using X-Pack and its new machine learning features. I am trying to set up a watch that triggers when initial_record_score exceeds 70, but the execution fails. Could anyone let me know if I am making a mistake? I have gone through the Watcher documentation but still couldn't figure it out.

{
  "trigger": {
    "schedule": {
      "interval": "30m"
    }
  },
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": [
          ".ml-anomalies-*"
        ],
        "types": [],
        "body": {
          "size": 0,
          "query": {
            "range": {
              "initial_record_score": {
                "gte": 70
              },
              "@timestamp": {
                "from": "now-2h",
                "to": "now"
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "gte": 0
      }
    }
  },
  "actions": {
    "send_email": {
      "email": {
        "profile": "standard",
        "to": [
          "test@test.dk"
        ],
        "subject": "Anamolydetection",
        "body": {
          "text": "Anamolydetection"
        }
      }
    }
  }
}

Thanks in advance,
Raj

Hi Raj

There was a recent excellent blog article which describes how to debug a Watch. https://www.elastic.co/blog/watching-the-watches-writing-debugging-and-testing-watches

Can I ask what errors you are seeing?

Without having looked at the JSON in detail, I see that the query is looking at initial_record_score. We are still working on documentation for how ML and Watcher integrate, but alerting on anomaly_score is the recommended best practice. The anomaly_score is the aggregated score for the analysis bucket. If you have very high-cardinality data, there could be tens or hundreds of records with a high record_score, so record-level results are useful when investigating, but not for alerting.
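
To illustrate the investigation side of that advice, a search body along these lines could list the highest-scoring records once a bucket-level alert has fired. This is only a sketch: the ten-minute window and the size of 10 are arbitrary choices, and it assumes the results index exposes the fields result_type, record_score and timestamp.

```json
{
  "size": 10,
  "query": {
    "bool": {
      "filter": [
        { "term": { "result_type": "record" } },
        { "range": { "timestamp": { "gte": "now-10m" } } }
      ]
    }
  },
  "sort": [
    { "record_score": { "order": "desc" } }
  ]
}
```

Sorting by record_score keeps the output short even when many records score highly, which is the situation described for high-cardinality data.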

Regards
Sophie

Hi Sophie,

Thanks a lot for the reply :)

I am getting this message:

"type": "search",
  "status": "failure",
  "reason": "ParsingException[[range] query doesn't support multiple fields, found [initial_record_score] and [@timestamp]]",
  "search": {
    "request": {
      "search_type": "query_then_fetch", 

Thanks,
Raj

Raj,

Indeed, the range filter can only handle one field at a time. In order to filter on more than one field, you'll need separate statements. See my example below:

{
  "trigger": {
    "schedule": { "interval": "5m" }
  },
  "input": {
    "search": {
      "request": {
        "indices": [ ".ml-anomalies-myjob" ],
        "body": {
          "query": {
            "bool": {
              "filter": [
                { "range": { "timestamp": { "gte": "now-10m" } } },
                { "term": { "result_type": "bucket" } },
                { "range": { "anomaly_score": { "gte": 75 } } }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": { "ctx.payload.hits.total": { "gt": 0 } }
  },
  "actions": {
    "log": {
      "logging": {
        "text": "Anomalies:\n{{#ctx.payload.hits.hits}}score={{_source.anomaly_score}} at time={{_source.timestamp}}\n{{/ctx.payload.hits.hits}}"
      }
    }
  }
}

Please note a few things:

  • The index name .ml-anomalies-myjob is a pre-built alias for the anomaly results index of a job named "myjob".
  • It's best to limit the search to result_type:bucket, per Sophie's suggestion above.
  • Notice the two separate range statements, one for timestamp and one for anomaly_score.
  • The time field in the results index is timestamp, not @timestamp.

Example output from this would be:

Anomalies:
score=90.7 at time=1455034500000

Hope that helps
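
For comparison, the same fix applied to the input and condition of the watch from the original post might look roughly like the sketch below. It keeps the original threshold of 70 on initial_record_score and the two-hour window, splits the two ranges into separate clauses, and uses timestamp instead of @timestamp. The result_type:record term is an assumption added here so that only record results are matched. The condition uses gt rather than the original gte, since a hit count is always greater than or equal to 0 and the watch would otherwise fire on every run. The trigger and actions sections are left out because they do not need to change.

```json
{
  "input": {
    "search": {
      "request": {
        "indices": [ ".ml-anomalies-*" ],
        "body": {
          "size": 0,
          "query": {
            "bool": {
              "filter": [
                { "term": { "result_type": "record" } },
                { "range": { "initial_record_score": { "gte": 70 } } },
                { "range": { "timestamp": { "gte": "now-2h" } } }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": { "ctx.payload.hits.total": { "gt": 0 } }
  }
}
```

Alerting on record-level scores can still be noisy for the reasons given earlier in the thread, so the bucket-level anomaly_score version remains the recommended starting point.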

You guys are awesome. Explaining with an example makes a lot of difference.

Thanks a lot to both of you, Sophie and Rich :)

Hi Sophie, just curious if the ML/Watcher integration documentation is available yet? I don't see anything in the docs currently.

Hi @Ryan_Groten, please bear with us - this will likely land as a blog first within the next few weeks.
