Monitoring Machine Learning Job States

Hi,

I wonder what the best approach would be to get alerts when a job fails or goes into an unexpected state. I have some failed jobs, and some jobs showing as closed while their datafeed is still started (i.e. moving from open/started to closed/started). Is there a way to get alerts on those job states? The notifications index might help, but it is still very noisy because jobs are also closed and started manually.

Thanks,
Sara

Hi @saraKM,

At the moment, you can use Watcher to get notified about job and datafeed states.

Use a search input that queries the .ml-notifications-* index for the messages you are interested in. Example of the input:

{
  "input": {
    "search": {
      "request": {
        "indices": [
          ".ml-notifications-*"
        ],
        "body": {
          "sort": {
            "timestamp": {
              "order": "desc"
            }
          },
          "query": {
            "bool": {
              "filter": [
                {
                  "range": {
                    "timestamp": {
                      "gte": "now-15m"
                    }
                  }
                },
                {
                  "terms": {
                    "message.raw": [
                      "Datafeed stopped"
                    ]
                  }
                },
                {
                  "term": {
                    "job_id": {
                      "value": "my_job"
                    }
                  }
                }
              ]
            }
          }
        }
      }
    }
  }
}
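For context, the input above is only one part of a watch. A minimal sketch of a complete watch around it could look like the following. It assumes a 15-minute interval schedule (to line up with the now-15m range), a compare condition that fires when at least one matching notification is found, and a logging action standing in for a real email, Slack, or webhook action. The action name log_datafeed_stopped is hypothetical, and the sort clause is dropped for brevity.

```json
{
  "trigger": {
    "schedule": { "interval": "15m" }
  },
  "input": {
    "search": {
      "request": {
        "indices": [".ml-notifications-*"],
        "body": {
          "query": {
            "bool": {
              "filter": [
                { "range": { "timestamp": { "gte": "now-15m" } } },
                { "terms": { "message.raw": ["Datafeed stopped"] } },
                { "term": { "job_id": "my_job" } }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": { "ctx.payload.hits.total": { "gt": 0 } }
  },
  "actions": {
    "log_datafeed_stopped": {
      "logging": {
        "text": "Datafeed for job my_job stopped: {{ctx.payload.hits.total}} notification(s) in the last 15 minutes"
      }
    }
  }
}
```

The body can be registered with PUT _watcher/watch/<watch_id>. The compare condition assumes hits.total is returned as a plain number rather than an object, which should be the default for Watcher search inputs on 7.x; check this against your version.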

To reduce noise, consider action throttling so that a condition that keeps matching does not send a notification on every watch execution.
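A sketch of action-level throttling, assuming a one-hour window is acceptable: with throttle_period set on the action, it runs at most once per hour even if the watch condition keeps matching. The action name and the 1h value are illustrative only.

```json
{
  "actions": {
    "log_datafeed_stopped": {
      "throttle_period": "1h",
      "logging": {
        "text": "Datafeed for job my_job stopped"
      }
    }
  }
}
```

As an alternative to a fixed window, acknowledging the watch (PUT _watcher/watch/<watch_id>/_ack) suppresses its actions until the condition stops matching.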

FYI, Kibana Alerting provides a more convenient way to manage alerting rules and actions. It is under active development, and new alert types are added with each release. Feel free to open a GitHub issue in the Kibana repository for the alert type you are interested in.

Hope it helps.

Regards,
Dima
