Scheduled Watcher Tasks Failing

I've just completed an upgrade to 6.7.0 (for Elasticsearch, Logstash, and Kibana), but my scheduled watches are not running correctly. These watches worked under 6.4.0 (although I had to start and stop the watcher service occasionally to get them working).

Now I'm getting errors like this:
April 17th 2019, 05:03:00.457
metadata.name:Nightly Root Sudo and Su Use
watch_id:Sudo_and_Root_Login
node:Jwyj0hcGQeSlTvIUUXViAQ
state:executed
status.state.active:true
status.state.timestamp:2019-04-16T18:42:08.972Z
status.last_checked:2019-04-17T09:03:00.457Z
status.last_met_condition:2019-04-17T09:03:00.457Z
status.actions.email_admin.ack.timestamp:2019-04-16T18:42:08.972Z
status.actions.email_admin.ack.state:awaits_successful_execution
status.actions.email_admin.last_execution.timestamp:2019-04-17T09:03:00.457Z
status.actions.email_admin.last_execution.successful:false
status.actions.email_admin.last_execution.reason:
status.execution_state:executed
status.version:-1
trigger_event.type:schedule
trigger_event.triggered_time:April 17th 2019, 05:03:00.457
trigger_event.schedule.scheduled_time:April 17th 2019, 05:03:00.000

I have tried making minor edits and resaving them, but they appear to never run successfully. What else should I be checking?

Hey,

Can you share the log files from the node that contains the watches? Are there any exceptions? Can you also use the execute watch API on one of your watches and paste the output in a pastebin/gist?

Thanks!

--Alex

Alex,

I can run the watch from the API:

POST _watcher/watch/Nightly_Account_Change_Report/_execute
{
  "trigger_data" : {
    "scheduled_time": "now"
  },
  "action_modes" : {
    "email_admin":"execute"
  },
  "record_execution" : true
}

I can also use the Watcher tools in Kibana to simulate execution of the watch, and it performs as expected. Here's the relevant chunk of the Kibana log; it looks like the reports from the watches are being generated correctly:

{"type":"response","@timestamp":"2019-04-18T09:02:54Z","tags":["api"],"pid":6756,"method":"post","statusCode":200,"req":{"url":"/api/reporting/generate/csv?jobParams=(conflictedTypesFields%3A!()%2Cfields%3A!('%40timestamp'%2Cbeat.name%2Cevent_id%2Ctask%2Cevent_data.SubjectUserName%2Cevent_data.TargetUserName%2Cmessage)%2CindexPatternId%3A'winlogbeat-*'%2CmetaFields%3A!(_source%2C_id%2C_type%2C_index%2C_score)%2CsearchRequest%3A(body%3A(_source%3A(excludes%3A!()%2Cincludes%3A!('%40timestamp'%2Cbeat.name%2Cevent_id%2Ctask%2Cevent_data.SubjectUserName%2Cevent_data.TargetUserName%2Cmessage))%2Cdocvalue_fields%3A!('%40timestamp')%2Cquery%3A(bool%3A(filter%3A!()%2Cmust%3A!((query_string%3A(analyze_wildcard%3A!t%2Cdefault_field%3A'*'%2Cquery%3A'event_id%3A4720%20OR%20event_id%3A4722%20OR%20event_id%3A4723%20OR%20event_id%3A4724%20OR%20event_id%3A4725%20OR%20event_id%3A4726%20OR%20event_id%3A4727%20OR%20event_id%3A4728%20OR%20event_id%3A4729%20OR%20event_id%3A4730%20OR%20event_id%3A4731%20OR%20event_id%3A4732%20OR%20event_id%3A4733%20OR%20event_id%3A4734%20OR%20event_id%3A4735%20OR%20event_id%3A4738%20OR%20event_id%3A4737%20OR%20event_id%3A4740%20OR%20event_id%3A4741%20OR%20event_id%3A4743%20OR%20event_id%3A4744%20OR%20event_id%3A4745%20OR%20event_id%3A4746%20OR%20event_id%3A4747%20OR%20event_id%3A4748%20OR%20event_id%3A4749%20OR%20event_id%3A4750%20OR%20event_id%3A4751%20OR%20event_id%3A4752%20OR%20event_id%3A4753%20OR%20event_id%3A4754%20OR%20event_id%3A4755%20OR%20event_id%3A4756%20OR%20event_id%3A4757%20OR%20event_id%3A4758%20OR%20event_id%3A4759%20OR%20event_id%3A4760%20OR%20event_id%3A4761%20OR%20event_id%3A4762%20OR%20event_id%3A4763%20OR%20event_id%3A4764%20OR%20event_id%3A4767%20OR%20event_id%3A4781'))%2C(range%3A('%40timestamp'%3A(format%3Abasic_date%2Cgt%3A'now-1d'%2Clt%3A'now%2Fd'))))%2Cmust_not%3A!()%2Cshould%3A!()))%2Cscript_fields%3A()%2Csort%3A!(('%40timestamp'%3A(order%3Aasc%2Cunmapped_type%3Aboolean)))%2Cstored_fields%3A!('%40timestamp'%2Cbeat.name%2Cevent_id%2Ctask%2Cevent_data.SubjectUserName%2Cevent_data.TargetUserName%2Cmessage)%2Cversion%3A!t)%2Cindex%3A'winlogbeat-*')%2Ctitle%3A'Account%20%26%20Group%20Changes%20--%20AD%20%26%20Member%20Server--Report%20Output%20-%20(adj%20for%20ES%206)'%2Ctype%3Asearch)","method":"post","headers":{"accept-charset":"UTF-8","kbn-xsrf":"reporting","content-length":"0","host":"wines5-infr:5601","connection":"Keep-Alive","user-agent":"Apache-HttpClient/4.5.2 (Java/1.8.0_201)","accept-encoding":"gzip,deflate"},"remoteAddress":"10.1.45.32","userAgent":"10.1.45.32"},"res":{"statusCode":200,"responseTime":625,"contentLength":9},"message":"POST 
/api/reporting/generate/csv?jobParams=(conflictedTypesFields%3A!()%2Cfields%3A!('%40timestamp'%2Cbeat.name%2Cevent_id%2Ctask%2Cevent_data.SubjectUserName%2Cevent_data.TargetUserName%2Cmessage)%2CindexPatternId%3A'winlogbeat-*'%2CmetaFields%3A!(_source%2C_id%2C_type%2C_index%2C_score)%2CsearchRequest%3A(body%3A(_source%3A(excludes%3A!()%2Cincludes%3A!('%40timestamp'%2Cbeat.name%2Cevent_id%2Ctask%2Cevent_data.SubjectUserName%2Cevent_data.TargetUserName%2Cmessage))%2Cdocvalue_fields%3A!('%40timestamp')%2Cquery%3A(bool%3A(filter%3A!()%2Cmust%3A!((query_string%3A(analyze_wildcard%3A!t%2Cdefault_field%3A'*'%2Cquery%3A'event_id%3A4720%20OR%20event_id%3A4722%20OR%20event_id%3A4723%20OR%20event_id%3A4724%20OR%20event_id%3A4725%20OR%20event_id%3A4726%20OR%20event_id%3A4727%20OR%20event_id%3A4728%20OR%20event_id%3A4729%20OR%20event_id%3A4730%20OR%20event_id%3A4731%20OR%20event_id%3A4732%20OR%20event_id%3A4733%20OR%20event_id%3A4734%20OR%20event_id%3A4735%20OR%20event_id%3A4738%20OR%20event_id%3A4737%20OR%20event_id%3A4740%20OR%20event_id%3A4741%20OR%20event_id%3A4743%20OR%20event_id%3A4744%20OR%20event_id%3A4745%20OR%20event_id%3A4746%20OR%20event_id%3A4747%20OR%20event_id%3A4748%20OR%20event_id%3A4749%20OR%20event_id%3A4750%20OR%20event_id%3A4751%20OR%20event_id%3A4752%20OR%20event_id%3A4753%20OR%20event_id%3A4754%20OR%20event_id%3A4755%20OR%20event_id%3A4756%20OR%20event_id%3A4757%20OR%20event_id%3A4758%20OR%20event_id%3A4759%20OR%20event_id%3A4760%20OR%20event_id%3A4761%20OR%20event_id%3A4762%20OR%20event_id%3A4763%20OR%20event_id%3A4764%20OR%20event_id%3A4767%20OR%20event_id%3A4781'))%2C(range%3A('%40timestamp'%3A(format%3Abasic_date%2Cgt%3A'now-1d'%2Clt%3A'now%2Fd'))))%2Cmust_not%3A!()%2Cshould%3A!()))%2Cscript_fields%3A()%2Csort%3A!(('%40timestamp'%3A(order%3Aasc%2Cunmapped_type%3Aboolean)))%2Cstored_fields%3A!('%40timestamp'%2Cbeat.name%2Cevent_id%2Ctask%2Cevent_data.SubjectUserName%2Cevent_data.TargetUserName%2Cmessage)%2Cversion%3A!t)%2Cindex%3A'winlogbeat-*')%2Ctitle%3A'Account%20%26%20Group%20Changes%20--%20AD%20%26%20Member%20Server--Report%20Output%20-%20(adj%20for%20ES%206)'%2Ctype%3Asearch) 200 625ms - 9.0B"}
{"type":"response","@timestamp":"2019-04-18T09:03:10Z","tags":["api"],"pid":6756,"method":"get","statusCode":200,"req":{"url":"/api/reporting/jobs/download/jumf2xj2057o8aef8e5u4qdm","method":"get","headers":{"accept-charset":"UTF-8","kbn-xsrf":"reporting","content-length":"0","host":"wines5-infr:5601","connection":"Keep-Alive","user-agent":"Apache-HttpClient/4.5.2 (Java/1.8.0_201)","accept-encoding":"gzip,deflate"},"remoteAddress":"10.1.45.32","userAgent":"10.1.45.32"},"res":{"statusCode":200,"responseTime":18,"contentLength":9},"message":"GET /api/reporting/jobs/download/jumf2xj2057o8aef8e5u4qdm 200 18ms - 9.0B"}

So, is everything working now or not? If not, please provide the information that was asked for. Thanks a lot!

Sorry for the confusion - no email is generated, and the Watcher status page in Kibana shows "Error." However, if I use the Simulate tool to execute the watch, it works correctly and generates the expected emails.

Please share the watch history output of that watch:

GET .watcher-history-*/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "watch_id": "YOUR_WATCH_ID_HERE"
          }
        }
      ]
    }
  },
  "sort": [
    {
      "trigger_event.triggered_time": {
        "order": "desc"
      }
    }
  ]
}

Thanks!

It looks like this is the relevant part. The condition and execution are marked as success, but the email action fails:

    "actions" : [
      {
        "id" : "email_admin",
        "type" : "email",
        "status" : "failure",
        "error" : {
          "root_cause" : [
            {
              "type" : "messaging_exception",
              "reason" : "failed to send email with subject [Sudo & Root Login Use -- Nightly Log] via account [smtp_account]"
            }
          ],
          "type" : "messaging_exception",
          "reason" : "failed to send email with subject [Sudo & Root Login Use -- Nightly Log] via account [smtp_account]",
          "caused_by" : {
            "type" : "mail_connect_exception",
            "reason" : "Couldn't connect to host, port: smtp.xxxxxxxxxxx.com, 25; timeout 120000",
            "caused_by" : {
              "type" : "connect_exception",
              "reason" : "Connection timed out: connect"
            }
          }

I'm confused as to why the email fails when the watch runs on its own schedule, but works correctly if I "Simulate the watch" from Kibana.

A few potential reasons here.

First, simulating a watch does not send a real email; it just checks whether the action would be executed successfully.
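
If you want a manual run to actually send the email instead of simulating it, you can force the action modes in the execute watch API. A minimal sketch (the watch id is a placeholder):

POST _watcher/watch/YOUR_WATCH_ID_HERE/_execute
{
  "action_modes": {
    "_all": "force_execute"
  },
  "record_execution": true
}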

Second, even if you use the execute watch API without simulation and the email actually gets sent, there is still a difference. With the execute watch API, the watch is executed on the node that Kibana connects to. If that node, for whatever reason, is able to connect to the SMTP server, sending the email will succeed. However, if the node that executes the watch on its regular schedule is not able to connect to the SMTP server, you will still get an error.
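
Regular, scheduled executions run on the nodes that hold the .watches index shards, so one way to see which node(s) will run the watch on its schedule is to check where those shards are allocated, for example:

GET _cat/shards/.watches?v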

So, one of the first steps would be to run the execute watch API for your watch and paste the output here. The next step (which is what the error message suggests to me) is to find out why an Elasticsearch node is not able to connect to that SMTP server. Logging into that node and trying to telnet to the SMTP server and port would be a good start.
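
For example (the hostname below is the redacted one from your error message; run this from a shell on the Elasticsearch node that executes the watch):

telnet smtp.xxxxxxxxxxx.com 25
# or, if telnet is not installed:
nc -vz smtp.xxxxxxxxxxx.com 25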

--Alex


Thanks for the help, Alex -

Our network head is going to be fixing access to the SMTP server this weekend, so I should be able to verify that the email reports are working again after that work is completed. I'll update / close out this case when I have those results.


Watches are working consistently now that the internal network rules have been fixed. Thanks for the help.


Hi @spinscale

I'm facing the same issue, with the following error:
"type": "messaging_exception",
"reason": "failed to send email with subject [[OSS-SOC] AD Anomaly Authentication Request Alert] via account [outlook_account]"
}
],
"type": "messaging_exception",
"reason": "failed to send email with subject [[OSS-SOC] AD Anomaly Authentication Request Alert] via account [outlook_account]",
"caused_by": {
"type": "mail_connect_exception",
"reason": "Couldn't connect to host, port: outlook.office365.com, 587; timeout 120000",
"caused_by": {
"type": "socket_timeout_exception",
"reason": "connect timed out"

Below is the output after executing the watch API:
    {
      "statusCode": 504,
      "error": "Gateway Time-out",
      "message": "Client request timeout"
    }

Please help.

Jeeth,

It looks like you're having a similar problem: the node executing the watch can't connect to Office 365. Based on the error message, you may need to check: 1) Are you using a valid account that has access to O365 for email (not a 'spoofed' email address)? 2) Is there a validation/verification process that you need to complete for this account? The error implies that the account may be blocked by anti-spoofing/anti-spam automation.

Hey @ccampbell,

Thanks for your prompt response. I have a valid Office 365 account, I have updated its configuration in the elasticsearch.yml file, and it triggered alerts earlier. I've been facing this issue since last week and haven't been able to figure out why.
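
For reference, a typical Office 365 account configuration in elasticsearch.yml looks roughly like the sketch below (the host and port match the ones in my error message; the user is a placeholder, and the password normally lives in the keystore as xpack.notification.email.account.outlook_account.smtp.secure_password rather than in the yml file):

xpack.notification.email.account:
  outlook_account:
    profile: outlook
    smtp:
      auth: true
      starttls.enable: true
      host: outlook.office365.com
      port: 587
      user: alerts@example.com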

If Watcher worked previously but you're now getting this error on the "AD Anomaly Authentication Request" alert, the watcher alerts may have triggered some sort of watchdog in Office 365. Do you have an administrator there who can check the error and verify that your service account still has rights to send email?
