Some of the built-in watches fail

Hi,

I have a situation where two of the built-in watches fail:
Cluster status and Logstash version mismatch. Both the Watcher UI in Kibana and the Elasticsearch log file just state that they failed.

I also have another issue with one of my own watches. I wanted to test the email action, so I created a dummy ML job, ticked the "send email" checkbox, and copied the email config from that watch into another ML job. Now it just reports "Execution Failing" for the email action, but there are no errors anywhere.

Any pointers on where to start?

Hey

Can you share the log file and the corresponding watch history entries, please, so we can take a deeper look at why those actually failed?

Thank you!

--Alex

Hi!

Thank you for replying.

I have a feeling that this is related to email configuration, because it started right after I enabled it. But let's break it down:

Kibana UI looks like this:

I have grepped through all the logs on all 3 of my ES nodes.
They have multiple events like this every day:

[2018-01-25T11:38:03,082][WARN ][o.e.x.w.e.ExecutionService] [analyzer03] failed to execute watch [roClcHctTIWj8HFwteQHQw_elasticsearch_cluster_status]
[2018-01-25T11:40:01,045][WARN ][o.e.x.w.e.ExecutionService] [analyzer03] failed to execute watch [roClcHctTIWj8HFwteQHQw_elasticsearch_cluster_status]
[2018-01-25T11:50:59,713][WARN ][o.e.x.w.e.ExecutionService] [analyzer03] failed to execute watch [roClcHctTIWj8HFwteQHQw_elasticsearch_cluster_status]
[2018-01-25T12:59:59,605][WARN ][o.e.x.w.e.ExecutionService] [analyzer03] failed to execute watch [roClcHctTIWj8HFwteQHQw_elasticsearch_cluster_status]
[2018-01-25T13:24:58,094][WARN ][o.e.x.w.e.ExecutionService] [analyzer03] failed to execute watch [roClcHctTIWj8HFwteQHQw_kibana_version_mismatch]
[2018-01-25T15:57:58,422][WARN ][o.e.x.w.e.ExecutionService] [analyzer03] failed to execute watch [roClcHctTIWj8HFwteQHQw_kibana_version_mismatch]
[2018-01-25T17:07:58,112][WARN ][o.e.x.w.e.ExecutionService] [analyzer03] failed to execute watch [roClcHctTIWj8HFwteQHQw_kibana_version_mismatch]
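
To see at a glance which watches fail and how often, log lines like these can be tallied. The following is a minimal Python sketch, assuming the line layout shown above; the function name `count_failed_watches` is made up for illustration and is not part of any Elastic tooling.

```python
import re
from collections import Counter

# Matches ExecutionService warnings of the form shown in the thread:
# [timestamp][WARN ][logger] [node] failed to execute watch [watch_id]
# search() is used so that a "filename:" prefix added by grep does not matter.
FAILED_WATCH = re.compile(
    r"\[(?P<ts>[^\]]+)\]\[WARN\s*\]\[[^\]]+\]\s+\[(?P<node>[^\]]+)\]\s+"
    r"failed to execute watch \[(?P<watch_id>[^\]]+)\]"
)


def count_failed_watches(lines):
    """Count 'failed to execute watch' warnings per watch ID."""
    counts = Counter()
    for line in lines:
        match = FAILED_WATCH.search(line)
        if match:
            counts[match.group("watch_id")] += 1
    return counts
```

Feeding it the lines of a log file (for example via `open(path)`) returns a `Counter` keyed by watch ID; lines that do not match the pattern are ignored.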

The version mismatch is okay; the versions differ only by 0.0.1 back and forth, so it shouldn't really affect this.

I think this is related:

/var/log/elasticsearch/analyzer-prod-2018-01-25.log.gz:[2018-01-25T13:09:18,286][ERROR][o.e.x.w.a.e.ExecutableEmailAction] [analyzer01] failed to execute action [roClcHctTIWj8HFwteQHQw_logstash_version_mismatch/send_email_to_admin]

/var/log/elasticsearch/analyzer-prod-2018-01-25.log.gz:javax.mail.MessagingException: failed to send email with subject [[RESOLVED] X-Pack Monitoring: Logstash Version Mismatch (roClcHctTIWj8HFwteQHQw)] via account [exchange_account]

Doesn't say why it fails.

If I query watcher-history from that same day, I find this:

GET .watcher-history-7-2018.01.25/_search
{
  "query": { "match": { "watch_id": "roClcHctTIWj8HFwteQHQw_elasticsearch_cluster_status" } }
}

        "actions": {
          "send_email_to_admin": {
            "ack": {
              "timestamp": "2017-12-19T08:10:25.512Z",
              "state": "awaits_successful_execution"
            },
            "last_execution": {
              "timestamp": "2018-01-21T00:01:08.762Z",
              "successful": false,
              "reason": ""
            }

The reason is empty.
I browsed through the watch history and searched for "error", "fail", "action", and "email", but I couldn't find any error messages.
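
Rather than searching the history by keyword, the per-action status can be checked programmatically. This is a minimal Python sketch, assuming the input is a dict with the same shape as the `actions` object in the excerpt above (action ID mapped to `ack` and `last_execution`); where that object sits inside a full history document is not shown in the excerpt, and the helper name `failed_actions` is hypothetical.

```python
def failed_actions(status_actions):
    """Return (action_id, timestamp, reason) for each action whose last execution failed.

    An empty reason string, as in the excerpt, is reported as None.
    """
    failures = []
    for action_id, state in status_actions.items():
        last = state.get("last_execution") or {}
        if last.get("successful") is False:
            failures.append((action_id, last.get("timestamp"), last.get("reason") or None))
    return failures


# The fragment from the watch history excerpt, as a Python dict.
example = {
    "send_email_to_admin": {
        "ack": {
            "timestamp": "2017-12-19T08:10:25.512Z",
            "state": "awaits_successful_execution",
        },
        "last_execution": {
            "timestamp": "2018-01-21T00:01:08.762Z",
            "successful": False,
            "reason": "",
        },
    }
}

print(failed_actions(example))
```

A `None` reason in the result makes the "failed, but no reason recorded" case explicit, which is the situation described here.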

I have configured email settings like this in elasticsearch.yml (on all nodes):

  notification:
    email:
      html:
        sanitization:
          enabled: false
      account:
        exchange_account:
          profile: outlook
          email_defaults:
            from: analyzer@company.tld
          smtp:
            starttls.enable: true
            host: outlook.company.tld
            port: 25

If I try to connect to the mail server with cURL, it works from all the nodes:

[root@analyzer03 ~]# curl --ssl-reqd -v smtp://outlook.company.tld

* About to connect() to outlook.company.tld port 25 (#0)
* Trying XXX.XXX.XXX.XXX...
* Connected to outlook.company.tld (XXX.XXX.XXX.XXX) port 25 (#0)
< 220 exch2k16.intra.company.tld Microsoft ESMTP MAIL Service ready at Wed, 7 Feb 2018 12:27:16 +0200
EHLO analyzer03
< 250-exch2k16.intra.company.tld Hello [XXX.XXX.XXX.XXX]
< 250-SIZE 36700160
< 250-PIPELINING
< 250-DSN
< 250-ENHANCEDSTATUSCODES
< 250-STARTTLS
< 250-8BITMIME
< 250-BINARYMIME
< 250 CHUNKING
STARTTLS
< 220 2.0.0 SMTP server ready

Thanks!

Hey,

Is there any chance we could see the full watch history entry? If it is too sensitive to post in public, you could also email it to me at $firstname.$lastname@elastic.co

Indeed, it seems there are problems sending your email. Let's try to get down to the reason.

--Alex

Hi!

Just dropped you an email with watcher history attached.

Cheers!

Do you have any custom SSL setup?

Yes, the mail server uses a certificate signed by our internal CA.

Does Elasticsearch have access to this internal CA cert? As it is needed when connecting...

No, I have not configured it anywhere because I don't know how to.
So is there a setting where I could define a path, or should I add it to the system-wide certificates path, or...?
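
The curl transcript earlier in the thread ends at the server's 220 reply to STARTTLS, so it does not show whether the certificate itself validates. One way to check that part in isolation is the Python sketch below, which performs STARTTLS with certificate verification against either the system CA store or an explicit CA file. It only tests the certificate chain from the machine it runs on; it says nothing about what the Elasticsearch JVM trusts. The function names and the CA file path are assumptions for illustration.

```python
import smtplib
import ssl


def make_context(ca_file=None):
    """Build a verifying TLS context.

    With ca_file=None the system default CA store is used; with a path,
    only the CAs in that PEM file are trusted.
    """
    return ssl.create_default_context(cafile=ca_file)


def check_starttls(host, port=25, ca_file=None, timeout=10):
    """Attempt STARTTLS and report whether certificate verification passes."""
    context = make_context(ca_file)
    smtp = smtplib.SMTP(host, port, timeout=timeout)
    try:
        smtp.ehlo()
        smtp.starttls(context=context)
        smtp.ehlo()
        return "ok"
    except ssl.SSLCertVerificationError as exc:
        return f"certificate verification failed: {exc.verify_message}"
    finally:
        smtp.close()
```

Calling `check_starttls("outlook.company.tld")` and then `check_starttls("outlook.company.tld", ca_file="/path/to/internal-ca.pem")` (a placeholder path) would show whether the handshake fails with the default trust store and succeeds once the internal CA is supplied, which is the pattern one would expect if the missing CA is the cause.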
