Watcher Troubleshooting

vvwood · September 4, 2020, 7:22pm

Greetings all,
For this post, I am seeking advice on how to troubleshoot watcher configurations. I have 2 different use cases that I want to use watcher for, and so far I have run into a few challenges that I am not sure how to deal with. My trouble is not with the syntax of the watcher config. I am able to review the documentation and solve those issues, so please do not focus on the syntax in the following examples.

As I iterate through watcher configs and attempt to use the Kibana UI or the watcher execute API, I am getting very nondescript errors: "Cannot simulate watch - An Internal error occurred" or {"statusCode":502,"error":"Bad Gateway","message":"Client request timeout"}. Neither of these is much help. The question here is: where should I look for more information about why I am getting this result?
One watcher config that I created will trigger either via schedule or via the execute API and enters the firing state, but does not seem to progress. Any thoughts on how to troubleshoot this issue? (I am including a sanitized snippet from the watcher status page below.)

{
  "watch_id": "WATCHID",
  "node": "WPv6z9KOSXaJIOQz80GfUg",
  "state": "executed",
  "user": "USERNAME",
  "status": {
    "state": {
      "active": true,
      "timestamp": "2020-09-04T18:15:08.840Z"
    },
    "last_checked": "2020-09-04T18:15:34.962Z",
    "last_met_condition": "2020-09-04T18:15:34.962Z",
    "actions": {
      "ACTIONID": {
        "ack": {
          "timestamp": "2020-09-04T18:15:34.962Z",
          "state": "ackable"
        },
        "last_execution": {
          "timestamp": "2020-09-04T18:15:34.962Z",
          "successful": true
        },
        "last_successful_execution": {
          "timestamp": "2020-09-04T18:15:34.962Z",
          "successful": true
        }
      }
    },
    "execution_state": "executed",
    "version": 1
  },
  "trigger_event": {
    "type": "manual",
    "triggered_time": "2020-09-04T18:15:34.962Z",
    "manual": {
      "schedule": {
        "scheduled_time": "2020-09-04T18:15:34.962Z"
      }
    }
  },
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": [
          "INDEXPATTERN"
        ],
        "rest_total_hits_as_int": true,
        "body": {
          "size": 0,
          "query": {
            "bool": {
              "filter": {
                "range": {
                  "@timestamp": {
                    "gt": "now-9h",
                    "lt": "now"
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "gte": 1
      }
    }
  },
  "metadata": {
    "Dashboard_Link": "DASHBOARD URL"
  },
  "result": {
    "execution_time": "2020-09-04T18:15:34.962Z",
    "execution_duration": 300244,
    "input": {
      "type": "search",
      "status": "success",
      "payload": {
        "_shards": {
          "total": 2,
          "failed": 0,
          "successful": 2,
          "skipped": 0
        },
        "hits": {
          "hits": [],
          "total": 1,
          "max_score": null
        },
        "took": 1,
        "timed_out": false
      },
      "search": {
        "request": {
          "search_type": "query_then_fetch",
          "indices": [
            "INDEXPATTERN"
          ],
          "rest_total_hits_as_int": true,
          "body": {
            "size": 0,
            "query": {
              "bool": {
                "filter": {
                  "range": {
                    "@timestamp": {
                      "gt": "now-9h",
                      "lt": "now"
                    }
                  }
                }
              }
            }
          }
        }
      }
    },
    "condition": {
      "type": "compare",
      "status": "success",
      "met": true,
      "compare": {
        "resolved_values": {
          "ctx.payload.hits.total": 1
        }
      }
    },
    "transform": {
      "type": "script",
      "status": "success",
      "payload": {
        "result": 1
      }
    },
    "actions": [
      {
        "id": "ACTIONID",
        "type": "email",
        "status": "success",
        "email": {
          "account": "MAILSYSTEM",
          "message": {
            "id": "WATCHERID",
            "from": "TESTINGEMAIL@DOMAIN.COM",
            "sent_date": "2020-09-04T18:20:35.142866Z",
            "to": [
              "TESTINGEMAIL@DOMAIN.COM"
            ],
            "subject": "SUBJECT",
            "body": {
              "text": "BODY TEXT, Dashboard can be viewed here: LINK TO RELATED DASHBOARD"
            }
          }
        }
      }
    ]
  },
  "messages": []
}

For those who see this shortly after posting and before the weekend, have a good Labor Day weekend.
-Vince

spinscale · September 16, 2020, 10:20am

Hey,

There is a longer blog post on watcher debugging at https://www.elastic.co/blog/watching-the-watches-writing-debugging-and-testing-watches that might help you.

Regarding your two cases:

Are you trying to create a report in that watch when this error message gets thrown?
Which state do you expect the watch to be in, and why? The output looks as if an email has been sent. Was this the intended result or not?

vvwood · September 16, 2020, 12:44pm

Alexander,
I have already read that article and did not find a solution within it.

The goal of this watch is to look for at least X occurrences of a certain log entry and then generate a PDF of a specific dashboard to send via email. I have this configuration working for a different data source, so I know it is possible. I have received 0 emails from this watcher config despite the execution states showing OK or Firing. This is NOT the intended result, just to be clear.

spinscale · September 17, 2020, 12:30pm

The email action states success, so the message has been delivered successfully to the SMTP server. Can you use a debugging SMTP server like maildev to check whether that email is received? Do you have insight into that mail server, and can you debug the issue further there?
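
To illustrate the debugging-server suggestion, here is a minimal Python sketch that sends a test message to a local debugging SMTP server. The host, port, and addresses are assumptions, not values from this thread: `localhost:1025` is what maildev listens on by default as far as I know, and the helper names are hypothetical.

```python
import smtplib
from email.message import EmailMessage

# Assumptions (not from the thread): a debugging SMTP server such as maildev
# is listening on localhost:1025. Adjust both values for your environment.
DEBUG_SMTP_HOST = "localhost"
DEBUG_SMTP_PORT = 1025


def build_test_message(sender: str, recipient: str) -> EmailMessage:
    """Build a small plain-text message similar to a watch's email action."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Watcher SMTP delivery test"
    msg.set_content("If this shows up in the debugging inbox, SMTP delivery works.")
    return msg


def send_test_message(msg: EmailMessage,
                      host: str = DEBUG_SMTP_HOST,
                      port: int = DEBUG_SMTP_PORT) -> dict:
    """Send the message and return the refused recipients (empty if none)."""
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        return smtp.send_message(msg)


# Example usage (requires the debugging server to be running):
#   refused = send_test_message(
#       build_test_message("watcher@example.com", "me@example.com"))
```

If the test message appears in the debugging inbox, pointing the watch's email account at the same server should show exactly what Watcher hands over, including the recipient address.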

vvwood · September 24, 2020, 8:37pm

The result of this thread was that I had mistakenly typed the wrong domain name in the email address. Watcher was doing its job; this was operator error.
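
For anyone landing here with the same symptom, a rough sketch of how the execution record posted above could be checked for this kind of mistake: it lists the status and recipients of each email action and converts the execution duration to seconds. The function names are hypothetical, the field names are taken from the record in this thread, and treating `execution_duration` as milliseconds is an assumption (it is consistent with the roughly five-minute gap between `execution_time` and `sent_date` in that record).

```python
from typing import Any, Dict, List


def summarize_email_actions(record: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return id, status, and recipients for each email action in a watch record."""
    summary = []
    for action in record.get("result", {}).get("actions", []):
        if action.get("type") != "email":
            continue  # only email actions carry a recipient list
        message = action.get("email", {}).get("message", {})
        summary.append({
            "id": action.get("id"),
            "status": action.get("status"),
            "to": message.get("to", []),
        })
    return summary


def execution_seconds(record: Dict[str, Any]) -> float:
    """Convert execution_duration to seconds, assuming it is in milliseconds."""
    return record.get("result", {}).get("execution_duration", 0) / 1000.0


# A trimmed copy of the sanitized record from the first post.
record = {
    "result": {
        "execution_duration": 300244,
        "actions": [{
            "id": "ACTIONID",
            "type": "email",
            "status": "success",
            "email": {"message": {"to": ["TESTINGEMAIL@DOMAIN.COM"]}},
        }],
    }
}

for entry in summarize_email_actions(record):
    print(entry["id"], entry["status"], entry["to"])
print("execution took about", execution_seconds(record), "seconds")
```

Reading the `to` list back like this makes a mistyped domain easier to spot than scanning the full JSON. If the millisecond assumption holds, an execution of about 300 seconds may also be related to the "Client request timeout" seen in Kibana.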
