Watcher fails when an external webservice is unreachable

JeffreyH1989 · July 25, 2018, 7:26am

Hello,

For simple monitoring of our webservices, I'm creating some watchers based on response codes.
However, when a service is unreachable (not something like response 404 not found, but an offline server) watcher execution fails.

    "type": "connect_timeout_exception",
    "reason": "Connect to url:80 [url/ip] failed: connect timed out",
    "caused_by": {
      "type": "socket_timeout_exception",
      "reason": "connect timed out"

I tried setting the condition as follows:

    "condition": {
      "compare": {
        "ctx.input.status": {
          "eq": "failure"
        }
      }
    }

The entire watch still fails and no actions are executed.
A simple example without any conditions fails as well:

    {
      "trigger": {
        "schedule": {
          "interval": "5m"
        }
      },
      "input": {
        "http": {
          "request": {
            "scheme": "http",
            "host": "webservice:80",
            "port": 80,
            "method": "get",
            "params": {},
            "headers": {}
          }
        }
      }
    }

Are there any options for catching these exceptions and logging them accordingly?

Thank you in advance.

spinscale · July 25, 2018, 7:39am

You could wrap the http input within a chain input, which catches all exceptions, but then you need to check the payload yourself. Also, the exception itself is not properly available in the payload.
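A minimal sketch of the chain-input approach, reusing the placeholder host from the question. The input name `healthcheck`, the action name `log_unreachable`, and the log text are made up for illustration. The condition assumes that a failed http input leaves its entry in the chain payload empty, so that `_status_code` is missing; that behaviour is not confirmed in this thread and should be verified against your Elasticsearch version.

```json
{
  "trigger": {
    "schedule": { "interval": "5m" }
  },
  "input": {
    "chain": {
      "inputs": [
        {
          "healthcheck": {
            "http": {
              "request": {
                "scheme": "http",
                "host": "webservice",
                "port": 80,
                "method": "get"
              }
            }
          }
        }
      ]
    }
  },
  "condition": {
    "script": {
      "source": "def r = ctx.payload.healthcheck; return r == null || r._status_code == null || r._status_code != 200;"
    }
  },
  "actions": {
    "log_unreachable": {
      "logging": {
        "text": "Health check for webservice failed or returned a non-200 status"
      }
    }
  }
}
```

Because the exception is not available in the payload, this can only distinguish "got a 200" from "anything else"; it cannot log the reason for the failure.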

From my Watcher/Elasticsearch perspective, I would use a dedicated monitoring system that inserts data into Elasticsearch; Watcher then only queries Elasticsearch. This has a few advantages. First, you decouple information collection from alerting, which matters as you add more alerts and endpoints. Second, you don't have to worry about watches getting stuck while trying to connect to an endpoint, which can prevent other watches from executing because they block a thread pool.

The Elastic Stack already allows you to do exactly this. You can use Heartbeat for the heavy lifting of connecting to other services and managing timeouts, and then have the result indexed into Elasticsearch. Heartbeat supports ICMP, TCP, and HTTP checks, which should be sufficient for your use case.
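A rough sketch of what the alerting side could look like once Heartbeat is indexing check results. It assumes Heartbeat's default `heartbeat-*` index pattern and its `monitor.status` field with the value `down`; the 5-minute window, the action name `log_down`, and the log text are illustrative choices, not something stated in this thread.

```json
{
  "trigger": {
    "schedule": { "interval": "5m" }
  },
  "input": {
    "search": {
      "request": {
        "indices": ["heartbeat-*"],
        "body": {
          "size": 0,
          "query": {
            "bool": {
              "filter": [
                { "term": { "monitor.status": "down" } },
                { "range": { "@timestamp": { "gte": "now-5m" } } }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": { "gt": 0 }
    }
  },
  "actions": {
    "log_down": {
      "logging": {
        "text": "{{ctx.payload.hits.total}} failed Heartbeat checks in the last 5 minutes"
      }
    }
  }
}
```

Here the watch never connects to the webservice itself, so an unreachable endpoint shows up as indexed "down" documents rather than as a failed watch execution.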

Hope this helps!

--Alex

JeffreyH1989 · July 25, 2018, 8:25am

Thank you for the help. I'll test with the chain input since it's just a simple health check. Further diagnostics will be done with Heartbeat in the future.

