For simple monitoring of our web services, I'm creating some watchers based on response codes.
However, when a service is unreachable (not a response such as 404 Not Found, but an offline server), the watcher execution fails.
"reason": "Connect to url:80 [url/ip] failed: connect timed out",
"reason": "connect timed out"
I tried setting the condition as follows:
Still the entire watcher fails and no actions were executed.
A simple example without any conditions fails as well:
Are there any options for catching these exceptions and logging them accordingly?
Thank you in advance.
You could wrap the http input within a chain input, which catches all exceptions, but then you need to check the payload yourself. Also, the exception itself is not properly available.
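As a rough sketch of that idea (not a tested watch), the http input can be placed inside a chain input under a name, and the condition then inspects the payload by hand. The host `example.com`, the input name `health`, the action name `log_unhealthy` and the timeouts are placeholders I made up. The sketch also assumes that a failed http input leaves `ctx.payload.health` empty, so a missing `_status_code` is treated as "unreachable". On older versions the script key is `inline` rather than `source`.

```json
{
  "metadata": {
    "note": "Sketch only: host, input name, action name and timeouts are hypothetical"
  },
  "trigger": {
    "schedule": { "interval": "1m" }
  },
  "input": {
    "chain": {
      "inputs": [
        {
          "health": {
            "http": {
              "request": {
                "scheme": "http",
                "host": "example.com",
                "port": 80,
                "path": "/",
                "connection_timeout": "5s",
                "read_timeout": "5s"
              }
            }
          }
        }
      ]
    }
  },
  "condition": {
    "script": {
      "lang": "painless",
      "source": "def h = ctx.payload.health; return h == null || h._status_code == null || h._status_code >= 400;"
    }
  },
  "actions": {
    "log_unhealthy": {
      "logging": {
        "text": "Health check failed or service unreachable, payload: {{ctx.payload.health}}"
      }
    }
  }
}
```

Because the exception is not exposed, this can only tell you that no status code came back, not why the connection failed.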
From my Watcher/Elasticsearch perspective, I would use a dedicated monitoring system that inserts data into Elasticsearch, so that Watcher only queries Elasticsearch. This has a few advantages. First, you decouple information collection from alerting, which is important when you add more alerts or endpoints. You also don't have to worry about watches getting stuck while trying to connect to an endpoint, which can prevent other watches from executing because they block a thread pool.
The Elastic Stack already allows you to do exactly this. You can use Heartbeat for the heavy lifting of connecting to other services and managing timeouts, and then have the results indexed into Elasticsearch. Heartbeat supports ICMP, TCP, and HTTP checks, which should be sufficient for your use case.
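To illustrate the alerting side of that setup, here is a minimal sketch of a watch that only queries Elasticsearch. It assumes Heartbeat writes to indices matching `heartbeat-*` and marks failed checks with `monitor.status: down`. Index and field names depend on your Heartbeat version, so check the documents it actually produces. The two-minute window and the action name `log_down` are arbitrary choices of mine.

```json
{
  "metadata": {
    "note": "Sketch only: index pattern, field names and time window are assumptions"
  },
  "trigger": {
    "schedule": { "interval": "1m" }
  },
  "input": {
    "search": {
      "request": {
        "indices": ["heartbeat-*"],
        "body": {
          "size": 0,
          "query": {
            "bool": {
              "filter": [
                { "term": { "monitor.status": "down" } },
                { "range": { "@timestamp": { "gte": "now-2m" } } }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": { "gt": 0 }
    }
  },
  "actions": {
    "log_down": {
      "logging": {
        "text": "Heartbeat reported {{ctx.payload.hits.total}} failed checks in the last 2 minutes"
      }
    }
  }
}
```

Since the watch never opens a connection to the monitored service itself, an offline server shows up as ordinary data rather than as a failed watch execution.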
Hope this helps!
Thank you for the help. I'll test with the chain input since it's just a simple health check. Further diagnostics will be done with Heartbeat in the future.