For simple monitoring of our web services, I'm creating some watches based on response codes.
However, when a service is unreachable (not an error response such as 404 Not Found, but a server that is offline), the watch execution fails.
You could wrap the http input in a chain input, which catches all exceptions, but then you need to check the payload yourself. Also, the exception details are not properly available.
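Watcher inputs are configured in JSON, so the following is not Watcher configuration. It is a plain Python analogue of the same idea: perform the HTTP request, turn a connection failure into data instead of letting it abort the run, and then decide "up or down" in a separate step, which is the "check the payload yourself" part. The names probe and is_healthy and the result fields are hypothetical. The sketch also keeps the distinction from the question: a 404 means the server answered, while a refused connection or timeout means it did not.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> dict:
    """Check an HTTP endpoint and always return a result dict instead of raising."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return {"url": url, "reachable": True, "status": resp.status, "error": None}
    except urllib.error.HTTPError as exc:
        # The server responded, but with an error code such as 404 Not Found.
        # HTTPError must be caught before OSError because it is a subclass of it.
        return {"url": url, "reachable": True, "status": exc.code, "error": None}
    except OSError as exc:
        # No HTTP response at all: refused connection, DNS failure, timeout.
        # URLError and socket timeouts both derive from OSError.
        return {"url": url, "reachable": False, "status": None, "error": str(exc)}


def is_healthy(result: dict) -> bool:
    """The 'check the payload yourself' step: reachable and a 2xx/3xx status."""
    return result["reachable"] and 200 <= result["status"] < 400
```

Because probe never raises, the caller always gets something it can evaluate, and an offline server shows up as reachable False with the error message preserved instead of as a failed run.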
From a Watcher/Elasticsearch perspective, I would use a dedicated monitoring system that writes its results into Elasticsearch; Watcher then only queries Elasticsearch. This has a few advantages. First, you decouple data collection from alerting, which becomes important as you add more alerts and endpoints. Second, you don't have to worry about watches getting stuck while trying to connect to an endpoint, which can prevent other watches from executing, because the stuck watches occupy threads in a thread pool.
The Elastic Stack already allows you to do exactly this. You can use Heartbeat for the heavy lifting of connecting to other services and managing timeouts, and have the results indexed into Elasticsearch. Heartbeat supports ICMP, TCP and HTTP checks, which should be sufficient for your use case.
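The division of labour described above, a collector that only records check results and an alerting step that only reads them, can be sketched in Python. This is an illustration under assumptions, not Heartbeat or Watcher code: the CheckResult shape is hypothetical (loosely inspired by the up/down status an uptime collector indexes), an in-memory list stands in for the Elasticsearch index, and the timestamps are made-up sample values. The alerting function never opens a network connection, so a slow or dead endpoint cannot block it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class CheckResult:
    # Hypothetical stored document written by the collector.
    monitor_id: str
    timestamp: datetime
    up: bool


def monitors_to_alert(docs, now, window=timedelta(minutes=5)):
    """Return ids of monitors whose most recent result within `window` is down."""
    cutoff = now - window
    latest = {}
    for doc in docs:
        if doc.timestamp < cutoff:
            continue  # too old to reflect the current state
        prev = latest.get(doc.monitor_id)
        if prev is None or doc.timestamp > prev.timestamp:
            latest[doc.monitor_id] = doc
    return sorted(mid for mid, doc in latest.items() if not doc.up)


# Illustrative sample data standing in for indexed check results.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
docs = [
    CheckResult("api", now - timedelta(minutes=2), up=False),
    CheckResult("api", now - timedelta(minutes=1), up=True),   # recovered
    CheckResult("web", now - timedelta(minutes=1), up=False),
    CheckResult("db", now - timedelta(minutes=30), up=False),  # outside window
]
print(monitors_to_alert(docs, now))  # ['web']
```

Only the most recent result per monitor counts, so "api" is not alerted after it recovers, and stale results such as the "db" entry are ignored rather than re-alerted.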
Thank you for the help. I'll test with the chain input, since it's just a simple health check. Further diagnostics will be done with Heartbeat in the future.