I'm trying to understand the throttle and acknowledge features as part of evaluating whether Watcher will meet our use cases. This is a paper study so far - I haven't actually tried running Watcher yet, for commercial and legal reasons.
(1) Suppose we're processing log output through Logstash which tells us, amongst other things, whether any of our widgets are broken. (We may have lots of widgets, so creating a separate watch for each one doesn't sound like a good idea.) We want to be able to:
- raise some sort of alert when a widget is reported as being broken
- raise a separate alert for each widget that becomes broken
- refrain from raising more than one alert for each broken widget until the logs show that it has been mended; after that, another log message saying the same widget has broken starts a new lifecycle.
I can't see how either throttling or acknowledgements help here, as we want the suppression of duplicate "it's broken" reports to apply per widget, not to the entire watch.
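To make the requirement in (1) concrete, here is a minimal sketch in Python of the behaviour I'm after. It is not Watcher configuration and uses no Watcher API; the event shape (a widget id plus a "broken" or "mended" status) and the function name `new_breakages` are assumptions made up for illustration.

```python
from typing import Iterable, Iterator, Tuple

# Assumed event shape: (widget_id, status), where status is "broken" or "mended".
Event = Tuple[str, str]


def new_breakages(events: Iterable[Event]) -> Iterator[str]:
    """Yield a widget id once per breakage, suppressing repeats until it is mended."""
    broken = set()  # widgets currently known to be broken
    for widget_id, status in events:
        if status == "broken":
            if widget_id not in broken:
                broken.add(widget_id)
                yield widget_id  # first report of this breakage: alert
        elif status == "mended":
            broken.discard(widget_id)  # the next "broken" starts a new lifecycle


log = [
    ("47", "broken"),
    ("47", "broken"),  # duplicate report, no alert
    ("12", "broken"),  # different widget, separate alert
    ("47", "mended"),
    ("47", "broken"),  # new lifecycle, alert again
]
print(list(new_breakages(log)))  # ['47', '12', '47']
```

The point is that the suppression state is keyed by widget, whereas throttling and acknowledgement, as I read them, are keyed by watch.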
(2) I note the ability to output to a JIRA ticket, but I don't see how to manage the ticket lifecycle.
- When widget 47 breaks I want to create a ticket saying "widget 47 is now in broken state".
- Further runs of the watch, which may re-scan the same log document, should not raise further tickets (i.e. the existence of an open ticket for widget 47 should prevent the creation of a new one).
- If widget 47 spontaneously mends itself, as per a new log document, the ticket should be closed.
- If widget 47 should then break again a new ticket should be opened.
- Ticket lifecycles for each widget must be independent.
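The ticket rules above can be sketched as follows. This is only an in-memory stand-in for the behaviour I want, under assumed names (`Ticket`, `TicketTracker`, `report` are all hypothetical); it does not call JIRA or Watcher, and a real version would have to look up open tickets in the ticketing system rather than in a dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Ticket:
    ticket_id: int
    widget_id: str
    summary: str
    is_open: bool = True


@dataclass
class TicketTracker:
    """Hypothetical in-memory ticket store, keyed by widget for open tickets."""
    tickets: List[Ticket] = field(default_factory=list)
    open_by_widget: Dict[str, Ticket] = field(default_factory=dict)

    def report(self, widget_id: str, status: str) -> Optional[Ticket]:
        """Apply one log observation; return the ticket opened or closed, if any."""
        current = self.open_by_widget.get(widget_id)
        if status == "broken" and current is None:
            # No open ticket for this widget: open one.
            ticket = Ticket(
                ticket_id=len(self.tickets) + 1,
                widget_id=widget_id,
                summary=f"widget {widget_id} is now in broken state",
            )
            self.tickets.append(ticket)
            self.open_by_widget[widget_id] = ticket
            return ticket
        if status == "mended" and current is not None:
            # The widget has mended itself: close its open ticket.
            current.is_open = False
            del self.open_by_widget[widget_id]
            return current
        # Duplicate "broken" or unmatched "mended": nothing to do.
        return None


tracker = TicketTracker()
first = tracker.report("47", "broken")   # opens ticket 1
tracker.report("47", "broken")           # re-scan of the same state: no new ticket
tracker.report("47", "mended")           # closes ticket 1
second = tracker.report("47", "broken")  # opens ticket 2
```

Keeping the open tickets in a map keyed by widget id is what makes the lifecycles independent: a report for one widget never touches another widget's ticket.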
(3) I could write my own web application to understand and manage tickets and lifecycles, and use Watcher to invoke this, but
- that's potentially rather a lot of code to write
- what value would Watcher then be adding, over my code just querying Elasticsearch itself?
- and the performance implications don't sound very clever.
So, how should I be looking at this?