Using Watcher to drive ticket life cycles?

I'm trying to understand the throttle and acknowledge features as part of evaluating whether Watcher will meet our use cases. This is a paper study so far: I haven't actually tried running Watcher yet, for commercial and legal reasons.

(1) Suppose we're processing log output through Logstash which tells us, amongst other things, whether any of our widgets are broken. (We may have lots of widgets, so creating a separate watch for each one doesn't sound like a good idea.) We want to be able to:

  • raise some sort of alert when a widget is reported as being broken
  • raise a separate alert for each widget that becomes broken
  • refrain from raising more than one alert for each broken widget until the logs show that it has been mended; after that, another log message saying the same widget has broken starts a new lifecycle.

I can't see how either throttling or acknowledgements help here, as we want the suppression of duplicate "it's broken" reports to apply per widget, not to the entire watch.

(2) I note the ability to output to a JIRA ticket, but don't see how to manage the ticket life cycle.

  • When widget 47 breaks I want to create a ticket saying "widget 47 is now in broken state"
  • Further runs of the watch, which may re-scan the same log document, should not raise further tickets (i.e. the existence of an open ticket for widget 47 should prevent the creation of a new one).
  • If widget 47 spontaneously mends itself, as per a new log document, the ticket should be closed.
  • If widget 47 should then break again a new ticket should be opened.
  • Ticket lifecycles for each widget must be independent.
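
As a rough illustration of the lifecycle described in (2), here is a minimal Python sketch. It is not Watcher or JIRA code: `TicketTracker`, `open_ticket`, `close_ticket`, the ticket ids, and the integer timestamps are all hypothetical names invented for this example, and it assumes each log document carries a widget id, a state of "broken" or "mended", and a timestamp that increases per widget.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class TicketTracker:
    """Keeps at most one open ticket per widget (hypothetical helper)."""
    open_ticket: Callable[[str], str]     # widget id -> new ticket id
    close_ticket: Callable[[str], None]   # ticket id -> None
    open_tickets: Dict[str, str] = field(default_factory=dict)
    last_seen: Dict[str, int] = field(default_factory=dict)

    def handle(self, widget_id: str, state: str, timestamp: int) -> None:
        # Skip documents that are not newer than the last one processed
        # for this widget, so re-scanning old log documents is harmless.
        if timestamp <= self.last_seen.get(widget_id, -1):
            return
        self.last_seen[widget_id] = timestamp
        if state == "broken" and widget_id not in self.open_tickets:
            # First "broken" report of a new lifecycle: open a ticket.
            self.open_tickets[widget_id] = self.open_ticket(widget_id)
        elif state == "mended" and widget_id in self.open_tickets:
            # The widget has recovered: close its ticket.
            self.close_ticket(self.open_tickets.pop(widget_id))


def demo() -> List[Tuple[str, ...]]:
    """Runs a small event sequence against in-memory stand-ins for a ticket system."""
    actions: List[Tuple[str, ...]] = []
    counter = iter(range(1, 100))

    def open_ticket(widget_id: str) -> str:
        ticket = f"T{next(counter)}"
        actions.append(("open", ticket, widget_id))
        return ticket

    def close_ticket(ticket: str) -> None:
        actions.append(("close", ticket))

    tracker = TicketTracker(open_ticket, close_ticket)
    events = [
        ("47", "broken", 1),
        ("47", "broken", 2),   # duplicate report: no second ticket
        ("12", "broken", 3),   # independent lifecycle for another widget
        ("47", "broken", 1),   # re-scan of an old document: ignored
        ("47", "mended", 4),   # closes the ticket for widget 47
        ("47", "broken", 5),   # new lifecycle: new ticket
    ]
    for widget_id, state, timestamp in events:
        tracker.handle(widget_id, state, timestamp)
    return actions
```

The state here lives in two in-memory dictionaries keyed by widget id; in practice it would have to be persisted somewhere (for instance in the ticket system itself or in an index), which is the part the question is really about.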

(3) I could write my own web application to understand and manage tickets and lifecycles, and use Watcher to invoke this, but

  • that's potentially rather a lot of code to write
  • what value would Watcher then be adding, over my code just querying Elasticsearch itself?
  • and the performance implications don't sound very clever.

So, how should I be looking at this?

Hi Tim,
Before I answer, can I confirm whether you expect to receive just one "broken" message, or several, for the same widget before receiving a "mended" message?

Typically two, but not always.

I already had some code to create "entity-centric" documents for each period of brokenness, which takes account of the multiple broken/mended messages. Since writing the original post I've put a hook into it which creates and updates Trac tickets (just because I had a testing instance of Trac handy), and Trac can send out the alert emails. So, subject to Ops liking the result, that's apparently a solution to the problem which (a) doesn't involve that much code and (b) doesn't involve Watcher.
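
For readers unfamiliar with the idea, the following is a minimal Python sketch of what building one "entity-centric" record per period of brokenness might look like. It is an assumption-laden illustration, not the code referred to above: the function name `brokenness_periods`, the record fields, and the `(timestamp, widget, state)` event shape are all invented for the example.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Event = Tuple[int, str, str]  # (timestamp, widget id, "broken" | "mended")


def brokenness_periods(events: Iterable[Event]) -> List[dict]:
    """Folds raw broken/mended events into one record per period of brokenness."""
    periods: List[dict] = []
    current: Dict[str, dict] = {}  # widget id -> its currently open period
    for timestamp, widget, state in sorted(events):
        period: Optional[dict] = current.get(widget)
        if state == "broken":
            if period is None:
                # First report of a new period for this widget.
                period = {"widget": widget, "start": timestamp,
                          "end": None, "broken_reports": 0}
                current[widget] = period
                periods.append(period)
            # Repeated "broken" messages only bump a counter.
            period["broken_reports"] += 1
        elif state == "mended" and period is not None:
            # Close the open period; a later "broken" starts a fresh one.
            period["end"] = timestamp
            del current[widget]
    return periods


EXAMPLE_EVENTS: List[Event] = [
    (1, "47", "broken"),
    (2, "47", "broken"),
    (3, "12", "broken"),
    (5, "47", "mended"),
    (8, "47", "broken"),
]
```

Each record with `end` still set to `None` corresponds to a widget that is currently broken, which is the state-based view that a ticket (or alert) would follow.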

So I could rephrase my original enquiry perhaps along the lines of:

"What part could Watcher play, if any, in managing state-based / level-triggered (rather than event-based / edge-triggered) alarm conditions?"

If the answer is "none, this isn't a job for Watcher, you're doing it the right way" then fine, but it felt like the sort of thing that Watcher should be capable of helping with ... until I started trying to work out how to do it.
