Using Watcher to drive ticket life cycles?

TimWard · July 13, 2017, 3:00pm

I'm trying to understand the throttle and acknowledge features as part of evaluating whether Watcher will meet our use cases. This is a paper study so far; I haven't tried actually running Watcher yet, for commercial and legal reasons.

(1) Suppose we're processing log output through Logstash, which tells us, amongst other things, whether any of our widgets are broken. (We may have lots of widgets, so creating a separate watch for each one doesn't sound like a good idea.) We want to be able to:

- raise some sort of alert when a widget is reported as being broken
- raise a separate alert for each widget that becomes broken
- refrain from raising more than one alert for each broken widget until we spot from the logs that it has been mended, after which another log message saying the same widget has broken again starts a new lifecycle.

I can't see how either throttling or acknowledgements help here, as we want the suppression of duplicate "it's broken" reports to apply per widget, not to the entire watch.
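To make the requirement concrete, here is a minimal Python sketch of the per-widget suppression logic, outside Watcher entirely. It is not a Watcher feature; it only illustrates the state that has to live somewhere. It assumes each log document can be reduced to a widget id plus a status of "broken" or "mended" (both status strings are hypothetical), and it keeps the state in memory, whereas a real system would need to persist it between runs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class WidgetAlerter:
    """Tracks state per widget so each broken period raises exactly one alert."""

    broken: Dict[str, bool] = field(default_factory=dict)  # widget id -> currently broken?
    alerts: List[str] = field(default_factory=list)        # alerts raised so far

    def observe(self, widget_id: str, status: str) -> None:
        # "broken"/"mended" are assumed (hypothetical) status values.
        was_broken = self.broken.get(widget_id, False)
        if status == "broken" and not was_broken:
            # Transition healthy/unknown -> broken: raise one alert for this widget.
            self.alerts.append(f"widget {widget_id} is now in broken state")
            self.broken[widget_id] = True
        elif status == "mended" and was_broken:
            # Transition broken -> mended: the next "broken" starts a new lifecycle.
            self.broken[widget_id] = False
        # Anything else (e.g. a repeated "broken" report) changes nothing.


events: List[Tuple[str, str]] = [
    ("47", "broken"), ("47", "broken"), ("12", "broken"),
    ("47", "mended"), ("47", "broken"),
]
alerter = WidgetAlerter()
for widget_id, status in events:
    alerter.observe(widget_id, status)
print(alerter.alerts)
```

The duplicate "broken" report for widget 47 is suppressed, widget 12 gets its own alert, and widget 47 alerts again only after it has been mended. The point is that suppression is keyed by widget, which is exactly what a watch-wide throttle or acknowledgement does not give you.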

(2) I note that Watcher can output to a JIRA ticket, but I don't see how to manage the ticket life cycle:

- When widget 47 breaks, I want to create a ticket saying "widget 47 is now in broken state".
- Further runs of the watch, which may re-scan the same log document, should not raise further tickets (i.e. the existence of an open ticket for widget 47 should prevent the creation of a new one).
- If widget 47 spontaneously mends itself, as reported by a new log document, the ticket should be closed.
- If widget 47 then breaks again, a new ticket should be opened.
- Ticket lifecycles for each widget must be independent.
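The ticket rules above can be sketched in Python as follows. This is an illustration only, not how Watcher's JIRA action behaves. `FakeTicketSystem` is a hypothetical in-memory stand-in for JIRA or Trac, and the sketch assumes each log document carries a timestamp that increases per widget, which is what lets a re-scan of an old document be ignored. State is again held in memory; a real version would have to persist it (for example in an Elasticsearch index).

```python
import itertools
from typing import Dict


class FakeTicketSystem:
    """Hypothetical in-memory stand-in for a real tracker such as JIRA or Trac."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.tickets: Dict[int, Dict[str, str]] = {}

    def create(self, summary: str) -> int:
        ticket_id = next(self._ids)
        self.tickets[ticket_id] = {"summary": summary, "status": "open"}
        return ticket_id

    def close(self, ticket_id: int) -> None:
        self.tickets[ticket_id]["status"] = "closed"


class TicketLifecycle:
    """Keeps at most one open ticket per widget; lifecycles are independent."""

    def __init__(self, tracker: FakeTicketSystem) -> None:
        self.tracker = tracker
        self.open_ticket: Dict[str, int] = {}  # widget id -> id of its open ticket
        self.last_seen: Dict[str, int] = {}    # widget id -> newest timestamp handled

    def on_status(self, widget_id: str, status: str, timestamp: int) -> None:
        # Skip documents at or before the newest one already handled for this
        # widget, so re-scanning an old log window cannot reopen anything.
        if timestamp <= self.last_seen.get(widget_id, -1):
            return
        self.last_seen[widget_id] = timestamp
        if status == "broken" and widget_id not in self.open_ticket:
            summary = f"widget {widget_id} is now in broken state"
            self.open_ticket[widget_id] = self.tracker.create(summary)
        elif status == "mended" and widget_id in self.open_ticket:
            self.tracker.close(self.open_ticket.pop(widget_id))


tracker = FakeTicketSystem()
lifecycle = TicketLifecycle(tracker)
for widget_id, status, ts in [
    ("47", "broken", 1), ("47", "broken", 2),  # repeated report: no new ticket
    ("3", "broken", 3),                         # widget 3 has its own lifecycle
    ("47", "mended", 4),                        # closes widget 47's ticket
    ("47", "broken", 1),                        # re-scan of an old document: ignored
    ("47", "broken", 5),                        # breaks again: new ticket
]:
    lifecycle.on_status(widget_id, status, ts)
```

After this run there are three tickets: widget 47's first ticket (closed), widget 3's ticket (open), and a fresh ticket for widget 47 (open).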

(3) I could write my own web application to understand and manage tickets and lifecycles, and use Watcher to invoke it, but:

- that's potentially rather a lot of code to write
- what value would Watcher then add over my code simply querying Elasticsearch itself?
- and the performance implications don't sound very clever.

So, how should I be looking at this?

Dale_McDiarmid · July 18, 2017, 10:55pm

Hi Tim
Before I answer, can I confirm whether you expect to receive just one or multiple broken messages for the same widget before receiving a "mended" message?
Thanks

TimWard · July 19, 2017, 8:05am

Typically two, but not always.

I'd already got some code to create "entity-centric" documents for each period of brokenness, which takes account of the multiple broken/mended messages. Since writing the original post I've added a hook to that code which creates and updates Trac tickets (just 'cos I had a testing instance of Trac handy), and one can get Trac to send out the alert emails. So, subject to Ops liking the result, that's apparently a solution to the problem which (a) doesn't involve much code and (b) doesn't involve Watcher.
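For readers unfamiliar with the "entity-centric" approach, a minimal Python sketch of folding raw broken/mended messages into one record per period of brokenness might look like this. It is not the poster's actual code; `BrokenPeriod` and `build_periods` are hypothetical names, and the events are assumed to be `(timestamp, widget_id, status)` tuples with the same "broken"/"mended" status values as above.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class BrokenPeriod:
    """Hypothetical entity-centric record: one per period of brokenness."""

    widget_id: str
    start: int
    end: Optional[int] = None  # None while the widget is still broken


def build_periods(events: Iterable[Tuple[int, str, str]]) -> List[BrokenPeriod]:
    """Fold (timestamp, widget_id, status) events into broken periods."""
    periods: List[BrokenPeriod] = []
    current: Dict[str, BrokenPeriod] = {}  # widget id -> its still-open period
    for timestamp, widget_id, status in sorted(events):  # process in time order
        if status == "broken" and widget_id not in current:
            period = BrokenPeriod(widget_id, timestamp)
            current[widget_id] = period
            periods.append(period)
        elif status == "mended" and widget_id in current:
            # Close the open period; the next "broken" will start a new one.
            current.pop(widget_id).end = timestamp
        # Extra "broken" messages inside an open period are absorbed here.
    return periods


periods = build_periods([
    (10, "47", "broken"), (11, "47", "broken"), (12, "3", "broken"),
    (20, "47", "mended"), (30, "47", "broken"),
])
```

Each record's creation and closing map naturally onto a ticket being opened and closed, which is where a hook into a tracker such as Trac can attach.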

So I could rephrase my original enquiry perhaps along the lines of:

"What part could Watcher play, if any, in managing state based / level triggered (rather than event based / edge triggered) alarm conditions?"

If the answer is "none, this isn't a job for Watcher, you're doing it the right way" then fine, but it felt like the sort of thing that Watcher should be capable of helping with ... until I started trying to work out how to do it.

system · August 16, 2017, 8:10am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
