Watching watcher

jbaranick · May 20, 2015, 5:00pm

It would be nice to be able to configure watches to alert when their queries fail to return within a configured timeout. The default timeout could be the schedule duration. This protects against long-running queries causing watches to not fire.

skearns · May 21, 2015, 9:16am

This is a good question - I would love to learn more about your goal here. I imagine a few reasons:

Get notified when any configured watches exceed their configured timeouts
Ensure consistent watch execution times for short-interval watches

Are there other goals you had in mind?

Today, you can specify a timeout in the search input request body, and we do record the search execution information (e.g. execution_result.input.search.payload.took, and timed_out) in the watch history. These fields aren't indexed today, but it's something we could consider adding, so you could create a watch that looks at the watch history for timed_out ES queries.
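As a rough illustration of the two pieces mentioned above, here is a minimal Python sketch. It assumes the search timeout goes in the request body as a standard `timeout` field, and that a watch history record nests the search result under `execution_result.input.search.payload` as described. The function names, the index pattern, and the sample record are hypothetical, not taken from a real cluster.

```python
import json


def build_search_input(indices, query, timeout="10s"):
    """Build a Watcher search input whose request body carries a search timeout."""
    return {
        "search": {
            "request": {
                "indices": indices,
                # "timeout" is the search-level timeout in the request body.
                "body": {"timeout": timeout, "query": query},
            }
        }
    }


def search_timing(watch_record):
    """Return (took_ms, timed_out) from a watch history record.

    Returns (None, None) if the expected path is missing.
    """
    node = watch_record
    for key in ("execution_result", "input", "search", "payload"):
        if not isinstance(node, dict):
            return None, None
        node = node.get(key)
    if not isinstance(node, dict):
        return None, None
    return node.get("took"), node.get("timed_out")


if __name__ == "__main__":
    # Hypothetical index pattern and query, for illustration only.
    print(json.dumps(build_search_input(["logs-*"], {"match_all": {}}, "5s"), indent=2))

    # Hypothetical, hand-written record shaped like the path named above.
    sample_record = {
        "execution_result": {
            "input": {"search": {"payload": {"took": 31250, "timed_out": True}}}
        }
    }
    print(search_timing(sample_record))
```

Because these fields are not indexed in the watch history, a check like `search_timing` would have to run over fetched records on the client side rather than as a query.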

jbaranick · May 21, 2015, 1:31pm

If we are relying on watches to alert us when there is a production issue, then the watches are a critical piece of the infrastructure. As such, if watches stop running (or start timing out), it is a production live-site issue which needs to be addressed immediately. This means we need to be alerted about watches which time out, fail, or fail to run. Ideally, this notification would be resistant to Elasticsearch cluster issues (red, lots of GC-ing, etc.).

skearns · June 1, 2015, 9:00pm

Joel,

Coming back around to this. I now see what you're after, and it makes a lot of sense.

Many of our customers use Marvel for monitoring Elasticsearch - it records metrics and telemetry from Elasticsearch over time. For larger clusters, we recommend storing the Marvel data in a separate monitoring cluster. In much the same way, you can run Watcher on a monitoring cluster and simply query your production cluster using the HTTP input:
https://www.elastic.co/guide/en/watcher/current/anatomy-input.html#anatomy-input-http
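To make the monitoring-cluster setup more concrete, here is a hedged Python sketch that builds a watch definition using the HTTP input to poll a production cluster's `/_cluster/health` endpoint. The host name, the action name `log_red`, and the one-minute interval are assumptions for illustration; the exact fields should be checked against the Watcher documentation linked above.

```python
import json


def build_remote_health_watch(host, port=9200, interval="1m"):
    """Build a watch body that polls a remote cluster's health via the HTTP input."""
    return {
        "trigger": {"schedule": {"interval": interval}},
        "input": {
            "http": {
                # The production cluster being watched (hypothetical host).
                "request": {"host": host, "port": port, "path": "/_cluster/health"}
            }
        },
        # Fire when the remote cluster reports a red status.
        "condition": {"compare": {"ctx.payload.status": {"eq": "red"}}},
        "actions": {
            "log_red": {"logging": {"text": "Production cluster status is red"}}
        },
    }


if __name__ == "__main__":
    watch = build_remote_health_watch("prod-es.example.com")
    print(json.dumps(watch, indent=2))
```

The resulting JSON would be registered on the monitoring cluster, not on production. Note that this sketch only reacts to a red status; an unreachable production cluster would most likely show up as a failed input instead, which would need separate handling.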

We expected that monitoring Elasticsearch itself would be a common use-case, so we have provided a few examples of watches based on Marvel data:
https://www.elastic.co/guide/en/watcher/current/watching-marvel-data.html

Hope that helps!
