Alert when okay

Hi @acchaulk,

Watcher can absolutely do this because it is just a matter of defining a condition to trigger on.

If the condition passes, such as when an "error" state is detected, then you can follow up by performing any action that is supported by Watcher.

The hard part, then, is probably defining the right condition. There are several strategies that you can use, depending on how much complexity you are willing to take on:

  • Create two Watches
    1. The first Watch triggers on detecting the "Okay -> Error" transition.
    2. The second Watch triggers on detecting the "Error -> Okay" transition.
    • This is generally the simplest approach.
  • Create a Watch with a chain input
    • Detect both scenarios in the same Watch using separate inputs.
    • Use a script condition, and most likely a script transform, to compare the separate responses and determine whether the result is worth alerting on.
    • Report the transition in the action(s).
  • Create a Watch with a chain input, but store the state somewhere (or read it from .watcher-history-*)
    • This is effectively the same thing as the second option, but it tolerates lapses between runs of the Watch because you remember the previous state rather than hoping to catch the transition in your current request.
      • This complicates the overall Watch, but it simplifies its behavior.
    • If you are not going to read the previous state from .watcher-history-*, then you need to create your own "state" index for remembering the last run(s).
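
To make the third option a bit more concrete, here is a minimal sketch of the comparison logic in plain Python. This is not Watcher syntax and not how the cluster alerts are written; the names `detect_transition` and `StatefulCheck` are hypothetical, and the in-memory `last_state` attribute is only a stand-in for whatever "state" index (or .watcher-history-* lookup) you would actually use.

```python
from typing import Optional

OKAY = "okay"
ERROR = "error"


def detect_transition(previous: Optional[str], current: str) -> Optional[str]:
    """Return a label such as "okay -> error", or None when nothing changed."""
    if previous is None or previous == current:
        # First run (nothing remembered yet) or no change: nothing to alert on.
        return None
    return f"{previous} -> {current}"


class StatefulCheck:
    """Remembers the last observed state between runs.

    Hypothetical stand-in for a "state" index: a real Watch would read the
    previous state from an index instead of from memory.
    """

    def __init__(self) -> None:
        self.last_state: Optional[str] = None

    def run(self, current: str) -> Optional[str]:
        transition = detect_transition(self.last_state, current)
        # Always store the latest state so the next run compares against it,
        # even if some runs in between were skipped.
        self.last_state = current
        return transition


check = StatefulCheck()
for state in [OKAY, ERROR, ERROR, OKAY]:
    print(check.run(state))
# prints: None, okay -> error, None, error -> okay (one per line)
```

The point of the sketch is that the alert fires only on a change of state, in either direction, and that the remembered state is what lets you skip a run without missing the transition.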

In practice, I find myself starting with the first option and quickly building my way into the second option. Frequently I just stop there, but I sometimes find myself wanting a stateful safety-net, which is the third and final option.

As the author of most of the cluster alerts, I can tell you that they all use the third option. At Elastic{ON} 2017, I noted that they are just Watches under the covers, and you can look at them yourself in the .watches index. For example, here's the cluster status alert from my local cluster running 5.6 (I chopped out the status portion, which is metadata that Watcher itself uses, and reordered the rest into the more traditional JSON order):

... response to follow in the next reply because of the character limit ...
