Hi @acchaulk,
Watcher can absolutely do this, because it is just a matter of defining the condition to trigger on. If the condition passes, such as when an "error" state is detected, then you can follow up with any action that Watcher supports.

So the issue you may be having is how to define the right condition. There are a lot of different strategies you can employ, depending on how much complexity you are willing to endure:
- Create two Watches
  - The first Watch triggers on detecting the "Okay -> Error" transition.
  - The second Watch triggers on detecting the "Error -> Okay" transition.
  - This is generally the simplest approach.
- Create a Watch with a `chain` input
  - Detect both scenarios in the same Watch using the separate inputs.
  - Use a `script` condition, and most likely a `script` transform, to compare the separate responses and determine whether the change is worth alerting on.
  - Report the transition in the action(s).
- Create a Watch with a `chain` input, but store the state somewhere (or read it from `.watcher-history-*`)
  - This is effectively the same thing as the second one, but it allows lapses in running the actual Watch because you can remember the previous state rather than hoping to catch it in your current request.
  - This complicates the overall Watch, but it simplifies its behavior.
  - If you are not going to read the previous state from `.watcher-history-*`, then you need to create your own "state" index for remembering the last run(s).
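To make the second option a bit more concrete, here is a rough sketch of a Watch body that uses a `chain` input with a `script` condition. It is only an illustration under some loud assumptions: the `my-status` index, its `@timestamp` and `status` fields, the one-minute windows, and the `current` / `previous` / `log_transition` names are all hypothetical, so swap in whatever your data actually looks like.

```json
{
  "metadata": {
    "note": "Sketch only: the my-status index and its @timestamp/status fields are hypothetical"
  },
  "trigger": { "schedule": { "interval": "1m" } },
  "input": {
    "chain": {
      "inputs": [
        {
          "current": {
            "search": {
              "request": {
                "indices": [ "my-status" ],
                "body": {
                  "size": 1,
                  "sort": [ { "@timestamp": { "order": "desc" } } ],
                  "query": { "range": { "@timestamp": { "gte": "now-1m" } } }
                }
              }
            }
          }
        },
        {
          "previous": {
            "search": {
              "request": {
                "indices": [ "my-status" ],
                "body": {
                  "size": 1,
                  "sort": [ { "@timestamp": { "order": "desc" } } ],
                  "query": { "range": { "@timestamp": { "gte": "now-2m", "lt": "now-1m" } } }
                }
              }
            }
          }
        }
      ]
    }
  },
  "condition": {
    "script": "def c = ctx.payload.current.hits.hits; def p = ctx.payload.previous.hits.hits; return c.size() > 0 && p.size() > 0 && c[0]._source.status != p[0]._source.status;"
  },
  "actions": {
    "log_transition": {
      "logging": {
        "text": "Status changed from {{ctx.payload.previous.hits.hits.0._source.status}} to {{ctx.payload.current.hits.hits.0._source.status}}"
      }
    }
  }
}
```

For brevity this skips the transform and does the comparison directly in the condition. Note that it only fires when both windows return a document, so a missed run or a gap in the data hides the transition, which is exactly the weakness that the stateful third option is meant to cover.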
In practice, I find myself starting with the first option and quickly building my way into the second option. Frequently I just stop there, but I sometimes find myself wanting a stateful safety-net, which is the third and final option.
I wrote most of the cluster alerts, and if you look at them you will find that they all use the third option. At Elastic{ON} 2017, I noted that they are just Watches under the covers, and you can look at them yourself by checking the `.watches` index. For example, here's the cluster status Watch from a local cluster that I happened to have running 5.6 (I chopped out the `status` portion, which is metadata that Watcher itself uses, and shuffled the rest into the more traditional JSON order):
... response to follow in the next reply because of the character limit ...