I am working on creating alerts in Elastic. I have a complex requirement: check the number of failures every hour, sum them up, and alert when the sum exceeds 10, then keep checking every hour and do the same again. I am also trying to delete the history after it reports the first alert.
First, please take the time to properly format your code snippets using markdown. This is pretty much unreadable for humans (who are not good JSON parsers in the first place).
Can you explain why your current approach is not sufficient and what you would like to improve? No one knows your use case better than you, so every hint would just be guesswork.
Schedule a check for the number of failures every hour, then generate an alert if the sum of the failures across the hours checked is more than 10. Refresh the check after every alert and repeat the same process until another alert is generated. These alerts should be generated between 7 am and 5 pm PST.
| Hour | FailureCount |
|------|--------------|
| 7-8 | 4 |
| 8-9 | 2 |
| 9-10 | 4 |
| Total | 10 |
It should alert when the sum is 10 or more, then start counting afresh from the next hour (10-11) and do the same.
So the watcher should trigger every hour but send a notification only if the above condition is satisfied.
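To make the intended behaviour concrete, here is a minimal Python sketch of the accumulate-and-reset logic described above. It is not Watcher configuration; the function name `alert_hours` and the `(hour, count)` input shape are made up for illustration, and the `>=` comparison follows the "10 or more" wording of the worked example.

```python
from typing import Iterable, List, Tuple


def alert_hours(hourly_failures: Iterable[Tuple[str, int]],
                threshold: int = 10) -> List[str]:
    """Return the hours at which an alert would fire.

    Hypothetical helper: adds up the hourly failure counts and, once the
    running sum reaches the threshold, records an alert and resets the sum
    so counting starts afresh from the next hour.
    """
    alerts = []
    running_sum = 0
    for hour, count in hourly_failures:
        running_sum += count
        if running_sum >= threshold:
            alerts.append(hour)
            running_sum = 0  # start counting freshly after the alert
    return alerts


# The example from the table, plus one extra (assumed) hour after the alert.
sample = [("7-8", 4), ("8-9", 2), ("9-10", 4), ("10-11", 3)]
print(alert_hours(sample))  # ['9-10']
```

The check still runs every hour, but only the 9-10 run produces a notification; the 3 failures in 10-11 begin a new count.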
I do not understand why you need to divide by hour if you are using a sum to decide whether you should trigger an alert.
Also, can you be more exact about what refresh means in your context? If you mean ignoring some previous results, then just execute a query that allows you to inspect the results from the previous run and factor this into the condition that decides whether the action should be triggered.
This way you could prevent the repeated execution that, it seems, you do not want.
Another way would be to check the watcher history for previous runs; however, I think checking your previous data is easier from my current point of view.
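One way to read the suggestion of factoring the previous run into the condition is sketched below in plain Python (again, not Watcher syntax). It assumes the time of the last alert is stored somewhere the check can read it, that timestamps are naive datetimes already expressed in Pacific time, and that counting restarts each day at 7 am; the thread does not confirm any of these. The names `window_start` and `should_alert` are hypothetical.

```python
from datetime import datetime, time
from typing import Optional


def window_start(now: datetime,
                 last_alert: Optional[datetime],
                 day_start: time = time(7, 0)) -> datetime:
    """Return the time from which failures should be summed.

    Assumption: the count restarts at day_start each day, and again
    after every alert raised on the same day.
    """
    start_of_day = datetime.combine(now.date(), day_start)
    if last_alert is not None and last_alert > start_of_day:
        return last_alert  # ignore everything already alerted on
    return start_of_day


def should_alert(failures_since_start: int,
                 now: datetime,
                 day_start: time = time(7, 0),
                 day_end: time = time(17, 0),
                 threshold: int = 10) -> bool:
    """Condition: inside the alerting hours and the sum reached the threshold."""
    within_hours = day_start <= now.time() <= day_end
    return within_hours and failures_since_start >= threshold


now = datetime(2024, 1, 15, 11, 0)
print(window_start(now, None))                           # 2024-01-15 07:00:00
print(window_start(now, datetime(2024, 1, 15, 10, 0)))   # 2024-01-15 10:00:00
print(should_alert(10, now))                             # True
```

With this reading, the search range for each hourly run goes back only to `window_start`, so failures that already contributed to an alert are not counted twice.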
@spinscale Okay, let us say we are not dividing by hour. But what would be the approach if we are trying to add up the errors and alert once the sum reaches 10? How far back we should go is the issue I am facing. I know that I am asking you too many questions, but the ultimate goal is to report when the number of errors is more than 10 and then restart counting, excluding the previous 10.