A question on throttling the alerts

Hi,

Right now I have an alert configured that triggers every 5 minutes. However, it checks whether an error of a particular type has occurred over the last 24 hours.

It sends emails, and ideally the throttle period would be 24 hours to avoid sending the same email again (provided no new errors have occurred).
It seems I have to accept sending some emails with the same content (a throttle period of, say, 1 hour) in order to keep the system responsive enough to pick up new errors.
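
For reference, the setup described above corresponds roughly to the watch below. This is only a sketch: the watch ID `error_watch`, the index pattern `my-errors-*`, the threshold of 5, and the email address are placeholders, not my real configuration.

```json
# Sketch only: "error_watch", "my-errors-*" and the email address are placeholder names.
PUT _watcher/watch/error_watch
{
  "trigger": { "schedule": { "interval": "5m" } },
  "input": {
    "search": {
      "request": {
        "indices": [ "my-errors-*" ],
        "body": {
          "size": 0,
          "query": { "range": { "@timestamp": { "gte": "now-24h" } } }
        }
      }
    }
  },
  "condition": {
    "compare": { "ctx.payload.hits.total": { "gte": 5 } }
  },
  "actions": {
    "email_action": {
      "throttle_period": "1h",
      "email": {
        "to": "ops@example.com",
        "subject": "Errors in the last 24 hours",
        "body": "{{ctx.payload.hits.total}} matching errors found."
      }
    }
  }
}
```

With `throttle_period` set to `1h`, the email goes out at most once per hour while the condition stays true, which is the trade-off described above.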

Is there a way around it?

I was thinking of storing the timestamp of the last error in some persistent storage and reading it in a Painless script to calculate the time period to use in the range part of the query.

{
  "range": {
    "@timestamp": {
      "gte": "now-{{ctx.metadata.calculatedTimePeriod}}"
    }
  }
}

Given that I am using a cloud instance and have not come across any example of reading a file in a Painless script, are there any other ways to avoid sending duplicate emails?

Rules do not track what the last error was, so they cannot tell whether a new error is the same as the previous one.

One way to resolve this would be to have a flow where errors in the data get updated with an acknowledged=true flag, and the rule ignores errors that have been acknowledged.
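
A minimal sketch of that flow, assuming a hypothetical index pattern `my-errors-*` and a boolean `acknowledged` field on each error document. The first request is the query the rule would run, and the second would be run after the notification has gone out:

```json
# Sketch only: "my-errors-*" is a hypothetical index pattern;
# "acknowledged" is the flag suggested above.

# 1) Rule query: match only errors from the last 24 hours that are not yet acknowledged.
GET my-errors-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "range": { "@timestamp": { "gte": "now-24h" } } }
      ],
      "must_not": [
        { "term": { "acknowledged": true } }
      ]
    }
  }
}

# 2) After the notification is sent: flag those same errors as acknowledged.
POST my-errors-*/_update_by_query
{
  "query": {
    "bool": {
      "filter": [
        { "range": { "@timestamp": { "gte": "now-24h" } } }
      ],
      "must_not": [
        { "term": { "acknowledged": true } }
      ]
    }
  },
  "script": {
    "lang": "painless",
    "source": "ctx._source.acknowledged = true"
  }
}
```

One caveat with this sketch: errors indexed between the search and the update would be flagged without ever being reported, so in practice both requests would probably need the same fixed upper bound on `@timestamp`.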

Hi @pk.241011, I am not sure how complex your alert logic is, but the new Kibana alerting framework has concepts like time-based throttling and alerting only on status changes; perhaps that could help solve your issue.

Watcher can still do some things that Kibana alerting cannot; see here.

However, with a DSL query alert you may be able to achieve what you are looking for.

Notify

This value limits how often actions are repeated when an alert remains active across rule checks. See Create and manage rules for more information.

  • Only on status change: Actions are not repeated when an alert remains active across checks. Actions run only when the alert status changes.
  • Every time alert is active: Actions are repeated when an alert remains active across checks.
  • On a custom action interval: Actions are suppressed for the throttle interval, but repeat when an alert remains active across checks for a duration longer than the throttle interval.

Thanks for the reply. A few questions.

I am creating the watch using the Create advanced watch functionality.
I am generating an email in the actions section. I am looking into whether I can also specify an _ack action there, but so far I have only found an API that has to be invoked manually. Should the webhook action be used to call _ack?

Will this work:

"actions" : {
  "ack_the_alert" : {
    "webhook" : {
      "method" : "PUT",
      "host" : "localhost",
      "port" : 9200,
      "path": ":/_watcher/watch/{{ctx.watch_id}}/_ack/email_action",
      "headers" : {
        "Content-Type" : "application/yaml" 
      },
      "body" : ""
    }
  }
}

And unless I have understood it wrong, my initial problem still remains.

As per the documentation:
Acknowledging an action throttles further executions of that action until its ack.state is reset to awaits_successful_execution. This happens when the condition of the watch is not met (the condition evaluates to false).

In my case I am looking back over 24 hours, but my watch runs every 5 minutes.
If the condition (say, 5 or more failures) becomes true, an alert is generated and I also ack it.

As the errors are not going anywhere, the watch condition will remain true for the next 24 hours. Hence no further alerts are sent, even if the count of new failures exceeds 5.
