Delay watcher trigger time | Heartbeat

alerting

(MAX_JOHNSON) #1

Team, I have created a Heartbeat watcher that works fine: whenever a website goes down, it triggers an alert. My issue is that I have multiple websites (3 sites) hosted on the same server. Sometimes one website is slow to load, and the watcher treats it as "DOWN" and triggers an alert. That is of course a false alert, since the site just takes 10-15 seconds to load.

My question: is there any way to delay the watcher's trigger, so that it checks the website for 10-15 seconds before alerting, instead of immediately sending an email saying the site is down?

Below is the watcher I configured. Please have a look and let me know what I am doing wrong.

{
  "trigger": {
    "schedule": {
      "interval": "10s"
    }
  },
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": [
          "heartbeat-wp-xyz-*"
        ],
        "types": [],
        "body": {
          "size": 0,
          "query": {
            "bool": {
              "must": [
                {
                  "term": {
                    "monitor.status": {
                      "value": "down"
                    }
                  }
                }
              ],
              "filter": [
                {
                  "range": {
                    "@timestamp": {
                      "from": "now-10s"
                    }
                  }
                }
              ]
            }
          },
          "aggregations": {
            "by_monitors": {
              "terms": {
                "field": "monitor.id",
                "size": 100,
                "min_doc_count": 1
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "gt": 0
      }
    }
  },
  "actions": {
    "email_1": {
      "email": {
        "account": "gmail_account",
        "profile": "gmail",
        "to": [
          "abc@xyz.com"
        ],
        "subject": "Alert! SERVER IS DOWN !!",
        "body": {
          "html": "{{ctx.payload.hits.total}} Server has stopped working:<P> {{#ctx.payload.aggregations.by_monitors.buckets}}{{key}}<BR> {{/ctx.payload.aggregations.by_monitors.buckets}} \t \n\n at times: {{ctx.trigger.triggered_time}} \n\n"
        }
      }
    }
  }
}

(Alexander Reelsen) #2

Hey,

How about having more than one aggregation, so that the previous status is checked as well? For example, have one aggregation bucket for now-10s and one for now-1m, and only alert if a host is in both buckets.
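A minimal sketch of how that idea might look as a watcher fragment, under a few assumptions. Taken literally, a now-1m window already contains the now-10s window, so the sketch splits the last minute into two non-overlapping windows: `recent` (the last 10 seconds) and `earlier` (from now-1m up to now-10s). A `bucket_selector` then keeps only the monitors that reported "down" in both. The names `recent`, `earlier` and `down_in_both` are illustrative and do not come from the original watcher, and the script condition stands in for the original `compare` on `ctx.payload.hits.total`. The `source` key assumes a 6.x or later cluster; older versions may expect a different key.

```json
{
  "input": {
    "search": {
      "request": {
        "indices": ["heartbeat-wp-xyz-*"],
        "body": {
          "size": 0,
          "query": {
            "bool": {
              "filter": [
                { "term": { "monitor.status": "down" } },
                { "range": { "@timestamp": { "gte": "now-1m" } } }
              ]
            }
          },
          "aggregations": {
            "by_monitors": {
              "terms": { "field": "monitor.id", "size": 100 },
              "aggregations": {
                "recent": {
                  "filter": { "range": { "@timestamp": { "gte": "now-10s" } } }
                },
                "earlier": {
                  "filter": { "range": { "@timestamp": { "lt": "now-10s" } } }
                },
                "down_in_both": {
                  "bucket_selector": {
                    "buckets_path": {
                      "recent": "recent>_count",
                      "earlier": "earlier>_count"
                    },
                    "script": "params.recent > 0 && params.earlier > 0"
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "script": {
      "source": "return ctx.payload.aggregations.by_monitors.buckets.size() > 0"
    }
  }
}
```

With this shape, a site that is slow for a single check would only appear in `recent` and be filtered out, while a site that stays down across both windows would still alert. The window sizes would need tuning to the Heartbeat check interval so that each window contains at least one check.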

--Alex

