Alert once if monitor is down or comes back up

Hey guys,

I am fairly new to this but I need some help with my watcher setup.

I have set up Heartbeat and currently have 7 monitors, e.g.:
monitor-01
monitor-02
etc.

I need help setting up 3 specific scenarios:

Scenario 1:
If monitor-01 goes offline, I want to send ONLY 1 email to "test@domain.com" with the body of: "Hello there, monitor-01 just went offline! Please check, thanks."

If monitor-02 goes offline, I want the exact same result as above. I don't want multiple emails alerting me every second or minute while the monitor is down; I only want 1 email.

Scenario 2:
If monitor-01 or any of my other monitors is offline, I want a reminder email sent out every 3 hours. I would like the email body to contain how long that specific monitor has been down, e.g. "monitor down for 120 hours 13 minutes". So, once 3 hours pass, I want to send an email to "test@domain.com" with the body of: "Hello there, this is a reminder email that monitor-01 is still offline! Please check, thanks."

Scenario 3:
If any of the monitors comes back online, I want to send out an email to "test@domain.com" with the body of: "Hello there, great news! monitor-02 is back online. The monitor was down for 7 hours 12 minutes. Thanks."

Can someone please assist? I have looked everywhere and cannot find the correct syntax to create the above scenarios. I feel these scenarios could benefit other members of the community as well.

P.S. I currently have an advanced watch that I found in the forums, but it does not match my criteria. Here is the code for it:

    {
      "trigger": {
        "schedule": {
          "interval": "10s"
        }
      },
      "input": {
        "search": {
          "request": {
            "search_type": "query_then_fetch",
            "indices": [
              "heartbeat-*"
            ],
            "body": {
              "query": {
                "bool": {
                  "must": {
                    "match": {
                      "monitor.status": "down"
                    }
                  },
                  "filter": {
                    "range": {
                      "@timestamp": {
                        "gte": "now-50s"
                      }
                    }
                  }
                }
              },
              "aggregations": {
                "by_monitors": {
                  "terms": {
                    "field": "monitor.id",
                    "size": 10,
                    "min_doc_count": 1
                  }
                }
              }
            }
          }
        }
      },
      "condition": {
        "compare": {
          "ctx.payload.hits.total": {
            "gt": 0
          }
        }
      },
      "actions": {
        "email_admin": {
          "email": {
            "profile": "standard",
            "from": "noreply@domain.com",
            "to": [
              "test@domain.com"
            ],
            "subject": "Monitor is DOWN: monitorname",
            "body": {
              "text": "Hello, there is a monitor offline currently. Please login to check."
            }
          }
        }
      }
    }

Thanks very much.

Thanks for the detailed post. We really appreciate this sort of feedback, as it helps us create more accurate user stories as we define features, so a big high five from us :)

As you've discovered, this is somewhat tricky with Watcher, and I'm not sure we can accomplish all of these things with it.
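One piece that Watcher does offer for the "one email now, then a reminder every 3 hours" part is action throttling. Here is a minimal sketch of an `actions` section that could replace the one in your watch, assuming the rest of the watch (including the `by_monitors` aggregation) stays as posted. The subject and body wording are illustrative only:

    {
      "actions": {
        "email_admin": {
          "throttle_period": "3h",
          "email": {
            "profile": "standard",
            "from": "noreply@domain.com",
            "to": [
              "test@domain.com"
            ],
            "subject": "Monitor(s) DOWN",
            "body": {
              "text": "Hello there, these monitors are offline: {{#ctx.payload.aggregations.by_monitors.buckets}}{{key}} {{/ctx.payload.aggregations.by_monitors.buckets}}Please check, thanks."
            }
          }
        }
      }
    }

Note the limits: the throttle applies to the action as a whole, not per monitor, so a second monitor going down inside the 3 hour window would not trigger a fresh email. It also does not give you the "down for X hours" duration or a different body for the reminder.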

The good news is that we're one step ahead of you. We're targeting a built-in alerting feature, using revamped Kibana alerting code, for our upcoming 7.7 release. We're targeting everything but the recovery part, which is a work in progress. The entire process will be much easier and fully graphically driven via a few clicks in the Uptime UI.

With regard to your current watches, there are perhaps some ways to accomplish these goals, but the watch config gets very complex. IIRC I've seen recovery alerts done by querying for multiple successive checks, then using scripting to match a condition of down -> up. That said, if you can wait, we'll be building this into the app as a first-class feature. It's high on our list.
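A rough, untested sketch of that down -> up idea, in case it helps as a starting point. It fetches the two most recent checks per monitor and uses a Painless script condition to look for "latest is up, previous was down". The `latest` aggregation name and the `now-5m` window are my own assumptions, and the trigger and email action are left out:

    {
      "input": {
        "search": {
          "request": {
            "indices": [
              "heartbeat-*"
            ],
            "body": {
              "size": 0,
              "query": {
                "range": {
                  "@timestamp": {
                    "gte": "now-5m"
                  }
                }
              },
              "aggregations": {
                "by_monitors": {
                  "terms": {
                    "field": "monitor.id",
                    "size": 10
                  },
                  "aggregations": {
                    "latest": {
                      "top_hits": {
                        "size": 2,
                        "sort": [
                          { "@timestamp": { "order": "desc" } }
                        ],
                        "_source": [
                          "monitor.status"
                        ]
                      }
                    }
                  }
                }
              }
            }
          }
        }
      },
      "condition": {
        "script": {
          "source": "for (def b : ctx.payload.aggregations.by_monitors.buckets) { def h = b.latest.hits.hits; if (h.size() == 2 && h[0]._source.monitor.status == 'up' && h[1]._source.monitor.status == 'down') { return true; } } return false;"
        }
      }
    }

The condition stays true for as long as the last two checks of any monitor are up/down, so the watch interval would need to line up with the monitor schedule (or the action would need throttling) to avoid duplicate or missed recovery emails. It also does not work out how long the monitor was down.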