Watcher configuration to auto-resolve PagerDuty alerts when a service is back online

I have configured Watcher to trigger alerts in PagerDuty. If a service comes back online on its own, I would expect the PagerDuty alert to be resolved automatically. Can someone help me achieve that?

Hello @SowmiyaRS,

I didn't fully understand your issue. Assuming you want to monitor a service through Watcher and your indices have a field such as service.status with the values UP or DOWN, this can be done.
Below is the watch I use to monitor Tomcat for a DOWN status. My index has the field tomcat.server_status (UP/DOWN), which arrives every 20 seconds.
I'm not sure this matches your intention, so adapt the watch to trigger on your specific condition.

{
  "trigger": {
    "schedule": {
      "interval": "10s"
    }
  },
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": [
          "mis-monitoring-webserver-*"
        ],
        "rest_total_hits_as_int": true,
        "body": {
          "aggs": {
            "host": {
              "terms": {
                "field": "host.name.keyword",
                "order": {
                  "_key": "desc"
                },
                "size": 10000
              },
              "aggs": {
                "port": {
                  "terms": {
                    "field": "tomcat.port",
                    "order": {
                      "_key": "desc"
                    },
                    "size": 10000
                  },
                  "aggs": {
                    "application_name": {
                      "terms": {
                        "field": "tomcat.application_name",
                        "order": {
                          "_key": "desc"
                        },
                        "size": 10000
                      },
                      "aggs": {
                        "status": {
                          "top_hits": {
                            "fields": [
                              {
                                "field": "tomcat.server_status"
                              }
                            ],
                            "_source": false,
                            "size": 1,
                            "sort": [
                              {
                                "@timestamp": {
                                  "order": "desc"
                                }
                              }
                            ]
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          },
          "size": 0,
          "query": {
            "bool": {
              "filter": [
                {
                  "script": {
                    "script": """
                      if (doc['tomcat.server_status.keyword'].size() != 0) {
                        // Match only documents whose status is DOWN
                        return doc['tomcat.server_status.keyword'].value == 'DOWN';
                      }
                      return false;
                    """
                  }
                },
                {
                  "range": {
                    "@timestamp": {
                      "gte": "now-15m",
                      "lte": "now"
                    }
                  }
                }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "gt": 0
      }
    }
  },
  "actions": {
    "email_1": {
      "email": {
        "profile": "standard",
        "to": [
          "prashant.mehta@xyz.com"
        ],
        "subject": "TOMCAT DOWN STATUS",
        "body": {
          "html": """
{{#ctx.payload.aggregations.host.buckets}}
<p><b>Summary:</b> Tomcat is down on server {{key}}</p>
<p><b>Date and Time:</b> {{ctx.trigger.scheduled_time}}</p>
<p><b>Description:</b> Tomcat is down with the following details</p>
{{#port.buckets}}
- Port: {{key}}, Applications: {{#application_name.buckets}}{{key}}<br>

<p><b>Status:</b> DOWN</p>
{{/application_name.buckets}}{{/port.buckets}}
<p><b>Issued By:</b> CIS Monitoring System</p>
<hr />
{{/ctx.payload.aggregations.host.buckets}}
"""
        }
      }
    }
  }
}
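
As a side note, the script filter in the watch above only checks whether the keyword field equals DOWN, so a plain term query should behave the same for a single-valued field and avoids running a script per document. This is a sketch of just the "query" part of the search body, not a tested replacement:

```json
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "tomcat.server_status.keyword": "DOWN" } },
        { "range": { "@timestamp": { "gte": "now-15m", "lte": "now" } } }
      ]
    }
  }
}
```

Documents that lack the field do not match a term query, which mirrors the size() check in the script.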

My requirement is this: if a PagerDuty alert is triggered because the service crashed for a minute, and the service then recovers on its own within the next minute, Watcher should resolve the PagerDuty alert it created.
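
One possible approach, as a sketch rather than a tested configuration: the Watcher pagerduty action accepts an event.event_type of trigger or resolve, and an event.incident_key that ties the two events to the same incident. The fragment below replaces the watch-level condition with always and moves the decision into per-action conditions. It assumes a default PagerDuty account is already configured for Watcher, that the search input is a DOWN-status query like the one in the watch above, and that hits.total is an integer (rest_total_hits_as_int). The action names and the incident key "tomcat-down" are hypothetical placeholders.

```json
{
  "condition": {
    "always": {}
  },
  "actions": {
    "trigger_pagerduty": {
      "condition": {
        "compare": { "ctx.payload.hits.total": { "gt": 0 } }
      },
      "pagerduty": {
        "event": {
          "event_type": "trigger",
          "incident_key": "tomcat-down",
          "description": "Tomcat is DOWN on one or more hosts"
        }
      }
    },
    "resolve_pagerduty": {
      "condition": {
        "compare": { "ctx.payload.hits.total": { "eq": 0 } }
      },
      "pagerduty": {
        "event": {
          "event_type": "resolve",
          "incident_key": "tomcat-down",
          "description": "Tomcat is back UP"
        }
      }
    }
  }
}
```

Two things to keep in mind with this design. First, the resolve action only fires once no DOWN document falls inside the query's time range, so with a now-15m window the incident would stay open for up to 15 minutes after recovery; narrowing the range to something like now-1m would bring it closer to the one-minute requirement. Second, a single incident key covers all hosts, so the incident resolves only when nothing is DOWN anywhere; per-host resolution would need a key derived from the host name.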
