How to stop sending duplicate Slack notifications for the same error?

alerting

#1

I've asked this question in the X-Pack and Heartbeat forums with no real progress; I was referred to @michael.heldebrant for some assistance. If you have any insight on this it would be very much appreciated! :slight_smile:

Here are the details...

I have configured a Heartbeat watcher that fires a Slack notification when a monitor's status is down in the production environment, and this is working fine. But what I would like to do is NOT send duplicate notifications for the same error. So: is it possible to compare the condition that triggers the action with the previous result, and skip the notification if they are equal?

Here is an example of a watcher:

{
  "trigger": {
    "schedule": {
      "interval": "1m"
    }
  },
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": [
          "heartbeat*"
        ],
        "types": [],
        "body": {
          "query": {
            "bool": {
              "must": [
                {
                  "match": {
                    "_index": "heartbeat*"
                  }
                }
              ],
              "filter": [
                {
                  "term": {
                    "monitor.status": "down"
                  }
                },
                {
                  "term": {
                    "fields.environment": "Production"
                  }
                },
                {
                  "range": {
                    "@timestamp": {
                      "gte": "now-5m",
                      "lt": "now"
                    }
                  }
                }
              ]
            }
          },
          "aggs": {
            "unique_hosts": {
              "terms": {
                "field": "monitor.host"
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "gt": 0
      }
    }
  },
  "actions": {
    "notify-slack": {
      "throttle_period_in_millis": 1800000,
      "slack": {
        "message": {
          "to": [
            "#heartbeat-production"
          ],
          "text": "*SUMMARY:* Encountered {{ctx.payload.aggregations.unique_hosts.buckets.size}} unique hosts with status 'down' in the last 5 mins\n*ENVIRONMENT*: {{ctx.payload.hits.hits.0._source.fields.environment}}\n\n*URLs:*\n{{#ctx.payload.aggregations.unique_hosts.buckets}} Host Name: {{key}}\n{{/ctx.payload.aggregations.unique_hosts.buckets}}",
          "icon": "https://image.freepik.com/free-icon/letter-p_318-9235.jpg"
        }
      }
    }
  }
}

(Andrés Pérez) #2

Can you set a longer throttle_period?
I see it set to 30 minutes in your example; maybe a larger value would be enough.

I think it's the most suitable mechanism for handling duplicate notifications.
Maybe we can suggest to the elastic devs to implement some kind of "forever" value for that parameter :wink:
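As a concrete illustration of the first suggestion, the action's throttle can also be expressed with the human-readable `throttle_period` key instead of `throttle_period_in_millis`. This is a minimal sketch of the original watcher's actions block with the throttle raised to 24 hours (the value is just an example, not a recommendation):

```json
"actions": {
  "notify-slack": {
    "throttle_period": "24h",
    "slack": {
      "message": {
        "to": [ "#heartbeat-production" ],
        "text": "Hosts with status 'down' detected in Production"
      }
    }
  }
}
```

With this, Watcher marks repeated firings of the action as throttled for 24 hours after a notification is sent, which suppresses duplicates as long as the outage is shorter than the throttle window.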

If that were not enough, what comes to my mind would be much more cumbersome:

  • Add another search (i.e. a chain input) that, each time the watcher is triggered, fetches the result of the last execution recorded in the .watcher-history-... index (e.g. sorted by execution time over a small range).
  • Add comparison logic (so probably a script condition) to take those values into account.
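The two steps above could be sketched roughly as follows. This is only an outline, not a tested watcher: the watch ID `heartbeat-production-watch` is a hypothetical name, and it assumes the `.watcher-history*` documents expose `watch_id`, `result.execution_time`, and `result.condition.met` (the exact paths may differ between Elasticsearch versions, as may `hits.total` being a number vs. an object). The idea is to notify only on a transition from "not down" to "down":

```json
{
  "trigger": { "schedule": { "interval": "1m" } },
  "input": {
    "chain": {
      "inputs": [
        {
          "current": {
            "search": {
              "request": {
                "indices": [ "heartbeat*" ],
                "body": {
                  "query": {
                    "bool": {
                      "filter": [
                        { "term": { "monitor.status": "down" } },
                        { "term": { "fields.environment": "Production" } },
                        { "range": { "@timestamp": { "gte": "now-5m", "lt": "now" } } }
                      ]
                    }
                  }
                }
              }
            }
          }
        },
        {
          "previous": {
            "search": {
              "request": {
                "indices": [ ".watcher-history*" ],
                "body": {
                  "size": 1,
                  "sort": [ { "result.execution_time": { "order": "desc" } } ],
                  "query": { "term": { "watch_id": "heartbeat-production-watch" } }
                }
              }
            }
          }
        }
      ]
    }
  },
  "condition": {
    "script": {
      "lang": "painless",
      "source": "def prev = ctx.payload.previous.hits.hits; boolean wasDown = prev.size() > 0 && prev[0]._source.result.condition.met == true; return ctx.payload.current.hits.total > 0 && !wasDown;"
    }
  }
}
```

The script condition returns true only when the current search finds down monitors and the most recent recorded execution did not, so a continuing outage would not re-notify until it recovers and goes down again.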

You can find related information in similar questions on this forum.
#3

Thanks for the response @andres-perez :muscle: Your first suggestion of a longer throttle period sounds like a reasonable quick fix, though I may have a go at your second... a little hacky, but if it works it would be optimal... cheers! -D


(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.