Hi,
I'm trying to build an alert that fires when an instance of a service has skipped 10 heartbeats within the last minute. I'm using the following watch, adapted from one I found in another post.
I aggregate with a script because there are two instances per monitor.name (service name), which can be told apart by including observer.geo.name. Once the alert fires, I receive a Slack message.
{
  "trigger": {
    "schedule": {
      "interval": "1m"
    }
  },
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": [
          "heartbeat-*"
        ],
        "rest_total_hits_as_int": true,
        "body": {
          "size": 0,
          "query": {
            "bool": {
              "must": [
                {
                  "match": {
                    "monitor.status": {
                      "query": "down"
                    }
                  }
                }
              ],
              "filter": [
                {
                  "range": {
                    "@timestamp": {
                      "from": "now-1m"
                    }
                  }
                }
              ]
            }
          },
          "aggregations": {
            "monitor_and_geo_names": {
              "terms": {
                "script": "doc['monitor.name'].value + ' at ' + doc['observer.geo.name'].value",
                "min_doc_count": 10,
                "order": {
                  "_key": "asc"
                }
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "gt": 0
      }
    }
  },
  "actions": {
    "notify-slack": {
      "throttle_period_in_millis": 600000,
      "slack": {
        "account": "monitoring",
        "message": {
          "from": "Health Alert",
          "text": "Missing heartbeats for following host(s):",
          "dynamic_attachments": {
            "list_path": "ctx.payload.aggregations.monitor_and_geo_names.buckets",
            "attachment_template": {
              "color": "warning",
              "title": "{{key}}",
              "text": "Missing heartbeats (last minute): {{doc_count}}"
            }
          }
        }
      }
    }
  }
}
The problem is that the alert is triggered even when the aggregation yields no buckets (e.g. when an instance skipped just a single heartbeat in the last minute). The fact that my Slack message arrives without any attachments indicates that there are no buckets. Why is the alert triggered anyway? Am I misunderstanding how min_doc_count in the aggregation and ctx.payload.hits.total in my compare condition work together?
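
One possible explanation, offered as a sketch rather than a confirmed diagnosis: ctx.payload.hits.total counts every document matched by the query (any "down" heartbeat in the last minute), while min_doc_count only controls which buckets the terms aggregation returns. A single missed heartbeat would therefore already satisfy the compare condition, even though no bucket reaches 10. Assuming a Watcher version that accepts a Painless script condition with a "source" field, a condition that looks at the buckets instead of the hit count might look like this:

```json
"condition": {
  "script": {
    "source": "return ctx.payload.aggregations.monitor_and_geo_names.buckets.size() > 0"
  }
}
```

This would take the place of the compare block in the watch; it is untested against this setup, and the aggregation name is simply the one used in the search request.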