Windows services up/down alerting using Watcher

Hi,

Can we use Watcher to create per-host up/down alerts for Windows services collected by Metricbeat?
For example, if service abc on Host A is down, it should alert that this service is down.

Something like the following watch for memory utilisation:

"input": {
  "search": {
    "request": {
      "search_type": "query_then_fetch",
      "indices": [
        "metricbeat-6.0.0-*"
      ],
      "types": [],
      "body": {
        "size": 0,
        "query": {
          "bool": {
            "filter": {
              "range": {
                "@timestamp": {
                  "gte": "{{ctx.trigger.scheduled_time}}||-6m",
                  "lte": "{{ctx.trigger.scheduled_time}}",
                  "format": "strict_date_optional_time||epoch_millis"
                }
              }
            }
          }
        },
        "aggs": {
          "bucketAgg": {
            "terms": {
              "field": "beat.hostname",
              "size": 1000,
              "order": {
                "metricAgg": "desc"
              }
            },
            "aggs": {
              "metricAgg": {
                "avg": {
                  "field": "system.memory.used.pct"
                }
              }
            }
          }
        }
      }
    }
  }
},
"condition": {
  "script": {
    "source": "ArrayList arr = ctx.payload.aggregations.bucketAgg.buckets; for (int i = 0; i < arr.length; i++) { if (arr[i]['metricAgg'].value > params.threshold) { return true; } } return false;",
    "lang": "painless",
    "params": {
      "threshold": 0.9
    }
  }
},

Please try to format your watches properly using markdown, as it makes the JSON infinitely easier to read.

The main question here is what defines a service as 'down'. Is it the threshold you defined above, or is a service down if no data is being reported at all? If it is the latter, you would need to compare the number of hosts that reported 10-20 minutes ago with the number of hosts that reported 0-10 minutes ago and check for differences.
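
For the "no reporting at all" case, the comparison of the two time windows could be sketched as a single search body like the one below. It reuses the `beat.hostname` field from the watch above; the aggregation names `hosts`, `recent` and `silent_only` are made up for illustration, and the 10 and 20 minute windows are only the example values mentioned here. Treat it as a starting point to test against your own data, not a finished watch input.

```json
{
  "size": 0,
  "query": {
    "bool": {
      "filter": {
        "range": { "@timestamp": { "gte": "now-20m" } }
      }
    }
  },
  "aggs": {
    "hosts": {
      "terms": { "field": "beat.hostname", "size": 1000 },
      "aggs": {
        "recent": {
          "filter": {
            "range": { "@timestamp": { "gte": "now-10m" } }
          }
        },
        "silent_only": {
          "bucket_selector": {
            "buckets_path": { "recentCount": "recent>_count" },
            "script": "params.recentCount == 0"
          }
        }
      }
    }
  }
}
```

Every bucket that survives the `bucket_selector` is a host that sent data 10-20 minutes ago but nothing in the last 10 minutes, so a condition only has to check whether the `hosts` bucket list is non-empty.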

As usual, when writing a watch, the most important thing is not the watch itself, but coming up with a query that returns the data you want to compare. Once you know that, you can go from there and mold that query into a full-blown watch.
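
As an illustration of starting from the query, a search body for services that are not running might look like the sketch below. It assumes the Metricbeat `windows` module with its `service` metricset is enabled and that it writes `windows.service.name` and `windows.service.state` with a state value of `Running`; please verify those field names and values in your own index before relying on them. The aggregation names `hosts` and `services` are hypothetical.

```json
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "range": { "@timestamp": { "gte": "now-6m" } } },
        { "term": { "metricset.module": "windows" } },
        { "term": { "metricset.name": "service" } }
      ],
      "must_not": [
        { "term": { "windows.service.state": "Running" } }
      ]
    }
  },
  "aggs": {
    "hosts": {
      "terms": { "field": "beat.hostname", "size": 1000 },
      "aggs": {
        "services": {
          "terms": { "field": "windows.service.name", "size": 100 }
        }
      }
    }
  }
}
```

Run it on its own first and check that the buckets list each host with the services you expect. Note that it matches any non-running document inside the window, so a service that stopped and restarted within those six minutes would still show up.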

--Alex
