Hi,
I'm trying to create a watcher that monitors the status of sshd.service on multiple hosts and creates one aggregated log entry per host every five minutes, instead of creating a log entry every time it detects "system.service.state" = "inactive".
The service status metrics are shipped to Elasticsearch by the Metricbeat agent.
{
  "trigger": {
    "schedule": {
      "interval": "300s"
    }
  },
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": [
          "metrics-*"
        ],
        "rest_total_hits_as_int": true,
        "body": {
          "query": {
            "bool": {
              "filter": [
                {
                  "bool": {
                    "should": [
                      {
                        "match_phrase": {
                          "system.service.name": "{{ctx.metadata.service_name}}"
                        }
                      }
                    ],
                    "minimum_should_match": 1
                  }
                },
                {
                  "bool": {
                    "should": [
                      {
                        "match_phrase": {
                          "system.service.state": "inactive"
                        }
                      }
                    ],
                    "minimum_should_match": 1
                  }
                },
                {
                  "range": {
                    "@timestamp": {
                      "from": "now-{{ctx.metadata.window_period}}",
                      "to": "now"
                    }
                  }
                }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "gt": 0
      }
    }
  },
  "actions": {
    "log": {
      "logging": {
        "text": "{{ctx.payload._source.host.hostname}}: Service {{ctx.metadata.service_name}} is inactive"
      }
    }
  },
  "metadata": {
    "window_period": "300s",
    "service_name": "sshd.service"
  }
}
The expected logging output should look as follows:
10:00:00 web-server01.local: Service sshd.service is inactive.
10:00:00 web-server02.local: Service sshd.service is inactive.
10:00:00 web-server03.local: Service sshd.service is inactive.
10:05:00 web-server01.local: Service sshd.service is inactive.
10:05:00 web-server02.local: Service sshd.service is inactive.
10:05:00 web-server03.local: Service sshd.service is inactive.
My current logging output looks as follows:
10:00:00 web-server01.local: Service sshd.service is inactive.
10:00:00 web-server01.local: Service sshd.service is inactive.
10:00:00 web-server01.local: Service sshd.service is inactive.
10:00:00 web-server01.local: Service sshd.service is inactive.
10:00:00 web-server01.local: Service sshd.service is inactive.
10:00:00 web-server01.local: Service sshd.service is inactive.
10:00:00 web-server01.local: Service sshd.service is inactive.
10:00:00 web-server01.local: Service sshd.service is inactive.
....
When the throttle period is enabled ("throttle_period": "5m"), it sends only one alert even if sshd.service inactivity was detected on multiple hosts, for example:
10:00:00 web-server01.local: Service sshd.service is inactive.
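For reference, here is a minimal sketch of one possible way to get a single log entry per host on each run. It is untested against this cluster and rests on a few assumptions: that "host.hostname" is mapped as a keyword field (as in the default Metricbeat mappings), that the cluster is on a version that supports the "foreach" action option (7.3 or later), and that no more than 100 hosts match in one window. The aggregation name "hosts" and both limits of 100 are illustrative choices, not taken from the original watch. The idea is to group the matching documents with a terms aggregation and let the logging action iterate over the buckets, so each bucket is exposed to the template as ctx.payload and the hostname is available as ctx.payload.key.

```json
{
  "trigger": { "schedule": { "interval": "300s" } },
  "input": {
    "search": {
      "request": {
        "indices": ["metrics-*"],
        "rest_total_hits_as_int": true,
        "body": {
          "size": 0,
          "query": {
            "bool": {
              "filter": [
                { "match_phrase": { "system.service.name": "{{ctx.metadata.service_name}}" } },
                { "match_phrase": { "system.service.state": "inactive" } },
                {
                  "range": {
                    "@timestamp": {
                      "gte": "now-{{ctx.metadata.window_period}}",
                      "lte": "now"
                    }
                  }
                }
              ]
            }
          },
          "aggs": {
            "hosts": {
              "terms": { "field": "host.hostname", "size": 100 }
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": { "ctx.payload.hits.total": { "gt": 0 } }
  },
  "actions": {
    "log": {
      "foreach": "ctx.payload.aggregations.hosts.buckets",
      "max_iterations": 100,
      "logging": {
        "text": "{{ctx.payload.key}}: Service {{ctx.metadata.service_name}} is inactive"
      }
    }
  },
  "metadata": {
    "window_period": "300s",
    "service_name": "sshd.service"
  }
}
```

With this shape the watch runs once per interval and the action writes one line per bucket, so a throttle period should not be needed to collapse duplicates. A host whose service was inactive at any point in the window still gets a line, even if the service has recovered since.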
Thanks in advance