Group Watcher results per host

Hi,

I'm trying to create a watcher that monitors the status of sshd.service on multiple hosts and creates one aggregated log entry per host every five minutes, instead of creating a log entry every time it detects "system.service.state" = "inactive".
The service status metrics are shipped to Elasticsearch by the Metricbeat agent.

{
  "trigger": {
    "schedule": {
      "interval": "300s"
    }
  },
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": [
          "metrics-*"
        ],
        "rest_total_hits_as_int": true,
        "body": {
          "query": {
            "bool": {
              "filter": [
                {
                  "bool": {
                    "should": [
                      {
                        "match_phrase": {
                          "system.service.name": "{{ctx.metadata.service_name}}"
                        }
                      }
                    ],
                    "minimum_should_match": 1
                  }
                },
                {
                  "bool": {
                    "should": [
                      {
                        "match_phrase": {
                          "system.service.state": "inactive"
                        }
                      }
                    ],
                    "minimum_should_match": 1
                  }
                },
                {
                  "range": {
                    "@timestamp": {
                      "from": "now-{{ctx.metadata.window_period}}",
                      "to": "now"
                    }
                  }
                }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "gt": 0
      }
    }
  },
  "actions": {
    "log": {
      "logging": {
        "text": "{{ctx.payload._source.host.hostname}}: Service {{ctx.metadata.service_name}} is inactive"
      }
    }
  },
  "metadata": {
    "window_period": "300s",
    "service_name": "sshd.service"
  }
}

The expected logging output should look as follows:
10:00:00 web-server01.local: Service sshd.service is inactive.
10:00:00 web-server02.local: Service sshd.service is inactive.
10:00:00 web-server03.local: Service sshd.service is inactive.
10:05:00 web-server01.local: Service sshd.service is inactive.
10:05:00 web-server02.local: Service sshd.service is inactive.
10:05:00 web-server03.local: Service sshd.service is inactive.

My current logging looks as follows:
10:00:00 web-server01.local: Service sshd.service is inactive.
10:00:00 web-server01.local: Service sshd.service is inactive.
10:00:00 web-server01.local: Service sshd.service is inactive.
10:00:00 web-server01.local: Service sshd.service is inactive.
10:00:00 web-server01.local: Service sshd.service is inactive.
10:00:00 web-server01.local: Service sshd.service is inactive.
10:00:00 web-server01.local: Service sshd.service is inactive.
10:00:00 web-server01.local: Service sshd.service is inactive.
....

When a throttle period is enabled ("throttle_period": "5m"), the watcher sends only one alert, even if sshd.service inactivity was detected on multiple hosts. For example:
10:00:00 web-server01.local: Service sshd.service is inactive.

Thanks in advance

You need to iterate through the ctx.payload.hits.hits array. This is off the top of my head and worth a try, but I haven't tested it myself:

{{#ctx.payload.hits.hits}}{{_source.host.hostname}}:{{/ctx.payload.hits.hits}}
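
For context, here is a rough sketch of how that loop might sit inside the logging action of the watcher above. The message wording and the trailing "\n" separator are assumptions carried over from the question, not something tested against a cluster:

```json
{
  "actions": {
    "log": {
      "logging": {
        "text": "{{#ctx.payload.hits.hits}}{{_source.host.hostname}}: Service {{ctx.metadata.service_name}} is inactive\n{{/ctx.payload.hits.hits}}"
      }
    }
  }
}
```

Note that this renders one line per matching document inside a single log entry, so a host that reported "inactive" several times within the window shows up several times. A search also returns only 10 hits unless "size" is set in the request body.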

Hope this helps. There are a couple of examples in the Alerting directory of the elastic/examples repository on GitHub where you can take a further look.
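
If the goal is exactly one log entry per host on each run, a possible alternative is to group the matches with a terms aggregation and run the action once per bucket using the action-level "foreach" setting. This is an untested sketch under a few assumptions: the Elasticsearch version supports "foreach" in watcher actions, "host.hostname" is mapped as a keyword field, and the aggregation name "hosts" and the bucket limit of 100 are placeholders chosen for illustration. The trigger and metadata sections stay as in the original watcher.

```json
{
  "input": {
    "search": {
      "request": {
        "indices": ["metrics-*"],
        "rest_total_hits_as_int": true,
        "body": {
          "size": 0,
          "query": {
            "bool": {
              "filter": [
                { "match_phrase": { "system.service.name": "{{ctx.metadata.service_name}}" } },
                { "match_phrase": { "system.service.state": "inactive" } },
                { "range": { "@timestamp": { "gte": "now-{{ctx.metadata.window_period}}", "lte": "now" } } }
              ]
            }
          },
          "aggs": {
            "hosts": {
              "terms": { "field": "host.hostname", "size": 100 }
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": { "ctx.payload.hits.total": { "gt": 0 } }
  },
  "actions": {
    "log": {
      "foreach": "ctx.payload.aggregations.hosts.buckets",
      "max_iterations": 100,
      "logging": {
        "text": "{{ctx.payload.key}}: Service {{ctx.metadata.service_name}} is inactive"
      }
    }
  }
}
```

Inside a "foreach" action, ctx.payload refers to the current array element, so {{ctx.payload.key}} is the hostname of the bucket being processed. Each bucket produces its own log entry, which should match the per-host output described in the question.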

Thanks @spinscale it worked
