How can Watchers be tested?

alerting

(Ryan Grannell) #1

Hi,

We use a large number of watchers to monitor the health of our VMs and services (RAM usage, failed services, etc.). Unfortunately, this setup is extremely prone to false negatives: problems are often ongoing but never trigger an email alert, so from our perspective it looks as if nothing is wrong.

I want to test our email-alerting with data that will definitively trigger an email-alert to make our watchers more reliable.

For example, the following watcher is triggered if fewer than 90 heartbeat messages are seen in heartbeat-* within the last ten minutes. I'd like to test this watcher over REST with a manually-provided search_match result (an empty array), and then check from the response body that the watcher was triggered.

If I can do this, I can build a test-suite for our watchers easily.

I think the Execute API supports this, but I don't really understand what the parameter documentation for trigger_data or alternative_input means, or how these parameters differ.

  • Can trigger_data or alternative_input be used to provide fake input data to a watcher? What would this data be in the case of providing empty search results?
  • How do the trigger_data and alternative_input parameters differ?
  • Is there a 'canonical' way of testing watchers that I missed?

Any help would be appreciated

Example Watcher

{
  "trigger": {
    "schedule": {
      "interval": "15m"
    }
  },
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": [
          "heartbeat-*"
        ],
        "types": [],
        "body": {
          "query": {
            "bool": {
              "must": [
                {
                  "match": {
                    "host": "vm_0"
                  }
                },
                {
                  "match": {
                    "tags": "heartbeat"
                  }
                },
                {
                  "range": {
                    "@timestamp": {
                      "gte": "now-10m"
                    }
                  }
                }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "lt": 90
      }
    }
  },
  "actions": {
    "email_administrator": {
      "throttle_period": "15m",
      "email": {
        "profile": "standard",
        "attachments": {
          "attached_data": {
            "data": {
              "format": "json"
            }
          }
        },
        "priority": "low",
        "to": [
          "foo@example.com"
        ],
        "subject": "Watcher - too few heartbeat messages",
        "body": {}
      }
    }
  }
}

(Alexander Reelsen) #2

Hey Ryan,

You are right that the execute watch API is what you are looking for. Let me try to explain what those parameters do; let me know if it works for you and, of course, how we can improve the documentation.

First, alternative_input is the parameter you are looking for to provide custom input. Let me demo this with a simple watch (using 5.0 here, but the principle is the same).

PUT _xpack/watcher/watch/test
{
  "trigger" : { "schedule" : { "interval" : "1h" }},
  "input" : {
    "simple" : { "foo" : "bar" }
  },
  "actions" : {
    "log_error" : {
      "logging" : {
        "text" : "Got payload {{ctx.payload}}"
      }
    }
  }
}

PUT _xpack/watcher/watch/test/_execute

If you execute the above watch, you will see a log message like

[2016-11-09T10:50:11,628][INFO ][o.e.x.w.a.l.ExecutableLoggingAction] [EXYBbWD] Got payload {foo=bar}

However, if you use the alternative_input parameter

PUT _xpack/watcher/watch/test/_execute
{
  "alternative_input" : {
    "spam" : "eggs"
  }
}

The log statement changes to

[2016-11-09T10:50:34,808][INFO ][o.e.x.w.a.l.ExecutableLoggingAction] [EXYBbWD] Got payload {spam=eggs}
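
For a client-side test suite, the request body above can be generated rather than written by hand. The following is only a minimal Python sketch using the standard library; the helper name `execute_body` is made up for illustration and is not part of any Elasticsearch client.

```python
import json


def execute_body(alternative_input=None):
    """Build a JSON-serializable body for the execute watch API.

    If alternative_input is given, it is sent in place of the watch's
    configured input; per the demo above it then shows up as ctx.payload.
    (Hypothetical helper, not an official client function.)
    """
    body = {}
    if alternative_input is not None:
        body["alternative_input"] = alternative_input
    return body


# Reproduces the body of the request shown above.
print(json.dumps(execute_body({"spam": "eggs"})))
```

Calling `execute_body()` with no argument returns an empty body, which corresponds to executing the watch with its configured input.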

Now, on to trigger_data

PUT _xpack/watcher/watch/test/_execute
{
  "alternative_input" : {
    "spam" : "eggs"
  },
  "trigger_data" : {
    "scheduled_time" : "2001-11-09T09:50:34.807Z",
    "triggered_time" : "2010-11-09T09:50:34.807Z"
  }
}

This changes the scheduled and triggered time of your trigger event (as you can see in the output of that call). If you have a search query that uses a time filter to only return the last five minutes, this allows you to reconstruct a past query.
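
To replay a watch for an arbitrary point in time from a script, the two timestamps can be generated in the same format as the request above. This is a sketch only; the helper names `iso_millis` and `trigger_data` are hypothetical. One assumption worth checking against your version: the override presumably only affects queries that build their time filter from the trigger context (for example `{{ctx.trigger.scheduled_time}}`), not ones written against `now`.

```python
from datetime import datetime, timedelta, timezone


def iso_millis(dt):
    """Format a timezone-aware datetime as UTC ISO 8601 with milliseconds."""
    dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


def trigger_data(scheduled, delay=timedelta(0)):
    """Build a trigger_data block (hypothetical helper).

    scheduled: when the watch was supposed to fire.
    delay: how much later it was actually triggered.
    """
    return {
        "scheduled_time": iso_millis(scheduled),
        "triggered_time": iso_millis(scheduled + delay),
    }


past = datetime(2016, 11, 9, 9, 50, 34, 807000, tzinfo=timezone.utc)
print(trigger_data(past))
```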

Hope that was helpful! Otherwise feel free to ask further questions!

--Alex


(Ryan Grannell) #3

Hi Alexander,

Thanks for explaining both parameters so clearly; your example shows that alternative_input essentially sets ctx.payload, which is useful to know. I imagine something like

{
	"alternative_input" : {
		"hits": {
			"total": 0
		}
	}
}

will trigger the example I gave above.
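
Before sending a payload like this to the cluster, it can be sanity-checked locally against the watch's compare block. The sketch below is a rough local emulation written for illustration, not Watcher's actual implementation; `resolve` and `compare_met` are hypothetical names, and only the simple numeric operators are covered.

```python
import operator

# Operators of the compare condition, mapped to Python equivalents.
OPERATORS = {
    "eq": operator.eq, "not_eq": operator.ne,
    "gt": operator.gt, "gte": operator.ge,
    "lt": operator.lt, "lte": operator.le,
}


def resolve(ctx, path):
    """Resolve a dotted path such as 'ctx.payload.hits.total' in a dict."""
    keys = path.split(".")
    if keys and keys[0] == "ctx":
        keys = keys[1:]
    value = ctx
    for key in keys:
        value = value[key]
    return value


def compare_met(condition, ctx):
    """Return True if every {path: {op: value}} check holds for ctx."""
    for path, checks in condition.items():
        actual = resolve(ctx, path)
        for op, expected in checks.items():
            if not OPERATORS[op](actual, expected):
                return False
    return True


condition = {"ctx.payload.hits.total": {"lt": 90}}
payload = {"hits": {"total": 0}}  # what alternative_input would set
print(compare_met(condition, {"payload": payload}))  # True
```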

This isn't ideal for testing though; I'm really trying to test whether the query / aggregation section of my Watcher is returning the correct result.

Is there anything like the alternative_input section that lets me provide a fake document-set to the watcher, rather than just setting ctx.payload directly?


(Alexander Reelsen) #4

Hey,

I think what you may want to try instead, then, is the execute watch API with simulated action modes, like this:

POST _watcher/watch/my-watch/_execute
{
  "action_modes" : {
    "_all" : "simulate"
  }
}

This executes your configured search but does not execute the actions, while still returning useful info in the JSON response.
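
In a test suite, that JSON response could be reduced to the two things worth asserting on: whether the condition was met and what happened to each action. The sketch below assumes the response nests these under `watch_record.result` with a `met` flag and a per-action `status`; those field names should be verified against the response your version returns. The `sample` dict is a hand-written stand-in, not real Watcher output, and `summarize_execution` is a hypothetical helper.

```python
def summarize_execution(response):
    """Extract condition outcome and per-action status from an execute response.

    Assumes the layout watch_record -> result -> condition / actions;
    check this against a real response before relying on it.
    """
    result = response["watch_record"]["result"]
    return {
        "condition_met": result["condition"]["met"],
        "actions": {a["id"]: a["status"] for a in result.get("actions", [])},
    }


# Hand-written stand-in for a response; NOT real Watcher output.
sample = {
    "watch_record": {
        "result": {
            "condition": {"met": True},
            "actions": [{"id": "email_administrator", "status": "simulated"}],
        }
    }
}
print(summarize_execution(sample))
```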

--Alex


(Ryan Grannell) #5

Hi,

Yes, that makes more sense. I'm still looking for a way to provide a "pre-queried" set of logs to the watcher though (or a manually-uploaded list of documents), rather than pointing it at an active index.

It seems that Watcher doesn't directly support this, so the only way I can think of getting this to work is:

  • Manually query a test-set of logs using the query / aggs block of the watcher
  • Provide this result-set to the watcher using alternative_input, with simulate mode enabled
  • Check the expected action was triggered
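
The three steps above can be sketched as a single helper. This is only an outline under stated assumptions: `request` is a caller-supplied function `(method, path, body) -> parsed JSON` (for example a thin wrapper around an HTTP library), the execute path prefix depends on the version in use, the response layout under `watch_record.result` should be verified, and the watch id and index name are made up. `fake_request` exists only so the sketch runs without a cluster.

```python
def run_watch_test(watch, watch_id, test_index, request,
                   execute_prefix="/_watcher/watch"):
    """Run a watch's own query on test data, then replay the result.

    Returns True if the watch condition was met (assumed response layout).
    """
    # Step 1: run the watch's query block against the test index.
    query = watch["input"]["search"]["request"]["body"]
    search_result = request("POST", f"/{test_index}/_search", query)
    # Step 2: hand the result to the watch, with all actions simulated.
    execute_body = {
        "alternative_input": search_result,
        "action_modes": {"_all": "simulate"},
    }
    response = request("POST", f"{execute_prefix}/{watch_id}/_execute",
                       execute_body)
    # Step 3: report whether the condition was met.
    return response["watch_record"]["result"]["condition"]["met"]


def fake_request(method, path, body):
    """Stand-in transport: empty search result, 'lt 90' condition."""
    if path.endswith("/_search"):
        return {"hits": {"total": 0, "hits": []}}
    total = body["alternative_input"]["hits"]["total"]
    return {"watch_record": {"result": {"condition": {"met": total < 90}}}}


watch = {"input": {"search": {"request": {"body": {"query": {"match_all": {}}}}}}}
print(run_watch_test(watch, "heartbeat_vm_0", "heartbeat-test", fake_request))
```

Note that a query with a relative time filter such as now-10m will only match test documents whose timestamps are recent, so the test data may need fresh timestamps on each run.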

This will work, it's just a little awkward.

Thanks for your help


(Alexander Reelsen) #6

Hey,

A workaround could be to query through an alias in your watch and point that alias at either your test data or your live data. To be honest, though, using alternative_input with some scripting on the client side might be easier.
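
For the alias route, switching between test and live data would be a single request to the index aliases endpoint (POST /_aliases). The sketch below only builds the request body; the alias and index names are invented for illustration, and `alias_swap` is a hypothetical helper.

```python
import json


def alias_swap(alias, remove_from, add_to):
    """Build an _aliases body that moves `alias` from one index to another.

    Both actions go in one request so the alias is switched in one step.
    """
    return {
        "actions": [
            {"remove": {"index": remove_from, "alias": alias}},
            {"add": {"index": add_to, "alias": alias}},
        ]
    }


# Point the (hypothetical) alias used by the watch at test data.
body = alias_swap("heartbeat-watch", "heartbeat-live", "heartbeat-test")
print(json.dumps(body, indent=2))
```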

--Alex


(Ryan Grannell) #7

Thanks a lot Alex, I appreciate your help. I'll try the second approach, which shouldn't really be too difficult to implement anyway.

