Create a simple frequency based alert using Watcher

rouble · August 8, 2018, 7:56pm

Gurus,

I am looking to create a frequency-based watch. Very simply: if the query returns any results at all in the last N seconds, send an alert. I have written something that works, but since I am new to writing watches, I feel that what I have may be overcomplicated. In this example N is 30 seconds, so I have set the schedule interval to 30s and added a range filter on @timestamp: "gte": "now-30s".

My watch is pasted below. Two questions: first, is there a more efficient way to do this? Second, is it possible for me to miss query matches?

{
  "trigger": {
    "schedule": {
      "interval": "30s"
    }
  },
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": [
          "logstash-*"
        ],
        "types": [],
        "body": {
          "size": 0,
          "query": {
            "range": {
              "@timestamp": {
                "gte": "now-30s"
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "gte": 1
      }
    }
  },
  "actions": {
    "my-logging-action": {
      "logging": {
        "level": "info",
        "text": "There are {{ctx.payload.hits.total}} documents in your index. Threshold is 1."
      }
    }
  }
}

spinscale · August 9, 2018, 6:48am

Hey,

The watch itself looks fine. The question of missing query matches, however, is a completely different one. I have two things to add to that:

1. Elasticsearch has something called a refresh interval, which defines how often newly indexed data is made available for search (by default every second). This means that a run may effectively see only the last 29 seconds of data, and by the time the next query runs, the document it missed is no longer in that time window.
2. The above, however, is only a minor problem. The bigger issue, IMO, is that the timestamp is usually the time at which the event was created within the application. What is not accounted for is that the event still has to travel to Elasticsearch. Maybe you are sending data directly from Beats to ES, but maybe you are sending it to a broker first, where it sits for a few seconds before it gets indexed. Your ingestion could also be delayed further by a DDoS attack or a network outage. This adds a bigger delay than the one-second refresh above.

The question then is: are you fine with ignoring those things, or do you want to query bigger time windows, with the likelihood of duplicate alerts, or add some fancier mechanism for alerting?

Hope this helps!

--Alex

rouble · August 15, 2018, 6:08pm

Thank you, that helps. To summarize what you said: if the watch runs every 30 seconds but queries 30+N seconds' worth of data, we should not lose any hits (as long as the indexing delay stays below N seconds), but we may occasionally get some duplicates.
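
As a rough sketch of that summary, the only change to the watch above would be the lower bound of the range filter. The 15-second overlap (N = 15) is an assumed value chosen purely for illustration, and it would need to be sized to the ingestion delay actually observed. The condition and actions are omitted here and would stay as before.

```json
{
  "trigger": {
    "schedule": {
      "interval": "30s"
    }
  },
  "metadata": {
    "note": "Illustrative sketch: window = 30s interval + assumed 15s overlap"
  },
  "input": {
    "search": {
      "request": {
        "indices": [
          "logstash-*"
        ],
        "body": {
          "size": 0,
          "query": {
            "range": {
              "@timestamp": {
                "gte": "now-45s"
              }
            }
          }
        }
      }
    }
  }
}
```

Any document whose timestamp falls in the overlapping 15 seconds can be counted by two consecutive runs, which is where the occasional duplicates come from.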

This kind of watch is very commonly written in alerting systems, so that we don't inundate the receiver of the alarm or the notification. It is usually called notification interval or alarm interval.

Here is the definition of notification_interval from another, now ancient, notification framework called Nagios:
notification_interval : This directive is used to define the number of "time units" to wait before re-notifying a contact that this service is still down or unreachable.

spinscale · August 17, 2018, 8:44am

I think you may be interested in the acknowledgement/throttling capabilities of Watcher in that context. Please see:

https://www.elastic.co/guide/en/elastic-stack-overview/6.3/actions.html#actions-ack-throttle
https://www.elastic.co/guide/en/elastic-stack-overview/6.3/how-watcher-works.html#watch-acknowledgment-throttling
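
For illustration, a minimal sketch of what action-level throttling might look like when applied to the logging action from the watch earlier in this thread. Only the actions section is shown, and the 5-minute period is an assumed value, not a recommendation.

```json
{
  "actions": {
    "my-logging-action": {
      "throttle_period": "5m",
      "logging": {
        "level": "info",
        "text": "There are {{ctx.payload.hits.total}} documents in your index. Threshold is 1."
      }
    }
  }
}
```

With this in place, once the action has executed, further executions of it are skipped until the period has elapsed, even if the condition keeps being met on every 30-second run.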

rouble · August 17, 2018, 12:57pm

Nice! throttle_period looks very interesting. Any thoughts on whether there is a better way to rewrite my watch using throttle_period? Here is essentially what I need to do:

If there are any new hits to my query in the last N seconds do the appropriate action.

We are trying to use Watcher to alert via PagerDuty when high-severity logs come through.

spinscale · August 20, 2018, 7:28am

Watches run stateless, so the definition of "new" is a tricky one. If you need state, you could always store the result count of a query in its own document using the index action, and compare against that count at the next run of the watch, when running the same query with the same filters.
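
A rough sketch of the storing half of that idea, assuming a 6.x cluster. The action name `store-hit-count`, the index `watch-state`, and the field names `hit_count` and `checked_at` are all hypothetical. The transform reduces the payload to just the count before the index action writes it.

```json
{
  "actions": {
    "store-hit-count": {
      "transform": {
        "script": {
          "lang": "painless",
          "source": "return ['hit_count': ctx.payload.hits.total]"
        }
      },
      "index": {
        "index": "watch-state",
        "doc_type": "doc",
        "execution_time_field": "checked_at"
      }
    }
  }
}
```

Reading the previous count back at the next run is not shown here; it would need a second search against the state index in the watch input.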

Another alternative, if those documents are super rare, is to store information on whether you have already processed each of them, but that is not feasible for higher volumes.

system · September 17, 2018, 7:28am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
