Alerts based on increase in logged value, rather than specific threshold

Hi there, I am quite new to the Elastic Stack, as is my company. We are attempting to build a centralized logging and alerting system, building dashboards and alerts in Kibana. Currently we have AWS Lambda functions shipping CloudWatch logs to a Logstash server, which parses our data and pushes it on to our Elasticsearch server.

Starting out simple, we have a Lambda function that queries a queue for events that need to be processed and logs the result. Here's an example of one of those logs:

[INFO] 2021-01-14T17:59:22.420Z b7e95598-3c21-42bd-bbe1-cc76ddae431d $Count$: 0

Logstash then parses out a couple of fields, most importantly the count, which it converts to a number. I've created an OK graph of this value. I would prefer to show the exact value, but the visualization needs an aggregation, so instead I have the sum over a period of time, which works. Anyway, that's not the issue here.
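
For reference, here is roughly what that extraction amounts to, as a minimal Python sketch rather than our actual Logstash configuration (the regex and field names are just illustrative):

```python
import re

# Illustrative only: roughly what our Logstash filter pulls out of a line like
# "[INFO] 2021-01-14T17:59:22.420Z b7e95598-... $Count$: 0"
LOG_PATTERN = re.compile(
    r"\[(?P<level>\w+)\]\s+"
    r"(?P<timestamp>\S+)\s+"
    r"(?P<request_id>\S+)\s+"
    r"\$Count\$:\s+(?P<count>\d+)"
)

def parse_line(line):
    match = LOG_PATTERN.match(line)
    if match is None:
        raise ValueError(f"unparseable line: {line!r}")
    fields = match.groupdict()
    fields["count"] = int(fields["count"])  # the count becomes a number
    return fields

print(parse_line(
    "[INFO] 2021-01-14T17:59:22.420Z b7e95598-3c21-42bd-bbe1-cc76ddae431d $Count$: 0"
))
```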

We want to create alerts. It is obvious and easy to set up an alert like "when count > 20: alert", but what we would really like to set up is more like "when count > 20 && count is growing: alert". So, for example, if we checked every half hour, we would alert the first time the count hits 21, and then we would alert again the next half hour only if the count had gone up to 40, or something like that.
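
To make the intent concrete, here is a rough Python sketch of the check we have in mind, evaluated every half hour (purely illustrative, not tied to any existing Kibana feature):

```python
# Illustrative only: the alert condition we'd like, evaluated every 30 minutes.
THRESHOLD = 20

def should_alert(previous_count, current_count):
    """Alert only while the count is both above the threshold and still growing."""
    over_threshold = current_count > THRESHOLD
    growing = previous_count is None or current_count > previous_count
    return over_threshold and growing

# The first check over the threshold fires...
assert should_alert(previous_count=15, current_count=21) is True
# ...and it keeps firing while the count climbs...
assert should_alert(previous_count=21, current_count=40) is True
# ...but it goes quiet once the count starts dropping, even though it's still over 20.
assert should_alert(previous_count=800, current_count=300) is False
```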

The logic behind this is that alerts will keep coming until the issue is actually fixed. At the same time, if the issue is fixed when the count is at, say, 800, we will not get an alert the next half hour when it is still over 20 but falling, say to 300. Previously we had a simple threshold of 20, as described above, but we ran into a situation where we thought an issue was fixed, it turned out it wasn't, and we didn't get any more alerts.

It would also be phenomenal if we could set up some sort of logic for "if this is the 7th alert, change who is emailed," but that's not essential.

If anyone can point me in the right direction of how this may be possible, it would be vastly appreciated!

Currently, alerts use absolute levels as predicate thresholds, e.g. larger than a value, or detect missing data in time bins, e.g. a monitoring feed that has stopped. Would you be so kind as to file your suggestion as a Kibana GitHub issue, with pretty much this as the description, and post the issue link here?

One option available today would be setting up an index, or a field, that holds the per-time-bin difference. In that case, countDifference > 0 would be equivalent to the "count is growing" predicate.
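
If each log event is indexed with its count, one way to compute that difference at query time, rather than storing it, is a derivative pipeline aggregation on top of a date_histogram. A rough sketch using the Python client follows; the index name, field names, and client version are assumptions, so adjust them to your mapping (older clients may need the request passed under body=):

```python
from elasticsearch import Elasticsearch

# Assumed index and field names -- adjust to your own mapping.
es = Elasticsearch("http://localhost:9200")

response = es.search(
    index="queue-logs-*",
    size=0,
    aggs={
        "per_half_hour": {
            "date_histogram": {"field": "@timestamp", "fixed_interval": "30m"},
            "aggs": {
                "latest_count": {"max": {"field": "count"}},
                # Difference between this bucket's count and the previous bucket's.
                "count_delta": {"derivative": {"buckets_path": "latest_count"}},
            },
        }
    },
)

for bucket in response["aggregations"]["per_half_hour"]["buckets"]:
    delta = bucket.get("count_delta", {}).get("value")  # absent for the first bucket
    print(bucket["key_as_string"], bucket["latest_count"]["value"], delta)
```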

I'll also check with the team if there are other practical approaches and get back here soon if so.

We had a discussion, and it'd be great if you could file this as an enhancement request (link above), though that won't help you in the short term.

If you do have a way of expressing these delta values with an ES aggregation (which depends on how your data is structured), then the generic ES alert type could evaluate that condition, though it may be fiddly to set up.
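
Continuing the aggregation sketch above, the check itself could look something like this outside of Kibana (the threshold and names are placeholders):

```python
# Continuing the previous sketch: decide whether to notify, based on the most
# recent closed half-hour bucket. Threshold and field names are placeholders.
THRESHOLD = 20

buckets = response["aggregations"]["per_half_hour"]["buckets"]
if len(buckets) >= 2:
    latest = buckets[-2]  # last *closed* bucket; buckets[-1] may still be filling
    count = latest["latest_count"]["value"] or 0
    delta = (latest.get("count_delta") or {}).get("value") or 0
    if count > THRESHOLD and delta > 0:
        print(f"ALERT: count={count}, grew by {delta} in the last half hour")
```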

It would also be phenomenal if we could set up some sort of logic for "if this is the 7th alert, change who is emailed," but that's not essential.

Having different action groups would give you some tiering, though only within the confines of what's currently possible. You could attach one Email Action to the warn group and a different one to the critical group.
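
As a purely conceptual sketch of that escalation idea (plain code, not something the current alerting framework does for you; the addresses and the email helper are placeholders):

```python
# Conceptual sketch: route notifications to a different recipient once the
# same alert has fired N consecutive times. Addresses and send_email are placeholders.
ESCALATION_AFTER = 7
PRIMARY_ONCALL = "oncall@example.com"
ESCALATION_CONTACT = "manager@example.com"

def pick_recipient(consecutive_firings):
    if consecutive_firings >= ESCALATION_AFTER:
        return ESCALATION_CONTACT
    return PRIMARY_ONCALL

def send_email(to, subject):
    print(f"email to {to}: {subject}")  # stand-in for a real email action

# The 7th consecutive firing (and later ones) go to the escalation contact.
for firing in range(1, 9):
    send_email(pick_recipient(firing), f"queue backlog alert (firing #{firing})")
```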

Also, we have a related issue that isn't advancing rapidly; still, it might be useful to subscribe to it for status changes.
