Low log rate per agent.hostname

Hello everybody,

I have created a machine learning job to detect when Beats stop sending logs to my cluster.

Job type: Multi-metric
Metric: Low count (Event rate)
Split field: agent.hostname
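
In case it helps, here is roughly what that job corresponds to as an API definition (the job name and exact bucket span below are placeholders rather than my exact settings):

```
PUT _ml/anomaly_detectors/low_log_rate_per_host
{
  "description": "Detect hosts whose Beats stop sending logs",
  "analysis_config": {
    "bucket_span": "1h",
    "detectors": [
      {
        "detector_description": "low event rate per host",
        "function": "low_count",
        "partition_field_name": "agent.hostname"
      }
    ],
    "influencers": ["agent.hostname"]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}
```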

An example of a result after one month of running looks like this:

Could you please tell me why, in the case where the actual rate is 0 (meaning the beat completely stopped sending logs), the severity is only 17?
I configured the job to send alerts when the severity is greater than 50, and since it's 17 I didn't receive any alert. :frowning:

Also, is there a way to receive alerts only when the log rate is 0?

Any suggestions, please!

Thanks for your help :slight_smile:

I want to give a bit of background first; skip to the end to see an answer to your specific question.

Anomaly detection has no conception that zero is an important value. When it scores this event it tries to predict the chance of seeing a zero event rate for the time bucket (it looks like you use 1hr). It does this by predicting a distribution for the event rate at that time based on multiple factors, for example seasonal patterns in the event rate, random variability in the event rate and so on. It also considers the number of preceding buckets in which the event rate is unusual and how this compares to the history for that partition. In this case, perhaps the event rate for this partition often drops to a low value or even to zero for short periods, or perhaps it has a lot of variability, which means we can't be confident in our predictions and so can't be confident that the value won't drop to zero.

When it comes to setting an alert threshold you are making a pure trade-off between false positives (FPs) and false negatives (FNs). In this case, you get a FN at an alert level of 50. It may be that you can reduce the alert level and improve the usefulness of the job if you don't get too many FPs. You would have to ask yourself whether the things that are being scored more highly are useful to you. In this context it is worth knowing that when we assign scores we use a ranking measure: we guarantee that no more than a certain fraction of buckets will be scored more highly than a given score. This means you can always directly control alert rates using our score.

Is there a way to receive alerts just when the log rate is 0?

Yes, there are a couple of ways to achieve this.

For a job we have custom rules. These allow you to skip results (i.e. not generate anomalies) based on conditions you define. In this case, you can skip all results where the bucket actual is greater than zero. You would then alert at a low score, maybe greater than 10 or even lower, but you can experiment to get the best FP and FN rates based on historical data. This way you will only ever see buckets where the rate is zero, but if a partition is often zero it won't alert you all the time. See this blog for a discussion of using custom rules with anomaly detection. Beware though, this may throw away useful information when the event rate drops but doesn't drop to zero (more on this below).
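
For example, applied to the job's first detector via the update API, the rule might look something like this (the job id is a placeholder):

```
POST _ml/anomaly_detectors/low_log_rate_per_host/_update
{
  "detectors": [
    {
      "detector_index": 0,
      "custom_rules": [
        {
          "actions": ["skip_result"],
          "conditions": [
            { "applies_to": "actual", "operator": "gt", "value": 0 }
          ]
        }
      ]
    }
  ]
}
```

With this in place, results are only created for buckets where the actual count is zero.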

If you know the total event count should only ever be non-zero over an extended period of time (say 2hrs) for all partitions, why not just use a watch or an alert on this condition? Behind the scenes this is just periodically running a count aggregation and checking that the value isn't zero.
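
As a minimal sketch of that idea with Watcher (the index patterns, 2 hour window and logging action are assumptions you would adapt):

```
PUT _watcher/watch/no_logs_from_beats
{
  "trigger": { "schedule": { "interval": "30m" } },
  "input": {
    "search": {
      "request": {
        "indices": ["filebeat-*", "packetbeat-*"],
        "body": {
          "size": 0,
          "query": { "range": { "@timestamp": { "gte": "now-2h" } } }
        }
      }
    }
  },
  "condition": {
    "compare": { "ctx.payload.hits.total": { "lte": 0 } }
  },
  "actions": {
    "log_it": {
      "logging": { "text": "No Beats events received in the last 2 hours" }
    }
  }
}
```

The condition simply checks that the search returned no hits in the window; in practice you would swap the logging action for email, Slack or whatever you use for notifications.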

Finally, there is nothing to stop you using a hybrid approach. Maybe some drops to non-zero values are useful to know about and indicate real performance problems. If this is the case, you could couple your existing job (with a high alert threshold and no rules) with another job looking for zero counts that uses a rule and a low alert threshold, or with a watch or alert based on zero counts.


Thanks for the detailed explanation @Tom_Veasey :blush:,

I will try to do as you suggested.
To answer your question about why I didn't use an alert based on zero counts: it's just because I didn't know how to do so, as I couldn't find a count metric in the alerting.
I also thought about creating a threshold alert to detect when I don't receive logs from one host, but the problem with this kind of alert is that there is only a greater-than condition and no less-than.

Hi @TheHunter1,

Have you tried the index threshold alert? It allows the "Is below" condition.

Hi @darnautov ,

Sorry for the stupid question, but where can I find the index threshold alert? :sweat_smile:
On my side I have these rules:

I should have provided more context. :slight_smile: The screenshot you shared is about Security detection rules; I meant something different.

The index threshold alert is one of the alert types provided by Kibana's Alerting and Actions. It lets you easily configure alerting rules for various apps within Kibana. You can find "Alerts and Connectors" under the Stack Management page in Kibana. For your particular case the index threshold seems like a reasonable approach; please share whether it works out or not.
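
If you prefer to script it, the same rule can also be created through the Kibana alerting API. The exact route and parameter names depend on your Kibana version, so treat the following purely as a sketch of the shape (the index pattern, group-by field, intervals and threshold are assumptions to adapt):

```
POST /api/alerting/rule
{
  "name": "beats-stopped-sending-logs",
  "rule_type_id": ".index-threshold",
  "consumer": "alerts",
  "schedule": { "interval": "10m" },
  "params": {
    "index": ["packetbeat-*"],
    "timeField": "@timestamp",
    "aggType": "count",
    "groupBy": "top",
    "termField": "agent.hostname",
    "termSize": 10,
    "timeWindowSize": 2,
    "timeWindowUnit": "h",
    "thresholdComparator": "<",
    "threshold": [1]
  },
  "actions": []
}
```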


Hi @darnautov ,
Thanks for your explanation, you just taught me about an alerting feature I didn't know existed.
I tested the index threshold alert. The only problem is that when I configure it with "is above" I receive alerts that Packetbeat is up, but when I configure it with "is below" and then stop Packetbeat, I don't receive notifications that Packetbeat is down.

Here is my configuration:

As you can see, I stopped Packetbeat on vps765374, which is why the count = 0.
