Kibana rule - raise alert when CPU is over 90% for the last 5 min

catalin.bulancea · October 4, 2023, 2:58pm

Hi gurus,

I'm new to Rules in Kibana so I need your help.

I need to raise an email alert when the CPU is constantly exceeding 90% for the past 5 minutes.

The way I configured the rule is the following:

The alert is triggered even if the CPU only spiked above 90% for a few seconds in the past 5 minutes. Is this the normal behavior?

What I want is to trigger the alarm only if the CPU is staying over 90% for at least 5 minutes.

How can I write such a rule?

Thank you,
Catalin

catalin.bulancea · October 9, 2023, 1:27pm

Any replies guys?

stephenb · October 9, 2023, 5:30pm

Hi @catalin.bulancea

Hmm interesting...

Is that field absolute or percent?

Perhaps try Average...

What version are you on?

What exact rule are you using?

Are you using Group By?

catalin.bulancea · October 12, 2023, 12:02pm

Hi Stephen,

The field is absolute, i.e. 0.1 is 10%, 0.9 is 90%.
I am on version 7.17.4.
The rule I am using is:

So yes, I am using Group by, because there are multiple fields.subsystem that match the fields.system.

What if I'd use Min instead of Max?
That would mean: if the minimum CPU usage over the last 5 minutes is above 90%, then the Max and Average are above 90% too, so the CPU stayed above 90% for the entire 5 minutes.
Is my understanding correct?

Thanks,
Catalin

stephenb · October 12, 2023, 3:12pm

Hi @catalin.bulancea

Here is the way I think of it / understand it.

Say your host.cpu.usage is collected every 10s from your hosts.

And you have FOR THE LAST 5 Minutes as your criteria, so ~30 samples are looked at for each 5 MIN interval.

For MAX: if 1 of those samples is above the threshold and the other 29 are below, the Max over the 5 min time frame IS met. (You only need 1 for the condition to be met.)

For MIN: if 1 of those samples is below the threshold and the other 29 are above, the Min over the 5 min time frame IS NOT met. (You only need 1 for the condition to not be met.)

So yes, your assumption is correct but it is not the recommended approach because all it takes is 1 sample below the Threshold not to meet the criteria.

This is why the vast majority of users use AVG for the case you are describing.

If you are concerned, perhaps change the window to the last 1 MIN.
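
To make the MAX / MIN reasoning concrete, here is a minimal Python sketch. It is not how Kibana evaluates the rule internally; it only mimics the comparison described above. The sample values, the `fires` helper, and the 0.9 threshold (a fraction, so 0.9 is 90%) are assumptions made up for illustration.

```python
# Hypothetical data: 30 CPU readings (one every 10 s) as fractions, 0.9 == 90%.
THRESHOLD = 0.9

spike = [0.40] * 29 + [0.95]   # mostly idle, one brief spike above 90%
dip = [0.95] * 29 + [0.40]     # busy for almost 5 minutes, one sample below 90%


def fires(samples, aggregate, threshold=THRESHOLD):
    """Return True when the aggregated value over the window exceeds the threshold."""
    return aggregate(samples) > threshold


print(fires(spike, max))  # True: a single sample above 90% is enough for MAX
print(fires(dip, min))    # False: a single sample below 90% is enough to block MIN
```

With MAX, the `spike` window alerts even though the CPU was idle for 29 of the 30 samples; with MIN, the `dip` window stays silent even though the CPU was busy for 29 of the 30 samples.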

catalin.bulancea · October 16, 2023, 4:45pm

Hi Stephen,

Thank you for your explanations! It makes more sense now. So MIN is not the way to go. I will try AVG and see how it goes.
Could you give me an example of how AVG will behave with the 30 samples in the 5 min interval?

Thank you,
Catalin

stephenb · October 16, 2023, 5:02pm

For AVERAGE: it calculates the arithmetic mean over the window. Sum all the CPU values collected in the 5 minutes and divide by the number of values (about 30 in this example). It is a simple direct average: Sum of Values / Count of Values.
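
As a worked example of that calculation, here is a small Python sketch. The two sample windows and the `avg_fires` helper are hypothetical and only illustrate the arithmetic; they are not taken from the actual rule or from Kibana itself.

```python
from statistics import mean

# Hypothetical data: 30 CPU readings per 5-minute window, as fractions (0.9 == 90%).
THRESHOLD = 0.9

short_spike = [0.40] * 29 + [0.95]   # one 10 s spike in an otherwise idle window
sustained = [0.95] * 29 + [0.40]     # high load with a single low sample


def avg_fires(samples, threshold=THRESHOLD):
    """Return True when the arithmetic mean of the window exceeds the threshold."""
    return mean(samples) > threshold


print(round(mean(short_spike), 3), avg_fires(short_spike))  # 0.418 False
print(round(mean(sustained), 3), avg_fires(sustained))      # 0.932 True
```

In this sketch the short spike barely moves the average, so no alert fires, while the sustained load keeps the average above 90% despite one low sample.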

system · November 13, 2023, 5:03pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
