Log threshold alerting rule to check the presence of logs on specific hosts

Hi,

I am trying to set up an alert rule that will alert me when a job that I expect to run on particular servers stops writing to the syslog. The idea is that I want to receive alerts when:

  • Any host with a field server_type: "gitlab-runner"
  • Does not have a log message containing the phrase Total reclaimed space
  • In the last 8 hours

I set up a Log threshold rule with the following configuration:

WHEN THE count OF LOG ENTRIES
WITH message MATCHES PHRASE "Total reclaimed space"
AND server_type IS gitlab-runner

IS less than 1
FOR THE LAST 8 hours
GROUP BY host.name

(check every 2 hours)

The problem with this alert rule is that it considers all incoming hosts. Most hosts do not have Total reclaimed space in their logs (and are not expected to), so they trigger alerts. That is not what I want. I want some way to tell Kibana: "before doing anything else, ignore hosts not annotated with server_type: gitlab-runner". How can I do that?
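To pin down what "ignore other hosts first" means, here is a minimal Python sketch of the desired evaluation, assuming the log entries are available as plain dicts. The sample documents, host names, and the `hosts_missing_phrase` helper are hypothetical, and a plain substring test stands in for Kibana's analyzed MATCHES PHRASE.

```python
from collections import Counter

# Hypothetical sample log entries using the field names from the rule above.
LOGS = [
    {"host.name": "runner-1", "server_type": "gitlab-runner",
     "message": "Total reclaimed space: 1.2GB"},
    {"host.name": "runner-2", "server_type": "gitlab-runner",
     "message": "docker system prune started"},
    {"host.name": "web-1", "server_type": "web",
     "message": "GET /health 200"},
]


def hosts_missing_phrase(logs, server_type, phrase, threshold=1):
    """Hosts of the given server_type with fewer than `threshold` entries containing `phrase`."""
    # 1. Before anything else, drop hosts with a different server_type.
    scoped = [d for d in logs if d.get("server_type") == server_type]
    # 2. Every remaining host becomes a group, starting at zero matches.
    counts = Counter({d["host.name"]: 0 for d in scoped})
    # 3. Count matching entries per host (substring test stands in for MATCHES PHRASE).
    for d in scoped:
        if phrase in d.get("message", ""):
            counts[d["host.name"]] += 1
    # 4. Report hosts below the threshold ("IS less than 1").
    return sorted(host for host, n in counts.items() if n < threshold)


print(hosts_missing_phrase(LOGS, "gitlab-runner", "Total reclaimed space"))
# ['runner-2']
```

Note that a gitlab-runner host that wrote no log lines at all during the window never becomes a group in this sketch, so it would not be reported either.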

Thank you

What version are you using?

Hi @melkamar

What is the mapping type of server_type? It should be keyword.

So a couple of things....

Using Less Than together with grouping can have some performance implications; that is a longer discussion.

Try this; I think you will then only get results from the hosts that have server_type:

GROUP BY server_type, host.name

Hi, I am using 8.7.1.

server_type is a keyword.

I tried doing the GROUP BY you suggest, but the problem remains. The problem is that, when you say

you will only get from the hosts that have server_type

All my hosts have server_type defined. So this will still match everything. What I want is to only match hosts that have a particular value of server_type.
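A quick way to see why adding server_type to the grouping does not help: the set of groups is driven by which documents exist, not by which ones match. In this hypothetical Python sketch, `group_keys` skips documents missing a grouping field (roughly how a terms or composite grouping ignores missing values), so when every host carries some server_type the number of groups stays the same.

```python
# Hypothetical documents: every host carries *some* server_type value.
LOGS = [
    {"host.name": "runner-1", "server_type": "gitlab-runner"},
    {"host.name": "web-1", "server_type": "web"},
    {"host.name": "db-1", "server_type": "database"},
]


def group_keys(logs, fields):
    """Distinct group keys over `fields`, skipping documents missing any of them."""
    keys = set()
    for doc in logs:
        if all(f in doc for f in fields):
            keys.add(tuple(doc[f] for f in fields))
    return sorted(keys)


print(len(group_keys(LOGS, ["host.name"])))                 # 3
print(len(group_keys(LOGS, ["server_type", "host.name"])))  # 3
```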


I saw the warning about not using Less than in a grouping query, but I could not think of an alternative. I am very much open to other approaches to this. All I need is an alerting rule that will check that a particular string appears in the logs on a regular basis.

Yes, I understand.

So here is my test, and it works: it only returns results for kubernetes.labels.app IS productcatalogservice.

BUT this is a More Than case:

LOG VIEW Default
WHEN THE count OF LOG ENTRIES
WITH message MATCHES via_upstream
AND kubernetes.labels.app IS productcatalogservice

IS more than 2000
FOR THE LAST 5 minutes
GROUP BY kubernetes.labels.app, host.name

BUT when I tried it with a Less Than case, I got every kubernetes.labels.app, so you are right. I think that is an artifact of the "Less Than"... and the more I look at it, the more I see why...

LOG VIEW Default
WHEN THE count OF LOG ENTRIES
WITH message MATCHES via_upstream
AND kubernetes.labels.app IS productcatalogservice

IS less than 10000
FOR THE LAST 5 minutes
GROUP BY kubernetes.labels.app, host.name

That is because the Less Than condition is in fact met for every pod and host: every group/combination without matching entries has a count of 0, which is below the threshold, so all of them are reported. With More Than, those groups do not meet the condition.
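The behavior described above can be modeled in a few lines of Python. This is a sketch under an assumption consistent with what the thread observed: the grouped rule builds its groups from every document in the window and applies the criteria only when counting. The documents and thresholds are hypothetical, scaled down from the two rules above.

```python
import operator
from collections import Counter

# Hypothetical documents, scaled down from the rules above.
LOGS = [
    {"host.name": "node-a", "kubernetes.labels.app": "productcatalogservice", "message": "via_upstream"},
    {"host.name": "node-b", "kubernetes.labels.app": "frontend", "message": "via_upstream"},
    {"host.name": "node-c", "kubernetes.labels.app": "cartservice", "message": "GET /cart"},
]


def matches(doc):
    """Rule criteria: message matches via_upstream AND app IS productcatalogservice."""
    return ("via_upstream" in doc["message"]
            and doc["kubernetes.labels.app"] == "productcatalogservice")


def evaluate(logs, compare, threshold):
    """(app, host) groups whose count of matching entries satisfies compare(count, threshold)."""
    # Assumption: groups are built from *all* documents in the window...
    counts = Counter({(d["kubernetes.labels.app"], d["host.name"]): 0 for d in logs})
    # ...and the criteria are applied only when counting.
    for d in logs:
        if matches(d):
            counts[(d["kubernetes.labels.app"], d["host.name"])] += 1
    return sorted(k for k, n in counts.items() if compare(n, threshold))


print(evaluate(LOGS, operator.gt, 0))   # More Than: only the group with real matches
print(evaluate(LOGS, operator.lt, 10))  # Less Than: zero-count groups pass as well
```

Under More Than, only the group with actual matches passes; under Less Than, the zero-count groups pass too, which is the "every host alerts" symptom from the original question.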

So all that ... hmmm yup .... I need to think about that

So in the end, if you are really just trying to figure out when a particular service last wrote a log...

I would perhaps use a latest transform and then a simple alert on top of that.

Latest transforms are a pretty awesome way of keeping track of the "last event" from logs, services, hosts, etc. Give them a look.

I have used this for many use cases... it works really well!
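As a rough illustration of the latest transform idea, the Python sketch below builds a request body for the Elasticsearch create transform API (`PUT _transform/<transform_id>`) that keeps only the newest "Total reclaimed space" entry per host, plus a stand-in for the "simple alert on top": a check for hosts whose latest entry is older than 8 hours. The index names, the `frequency`/`sync` values, and the `stale_hosts` helper are assumptions, and the documents passed to `stale_hosts` are simplified flat dicts rather than the nested ECS shape a real destination index would hold.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical index names; replace with your own.
SOURCE_INDEX = "logs-*"
DEST_INDEX = "latest-reclaimed-space"

# Request body for PUT _transform/<transform_id> (a "latest" transform).
transform_body = {
    "source": {
        "index": [SOURCE_INDEX],
        # Only feed the transform gitlab-runner entries that contain the phrase.
        "query": {
            "bool": {
                "filter": [
                    {"term": {"server_type": "gitlab-runner"}},
                    {"match_phrase": {"message": "Total reclaimed space"}},
                ]
            }
        },
    },
    "dest": {"index": DEST_INDEX},
    # Keep one document per host: the one with the newest @timestamp.
    "latest": {"unique_key": ["host.name"], "sort": "@timestamp"},
    # Illustrative values for a continuous transform.
    "frequency": "5m",
    "sync": {"time": {"field": "@timestamp", "delay": "60s"}},
}


def stale_hosts(latest_docs, now, max_age=timedelta(hours=8)):
    """Hosts whose newest matching entry is older than max_age.

    latest_docs: simplified, hypothetical flat dicts read from DEST_INDEX,
    with ISO-8601 timestamps that include a UTC offset.
    """
    stale = []
    for doc in latest_docs:
        last_seen = datetime.fromisoformat(doc["@timestamp"])
        if now - last_seen > max_age:
            stale.append(doc["host.name"])
    return sorted(stale)


if __name__ == "__main__":
    print(json.dumps(transform_body, indent=2))
    now = datetime(2023, 8, 1, 12, 0, tzinfo=timezone.utc)
    docs = [
        {"host.name": "runner-1", "@timestamp": "2023-08-01T10:00:00+00:00"},
        {"host.name": "runner-2", "@timestamp": "2023-07-31T20:00:00+00:00"},
    ]
    print(stale_hosts(docs, now))  # runner-2 last reported 16 hours earlier
```

One caveat with this design: the transform only sees matching entries, so a host that has never logged the phrase has no document in the destination index and will not be flagged.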

There also may be another way to do this with a DSL Query.

In your case, the filter clause would be:

```
filter:
  - server_type:
      type: equals
      value: gitlab-runner
```

This filter clause tells Kibana to only consider hosts that have the server_type field set to gitlab-runner.

So, the complete alert rule would be:

```
WHEN THE count OF LOG ENTRIES
WITH message MATCHES PHRASE "Total reclaimed space"
AND filter:
  - server_type:
      type: equals
      value: gitlab-runner

IS less than 1
FOR THE LAST 8 hours
GROUP BY host.name

(check every 2 hours)
```
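For comparison, the filter-first idea can be written in Elasticsearch Query DSL and sent to the `_search` API. This is a sketch, not something the Log Threshold rule accepts: it restricts the search to gitlab-runner hosts and the last 8 hours, buckets by `host.name` with a `terms` aggregation, and counts "Total reclaimed space" entries per host in a `filter` sub-aggregation. The bucket `size` and the `hosts_without_phrase` helper are assumptions, and `host.name` and `server_type` are assumed to be keyword fields.

```python
import json

# Search request body: scope to gitlab-runner hosts first, then count per host.
search_body = {
    "size": 0,  # only the aggregations are needed, not the hits themselves
    "query": {
        "bool": {
            "filter": [
                {"term": {"server_type": "gitlab-runner"}},
                {"range": {"@timestamp": {"gte": "now-8h"}}},
            ]
        }
    },
    "aggs": {
        "hosts": {
            "terms": {"field": "host.name", "size": 1000},
            "aggs": {
                "reclaimed": {
                    "filter": {"match_phrase": {"message": "Total reclaimed space"}}
                }
            },
        }
    },
}


def hosts_without_phrase(response):
    """Host buckets whose 'reclaimed' sub-aggregation counted zero entries."""
    buckets = response["aggregations"]["hosts"]["buckets"]
    return sorted(b["key"] for b in buckets if b["reclaimed"]["doc_count"] == 0)


if __name__ == "__main__":
    print(json.dumps(search_body, indent=2))
    # Hand-written response shape for illustration only, not real output.
    fake_response = {
        "aggregations": {
            "hosts": {
                "buckets": [
                    {"key": "runner-1", "doc_count": 40, "reclaimed": {"doc_count": 1}},
                    {"key": "runner-2", "doc_count": 35, "reclaimed": {"doc_count": 0}},
                ]
            }
        }
    }
    print(hosts_without_phrase(fake_response))
```

As with the Log Threshold rule, a gitlab-runner host that sent no logs at all in the window produces no bucket, so silence from the whole host would need a separate check.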

Hi @Andrew_Mora, welcome to the community, and thanks for the help.

Can you show a screenshot of where you see a filter on the Log Threshold rule screen?

Perhaps I am missing it somewhere...
Other rules have a KQL filter, but Log Threshold does not.

Can you show us where on this screen the filter option is?

This is version 8.9.1, so pretty up to date. On what version do you see the filter?

Yeah, I ran into the same issue and reached the same conclusion: it is technically correct that all the hosts are matched, because the condition applies to all of them :grin:

Huge thank you for pointing me in the direction of the Transform feature though. I have not used it before and it's exactly what I need! That solves my initial question.

