Do not alert for no data for decommissioned server

My cluster fires a Kibana alert when there is no Metricbeat data for a server for the last 15 minutes, which is very useful when there is an issue with the Beat or the server.

However, when a server is decommissioned, the alerts continue to fire unnecessarily.

I have added queries to the Kibana alert to exclude the host.name of the decommissioned server(s), but in a large estate this is not practical, and I think it is taxing on Kibana alerting when saving the updated rule.
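For illustration, the kind of exclusion filter I have been adding to the rule looks roughly like this (KQL, with placeholder host names) - it has to be edited every time a server is retired:

```
not host.name : ("decom-server-01" or "decom-server-02" or "decom-server-03")
```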

Is there any way to manage this better, so that I don’t receive alerts for decommissioned servers? I do not want the alerts generated at all (no email, not indexed and not logged).

Thanks in advance.

Hi @Kodito, welcome to the community.

What version are you on?

If you're using Kibana alerts, you can create the rule so it only notifies you once, on status change. So you only get the notification the first time?

Perhaps I'm missing something

Hi Stephen

Thanks for your reply.

I should clarify that I do only get one alert notification (on status change); however, the alert index continues to be populated while the alert is active. It therefore appears in the active alerts dashboard I have configured, which is monitored.

There is no apparent way to distinguish a genuine issue, where a server should be reporting in, from a decommissioned server that is expected not to report in.

The current workaround I have is to give the server a tag, e.g. “decommissioned”, before it is shut down, so that when Kibana alerts I can filter out anything with the tag “decommissioned”, but I wonder if there is a better way.
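To illustrate the workaround filter (assuming the tag ends up in the standard Beats tags field, e.g. via the tags setting in metricbeat.yml - adjust if yours is stored elsewhere), it is roughly:

```
not tags : "decommissioned"
```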

Thanks

Hi @Kodito

What version are you on? That is important.

Can you share your alert configuration please?

And which Alert Index are you talking about ... the system alerts index or an index you are writing to as part of the alert action?

I would not expect an action set to run on status change to continue writing to the connector index... but I may be wrong. I would need to test...

If you are using / looking at the system alerts index, that is not best practice and could explain the behavior you are seeing...

Can you show what dashboard you are referring to?

BTW technically, the Alert IS still active... because it Fired... and is waiting to recover...


Hi Stephen

I’m on version 8.11.1

Agreed - the alert is technically still active, which makes total sense.

The alert is a Kibana “rule”, checking every minute whether the condition (count below 1) is met for the metricbeat-* indices/data streams in the last hour, grouped by host.name. “Alert me if a group stops reporting data” is checked - which we want for general server health issues.

The rule checks every 3 minutes and alerts on status change.

The rule sends an email and uses the alert index connector (again, on status change). I do not have the “if alert matches a query” or “if alert is generated during time frame” options checked for the connector.

The dashboard is a custom one built with visuals using the index pattern “.alerts*” - I think you’ve reminded me that this is going to be a problem with regard to the use of system indexes - perhaps this is what I should focus on first?!
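For reference, the dashboard panels are filtered along these lines (KQL against the alerts-as-data indices; whether host.name is copied onto the alert documents may depend on the rule type and version, so the second clause is an assumption to verify):

```
kibana.alert.status : "active" and not host.name : ("decom-server-01" or "decom-server-02")
```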

Thank you
