Alert re-triggering & limited context (recovery actions)

Hello,

We are currently working with Kibana alerting in a production environment and have run into a couple of challenges. We would like to ask whether there are recommended approaches or built-in solutions for the following scenarios.


1) Alert re-triggering when using “Create an alert if matches are found”

We have a large number of alerts where we intentionally use the option:

“Create an alert if matches are found”

because we do not want to generate a separate alert per host due to the scale of our environment.

However, we are facing the following issue:

  • Once the alert is triggered, it goes into active state.

  • If additional hosts start matching the condition during that time, no new alert is triggered.

  • As a result, newly affected hosts are not reflected in notifications.

We have partially worked around this by modifying the queries to detect incremental changes (e.g. comparing counts over time), but this significantly increases the complexity of otherwise simple alert logic.
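
As a rough illustration of the kind of delta logic meant here, the following Python sketch compares the set of matching hosts between two consecutive runs. It is a hypothetical example, not the actual queries from this environment: the function name `newly_matching_hosts` and the host names are made up, and the comparison is done on host sets rather than counts.

```python
def newly_matching_hosts(previous_hosts, current_hosts):
    """Return hosts that match in the current run but did not match in the previous one."""
    # Set difference keeps only hosts that are new since the last run.
    return sorted(set(current_hosts) - set(previous_hosts))


# Hypothetical results of two consecutive rule runs.
previous_run = {"host-a", "host-b"}
current_run = {"host-a", "host-b", "host-c"}

new_hosts = newly_matching_hosts(previous_run, current_run)
if new_hosts:
    print("Newly affected hosts:", ", ".join(new_hosts))
```

Even in this reduced form, the rule has to carry state from one run to the next, which is the added complexity described above.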

Our questions are:

  • Is there a built-in way in Kibana to re-trigger or notify on changes within an already active alert (e.g. when new entities match the condition)?

  • Is there a recommended pattern for handling this scenario without implementing complex “delta logic” directly in the query?

  • Alternatively, is this expected behavior, with query logic or per-entity alerting being the only supported ways to handle it?


2) Limited context available in recovery actions

We are also facing limitations when handling recovery notifications.

Example:

  • When an alert fires (e.g. host becomes offline), we can include useful context in the notification (e.g. host.id, hostname, etc.).

  • However, when the alert transitions to recovered, the available context variables are very limited.

  • We are not able to reference the original entity (e.g. which host triggered the alert).

In our environment (hundreds of hosts), this creates a significant usability issue:

  • The support team cannot easily identify which host has recovered.

  • The only reference is often the alert ID, which is not practical.

Our questions are:

  • Is there a way to access previous alert context/state (e.g. fields from the triggering event) in recovery actions?

  • Are there recommended approaches to persist or reuse alert context between active and recovered states?

  • Is this a known limitation, or are we missing a configuration or feature that would allow richer recovery notifications?


Thank you in advance for your guidance.
We are trying to keep our alerting both scalable and readable, and these two areas are currently the main blockers.

Thanks!

Hello @Actor01

Welcome to the Community!!

Which Kibana rule type are you using in both cases, and which ELK version are you running?

Also, are you using two actions for every rule: one for "Query matched" and one for "Recovered"?

For topic 1:
You can try the option shown below, which makes each execution exclude the records matched by previous runs:

[image: screenshot of the option]

The action frequency can also be changed from "On status changes" to "On check intervals".

With that, you will receive an alert every time the rule runs and its conditions are met. The only issue: if the problem persists for hosts that were already reported, and new records for those same hosts are added during each run, you will receive multiple (duplicate) alerts.
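
To make the duplicate case concrete, here is a toy Python model of that behavior. It is only a sketch under stated assumptions and not Kibana's actual implementation: the function `simulate_runs`, the record IDs, and the host names are hypothetical. It assumes each run only considers records that earlier runs have not seen, and that a notification goes out on every run that finds at least one new record.

```python
def simulate_runs(runs):
    """Toy model: return (run_number, hosts) for every run that would notify."""
    seen_ids = set()
    notifications = []
    for run_number, records in enumerate(runs, start=1):
        # Assumption: records matched by previous runs are excluded.
        new_records = [r for r in records if r["id"] not in seen_ids]
        seen_ids.update(r["id"] for r in new_records)
        if new_records:
            hosts = sorted({r["host"] for r in new_records})
            notifications.append((run_number, hosts))
    return notifications


# Hypothetical query results for three consecutive runs.
runs = [
    [{"id": 1, "host": "host-a"}],
    [{"id": 1, "host": "host-a"}, {"id": 2, "host": "host-a"}, {"id": 3, "host": "host-b"}],
    [{"id": 2, "host": "host-a"}, {"id": 3, "host": "host-b"}],
]

for run_number, hosts in simulate_runs(runs):
    print(f"run {run_number}: notify for {hosts}")
```

In this model host-a is reported in run 1 and again in run 2, because it produced a new record while still affected; run 3 contains no new records and stays silent.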

For topic 2:

As I understand it, do you have multiple rules, or a custom rule, that trigger offline alerts for each host? In topic 1 you said there is only one rule, which checks the count of hosts that are offline. Such a rule checks the status but will not recover until all hosts are online, so ideally there should be no confusion. It seems, however, that you are using multiple alerts to report which hosts are down. I need more information to understand this case.

Thanks!!