I am trying to create a threshold rule: if there are 1 or more login failure events, create an alert. As you can see in the image, the rule preview "found" some alerts, but the rule itself is not generating any alerts.
Can someone explain why the rule works in the rule preview but does not generate an alert?
The short answer for why rule preview shows alerts while your rule has not produced them is that rule preview does not time travel: it shows you the result of N simulated rule runs over a particular period of time (36 hours in your screenshot) with all your current data. Unless your data is static from the time of rule creation, the actual rule executions will differ from rule preview because all of the data is present for rule preview, but only some of that data was present during the actual rule execution.
Ingestion pipeline delay, or late-arriving events in general, is a big source of false negatives, and why we recommend using e.g. event.ingested over @timestamp. I suspect this is what you're seeing in your environment. When coupled with rule preview, it can appear as though a rule missed an alert.
You can confirm this by increasing the lookback of your rule (which for a threshold rule isn't quite the same thing as running a "shorter" execution in the past, but should be sufficient for validation) far enough that you meet your alert threshold.
A simple timeline of what I'm hypothesizing might help:
1. At time 0 (T0), a "login failure" event is generated at the data source and is sent to Elasticsearch with an @timestamp of 0 (I'm using integer times for simplicity).
2. At T11, the rule executes and looks back over the @timestamp range [T0, T11].
3. No alert is generated, because no events are found.
4. The ingest processor finishes processing the T0 event above, and it's now available (but this doesn't matter to the rule, because the next time it runs it will be looking at the @timestamp range [T10, T21]).
You can see in the above example that running Rule Preview after step 4 would show an alert being generated, because the T0 event is now there, even though the rule never caught it.
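To make the timeline concrete, here is a minimal Python sketch of it. It is not how the detection engine is implemented; the `Event` class, the `count_matches` helper, and the choice of T12 as the moment the event becomes searchable are all assumptions for illustration (the timeline only says ingest finishes some time after the T11 run).

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: int     # @timestamp set at the data source
    available_at: int  # hypothetical time the event becomes searchable


def count_matches(events, now, window_start, window_end):
    """Count events searchable at `now` whose @timestamp is in the window."""
    return sum(
        1
        for e in events
        if e.available_at <= now and window_start <= e.timestamp <= window_end
    )


THRESHOLD = 1
# The T0 event; assume it only becomes searchable at T12, after the T11 run.
events = [Event(timestamp=0, available_at=12)]
# (run time, window start) for the executions at T11 and T21.
runs = [(11, 0), (21, 10)]

# Real executions: each run sees only what was searchable at that moment.
real_alerts = [
    run for run, start in runs
    if count_matches(events, now=run, window_start=start, window_end=run) >= THRESHOLD
]

# Rule preview at T30: the same windows, evaluated against all current data.
preview_alerts = [
    run for run, start in runs
    if count_matches(events, now=30, window_start=start, window_end=run) >= THRESHOLD
]

print(real_alerts, preview_alerts)  # [] [11]
```

The real runs produce nothing, while the preview reports an alert for the T11 window, which matches what the screenshot suggests.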
You might also observe that there are two solutions to this problem:
1. Increase your rule's lookback so that it captures the T0 event.
   This is an imperfect solution because you're effectively setting a "maximum ingest lag" this way, and you're not guaranteed to catch all late-arriving events.
2. Configure the rule to search based on ingest time (event.ingested) instead of observed time (@timestamp).
   This is the more robust solution, since it guarantees that events will not be missed. It does, however, require you to modify your ingest pipeline to add this field (if it does not already do so).
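The trade-off between the two options can be sketched the same way. This is a simplified model, not the rule engine: the `caught` helper, the run times, and the ingest delays are hypothetical, and it assumes event.ingested equals the moment the event becomes searchable.

```python
def caught(event_ts, ingested_at, run_times, lookback, field):
    """Return the run times at which the event falls inside the search window.

    field="timestamp" searches on @timestamp, field="ingested" on event.ingested.
    An event can only match at runs that happen after it has been ingested.
    """
    value = event_ts if field == "timestamp" else ingested_at
    return [
        run
        for run in run_times
        if ingested_at <= run and run - lookback <= value <= run
    ]


RUNS = [11, 21, 31]  # hypothetical executions every 10 time units

# Original rule: @timestamp search, lookback 11, event searchable at T12.
original = caught(0, 12, RUNS, lookback=11, field="timestamp")
# Option 1: a longer lookback catches the event on the next run...
longer = caught(0, 12, RUNS, lookback=21, field="timestamp")
# ...but an event that arrives later than the lookback allows is still missed.
too_late = caught(0, 25, RUNS, lookback=21, field="timestamp")
# Option 2: searching on ingest time catches it regardless of the delay.
by_ingest = caught(0, 25, RUNS, lookback=11, field="ingested")

print(original, longer, too_late, by_ingest)  # [] [21] [] [31]
```

Under these assumptions the longer lookback only helps while the ingest lag stays below it, whereas the ingest-time search picks the event up on the first run after it arrives.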
Hi, I hope you don't mind me tagging an additional question onto this topic. Does the "additional look-back time" also dictate the time range for the rule's threshold? I.e., is this rule looking for >=1 login failure over the additional look-back period, or is the period over which instances are counted against the threshold defined some other way?
Hey @Kiwisaki , good question. Yes, the rule includes the lookback time as part of its search. The range of data relevant to the threshold rule is defined as:
Hi @RylandHerrick, thanks for replying so quickly.
For the sake of being crystal clear, then: if my rule runs every 5 minutes with an additional look-back time of 5 minutes, the rule will execute every 5 minutes and run the query against the previous 10 minutes' worth of data, then repeat the process 5 minutes later. Is that correct?
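The arithmetic in this question can be sketched as follows, assuming the search window for each run is the rule interval plus the additional look-back time, ending at the execution time. The `search_window` helper and the example date are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def search_window(run_time, interval, additional_lookback):
    """Return (start, end) of the window searched by a run at `run_time`.

    Assumes the window spans interval + additional look-back, ending at run_time.
    """
    return run_time - (interval + additional_lookback), run_time


interval = timedelta(minutes=5)
lookback = timedelta(minutes=5)
first_run = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)  # arbitrary example time

start, end = search_window(first_run, interval, lookback)
next_start, next_end = search_window(first_run + interval, interval, lookback)

print(end - start)       # 0:10:00, each run covers 10 minutes
print(end - next_start)  # 0:05:00, consecutive runs overlap by the look-back
```

With these numbers, consecutive executions overlap by the 5 minutes of additional look-back, which is the slack available for late-arriving events.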
@luizmeireles the recommendation for your situation is to use event.ingested as the timestamp override field, which is an advanced rule setting (instructions linked). It should not be needed for any other part of the rule configuration (e.g. the "Threshold Count" setting can/should go back to using event.action, if desired).
If your data does not have event.ingested populated, you'll need to add an ingest pipeline yourself; the first example linked here shows how that might look.
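As a rough idea of what such a pipeline could contain (not necessarily the linked example), the sketch below builds a pipeline body with a single `set` processor that copies the ingest timestamp into event.ingested. The pipeline ID `add-event-ingested` is a hypothetical name; the script only prints the request and does not talk to a cluster.

```python
import json

# Hypothetical pipeline name; pick whatever fits your naming scheme.
PIPELINE_ID = "add-event-ingested"

# One `set` processor that writes the ingest-time metadata into event.ingested.
pipeline_body = {
    "description": "Record ingest time in event.ingested",
    "processors": [
        {
            "set": {
                "field": "event.ingested",
                "value": "{{_ingest.timestamp}}",
            }
        }
    ],
}

# Print the request in a form that can be pasted into Kibana Dev Tools.
print(f"PUT _ingest/pipeline/{PIPELINE_ID}")
print(json.dumps(pipeline_body, indent=2))
```

The pipeline still has to be attached to the relevant index or data stream (for example as a default pipeline) before new events pick up the field; events that are already indexed are not changed.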