For a specific use case/dashboard, not all data comes from the same data source. I have a use case with at least 4 different data sources: logs-*, which must be filtered by fields to get the data I care about; a custom log data stream; a Windows perfmon data stream that needs to be filtered down by a field; and metrics-*.
I normally used an index threshold rule to notify me if an index reaches 0 documents. But now with this specific use case, it's hard for me to ensure that all that data is always coming in.
I've had some issues in the past with the built-in threshold alert: it missed alerts a lot of the time and generated some false positives saying the data had recovered when it hadn't.
After some support tickets with no solution, and with support unable to replicate the issue, we gave up and looked for other ways to alert on the data.
What solved our problem was using ES|QL security rules that trigger based on the difference between the time the rule is executed and the last event in the data stream, based on event.ingested. We can even alert per dataset and filter the data if we want.
It would be something like this:
FROM logs-data_stream.*
| STATS last_timestamp = MAX(event.ingested) by data_stream.dataset
| EVAL lag = DATE_DIFF("minute", last_timestamp, NOW())
| WHERE lag >= 10
| LIMIT 25
This rule triggers if, for any dataset in the matching data streams, more than 10 minutes have passed between the most recently ingested document and the time the rule runs. The LIMIT should be at least the number of datasets in the data streams. We have one rule per data stream rather than a generic rule on logs-*, as different data streams have different volumes.
Some considerations: the look-back time in the rule schedule needs to be bigger than the value used for the lag. I normally run those rules every 15 minutes with a look-back time of 60 minutes, so the rule may trigger 4 or 5 times; after that the last document falls outside the look-back window and you get no more alerts.
One drawback is that you do not get a recovery alert, but in my case each time this rule triggers it creates an incident in an external tool, so we track the actions there.
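For the case where the data sits in a shared pattern such as logs-* and has to be filtered first, a variant could look like the sketch below. The dataset name `my_app.access` is a made-up placeholder, and the query assumes the documents carry event.ingested; adjust the filter field, the threshold, and the LIMIT to your own data.

```esql
// Sketch only: "my_app.access" is a hypothetical dataset name
FROM logs-*
| WHERE data_stream.dataset == "my_app.access"
// most recent ingest time for the filtered data
| STATS last_timestamp = MAX(event.ingested) BY data_stream.dataset
// minutes elapsed between the last ingested document and the rule run
| EVAL lag = DATE_DIFF("minute", last_timestamp, NOW())
| WHERE lag >= 10
| LIMIT 25
```

Filtering before the STATS keeps the lag calculation limited to the data you care about, so a noisy neighbour dataset in the same pattern cannot hide a gap.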