An issue we are facing with real-time timestamps is log processing time. Even if a log takes less than a second to go through the pipeline, there is still a chance it is missed because of the nature of the interval queries we run. For example, suppose an event occurs at 08:59:58 but isn't indexed into the platform until 09:00:01.
The watcher execution covering 08:56:00 - 08:58:00 sees nothing (the event hasn't occurred yet), so the alert doesn't fire.
The next execution covers 08:58:00 - 09:00:00. The event's @timestamp falls in this window, but it doesn't reach the platform until 09:00:01, so the watcher can't see it.
The next execution covers 09:00:00 - 09:02:00. The event was indexed within this window, but the query filters on @timestamp (08:59:58), which falls outside it.
If we introduce an additional lookback greater than the pipeline delay, this solves the problem. But the issue it then brings is that the same log could be alerted on twice (or more, depending on our intervals).
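To make the overlap concrete, here is a rough sketch (plain Python; the 2-minute interval and 30-second lookback are illustrative values, not our real settings) of the widened window and why it both catches the late event and re-matches on-time ones:

```python
from datetime import datetime, timedelta, timezone

INTERVAL = timedelta(minutes=2)   # how often the watcher executes (illustrative)
LOOKBACK = timedelta(seconds=30)  # extra lookback, larger than the worst-case pipeline delay (illustrative)

def window_for(run_time: datetime) -> tuple[datetime, datetime]:
    """Query window for an execution at run_time: [run - interval - lookback, run)."""
    return run_time - INTERVAL - LOOKBACK, run_time

def range_query(start: datetime, end: datetime) -> dict:
    """Range filter on @timestamp for the computed window."""
    return {"range": {"@timestamp": {"gte": start.isoformat(), "lt": end.isoformat()}}}

# The 09:02:00 execution now reaches back to 08:59:30, so the 08:59:58 event is
# picked up even though it was only indexed at 09:00:01.
print(range_query(*window_for(datetime(2024, 1, 1, 9, 2, tzinfo=timezone.utc))))

# But consecutive windows overlap by LOOKBACK: the 09:00:00 window
# ([08:57:30, 09:00:00)) and the 09:02:00 window ([08:59:30, 09:02:00)) share
# [08:59:30, 09:00:00), so any event in that slice that *was* indexed in time
# matches both executions and can be alerted on twice.
```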
How could we deduplicate these alerts on the Elastic side so they are not sent to our API more than once?
How about writing a timestamp as part of an ingest pipeline (i.e. the time at which the event is indexed)? Then you could filter on that date and would not be reliant on when the event occurred. Would that help in your case?
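A rough sketch of what I mean, assuming the 8.x Python client; the pipeline id add-index-time, the field name event.ingested, and the localhost URL are just placeholders:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder cluster address

# Ingest pipeline that stamps each document with the time it was indexed.
# (It still has to be wired up as the index's default_pipeline or passed on
# the index request.)
es.ingest.put_pipeline(
    id="add-index-time",
    description="Record when the document was indexed",
    processors=[
        {"set": {"field": "event.ingested", "value": "{{_ingest.timestamp}}"}}
    ],
)

# The watcher's range filter would then target the index-time field instead of
# @timestamp, so consecutive, non-overlapping windows see each document exactly
# once no matter how late the pipeline delivered it.
query = {"range": {"event.ingested": {"gte": "now-2m", "lt": "now"}}}
```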
Unfortunately not, as this would lead to a massive number of false positives. I've gone with the "delayed" watcher approach, with a delay greater than the expected pipeline delay.
I'm also trying to think of a more creative way to catch anything those watchers might have missed because of some other delay, whether it comes from our pipeline or not. But this is the tricky bit, as querying a whole day's worth of logs to check what has not been alerted on will be a pain.
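For what it's worth, one rough sketch of how that check could work, assuming the 8.x Python client; the logs-* pattern, the alerted-events tracking index, and the unalerted_hits helper are hypothetical names for illustration. The idea is to record each alerted document's _id with the create API, which rejects duplicate ids, so an overlapping or catch-up query only reports events that have never been alerted on:

```python
from datetime import datetime, timezone

from elasticsearch import ConflictError, Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder cluster address
TRACKING_INDEX = "alerted-events"            # hypothetical dedup/tracking index

def unalerted_hits(start: str, end: str) -> list[dict]:
    """Return hits in [start, end) that have not been alerted on before.

    Each candidate is recorded in the tracking index under the source
    document's _id using the "create" API, which refuses to overwrite an
    existing document. A hit already alerted on by an earlier, overlapping
    window raises a conflict and is skipped, so the downstream API is only
    notified once per event.
    """
    resp = es.search(
        index="logs-*",
        query={"range": {"@timestamp": {"gte": start, "lt": end}}},
        size=1000,
    )
    fresh = []
    for hit in resp["hits"]["hits"]:
        try:
            es.create(
                index=TRACKING_INDEX,
                id=hit["_id"],
                document={"alerted_at": datetime.now(timezone.utc).isoformat()},
            )
            fresh.append(hit)  # first time we've seen this event
        except ConflictError:
            pass  # already alerted on in a previous window
    return fresh

# A once-a-day sweep over a wide window reuses the same check, so it only
# surfaces events that every regular execution missed, without re-alerting.
missed = unalerted_hits("now-24h", "now")
```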