I have unique data coming in once a day for a field and set up an advanced ML job with a bucket span of one day accordingly. The configured detector is a summation of a unique number field partitioned by another field.
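For readers following along, the setup described above might look roughly like the sketch below. It only builds the JSON body that would be sent to `PUT _ml/anomaly_detectors/<job_id>`. The field names `daily_amount` and `feed_name` and the `@timestamp` time field are hypothetical placeholders, not taken from the poster's data.

```python
import json


def build_sum_job(value_field, partition_field, bucket_span="1d", time_field="@timestamp"):
    """Return a job body with one sum detector, partitioned by another field."""
    return {
        "description": "Daily sum per partition (illustrative sketch)",
        "analysis_config": {
            # One bucket per day, matching data that arrives once a day.
            "bucket_span": bucket_span,
            "detectors": [
                {
                    "function": "sum",
                    "field_name": value_field,
                    "partition_field_name": partition_field,
                }
            ],
        },
        "data_description": {"time_field": time_field},
    }


# Hypothetical field names, for illustration only.
sum_job = build_sum_job("daily_amount", "feed_name")
print(json.dumps(sum_job, indent=2))
```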
Alerts are also set up so that any anomaly above a threshold is emailed to me, but I would also like to know when data comes in late (compared to its historical timing).
How could I configure my ML job to satisfy my needs?
Okay, thank you. That is very helpful.
A follow-up question: would one ML job with multiple detectors be more efficient?
In general, yes, because in each bucket_span the data only needs to be queried once and is then applied to both detectors. However, I find that viewing and interpreting the results in our UI is easier if a job has only one detector.
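As a rough illustration of the single-job option, the sketch below puts a `sum` detector and a `time_of_day` detector in the same job body. Note that `bucket_span` is set once per job, so both detectors share it. The field names are hypothetical placeholders again.

```python
import json


def build_combined_job(value_field, partition_field, bucket_span, time_field="@timestamp"):
    """Return a job body with a sum detector and a time_of_day detector."""
    detectors = [
        # Detector 0: unusual daily totals per partition.
        {
            "function": "sum",
            "field_name": value_field,
            "partition_field_name": partition_field,
        },
        # Detector 1: events arriving at an unusual time of day per partition.
        # time_of_day takes no field_name.
        {
            "function": "time_of_day",
            "partition_field_name": partition_field,
        },
    ]
    return {
        "analysis_config": {
            # Shared by every detector in the job.
            "bucket_span": bucket_span,
            "detectors": detectors,
        },
        "data_description": {"time_field": time_field},
    }


# Hypothetical field names, for illustration only.
combined_job = build_combined_job("daily_amount", "feed_name", "1d")
print(json.dumps(combined_job, indent=2))
```

Because the bucket span is shared, a job like this cannot give the `sum` detector daily buckets and the `time_of_day` detector short buckets at the same time, which is one reason to keep them as separate jobs.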
Hi richcollier, I have set up the ML job with the time_of_day function and a bucket span of 1 day. However, it does not behave quite as I would like.
In your experience, is it possible to get real-time alerts for late data when unique data comes in only once a day?
Note the following in Appendix P: Time functions (Machine Learning in the Elastic Stack [8.5]):
- Shorter bucket spans (for example, 10 minutes) are recommended when performing a time_of_week analysis. The time of the events being modeled are not affected by the bucket span, but a shorter bucket span enables quicker alerting on unusual events.
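To make the point about quicker alerting concrete, here is a small sketch of how long an event waits before its bucket closes under two bucket spans. It assumes buckets are aligned to multiples of the bucket span since the Unix epoch, and it ignores datafeed settings such as `query_delay` and `frequency`, so treat the numbers as a lower bound rather than actual alert latency. The event time is made up.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_end(event_time, bucket_span):
    """Return the end of the epoch-aligned bucket that contains event_time."""
    span_s = int(bucket_span.total_seconds())
    elapsed_s = int((event_time - EPOCH).total_seconds())
    bucket_start_s = (elapsed_s // span_s) * span_s
    return EPOCH + timedelta(seconds=bucket_start_s + span_s)


# Hypothetical late arrival at 09:03 UTC.
event = datetime(2023, 1, 10, 9, 3, tzinfo=timezone.utc)

for span in (timedelta(days=1), timedelta(minutes=10)):
    wait = bucket_end(event, span) - event
    print(f"bucket_span={span}: bucket closes {wait} after the event")
```

With a one-day span, the result for that bucket is not available until the day ends, while a 10-minute span closes the bucket within minutes of the event.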
The separate ML job with the time_of_day detector worked well for late data congestion!
My current configuration for the first ML job is a summation of a number field by a field that is consumed once per day, so I set a bucket span of 1 day. However, with that setup I only get one alert at the end of the day (after the ML job has run).
Is there any configuration that lets the ML job run in real time, so that the alerts are real-time as well (given the constraint of how my data comes in)?