I recently started to play around with x-pack ML for anomaly detection.
At first glance, it seemed that when we specify a metric function and a bucket_span, the statistical model for anomaly detection would be built from the metric function results, regardless of how many events fall inside each bucket.
But recently I observed something which implied that the number of events fed to the anomaly detector also matters, and that if there are not enough of them the model will not be built. So my questions are:
For building the model used for probability calculation, are only the metric function results used, or do the raw events also have an influence?
And how many events must be provided before the anomaly detector can build the model and start detecting anomalies? This matters for a live data input where data arrives gradually.
Thanks in advance
Correct - the modeling takes into account the amount of data being seen by the job. There is a bare minimum of at least 2 hours of data (or 4 bucket_spans, whichever is longer), and the number of observations in general affects the probabilities calculated. The more observations, the "better" the confidence of the predictions made by the model.
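That "2 hours or 4 bucket_spans, whichever is longer" rule of thumb can be sketched in a few lines (the function name here is just illustrative, not part of any Elastic API):

```python
from datetime import timedelta

def minimum_learning_period(bucket_span: timedelta) -> timedelta:
    """Illustrative rule of thumb from the answer above:
    the job needs at least 2 hours of data, or 4 bucket_spans,
    whichever is longer, before the model can be built."""
    return max(timedelta(hours=2), 4 * bucket_span)

# With a 15-minute bucket_span, the 2-hour floor dominates (4 * 15m = 1h):
print(minimum_learning_period(timedelta(minutes=15)))  # 2:00:00
# With a 1-hour bucket_span, 4 bucket_spans dominate:
print(minimum_learning_period(timedelta(hours=1)))     # 4:00:00
```

So with short bucket_spans you always wait at least 2 hours; with long ones, the 4-bucket floor takes over.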
The most "ideal" amount of data for maturing the model is usually right around 3 weeks - especially if the data has daily and weekly periodicity components (i.e. different patterns on weekends than on weekdays). However, if your data doesn't have this kind of behavior, then you can get away with less data and still get a really nice model.
To help bootstrap the model, the datafeed allows you to replay historical data that you may have in your elasticsearch index. After analyzing this historical data, the job can continue in "real-time".
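A rough sketch of that workflow (the datafeed name `my_datafeed` and the timestamp are illustrative; check the ML API docs for your version): starting the datafeed with a `start` time in the past and no `end` makes it chew through the historical data first and then keep running in real-time.

```
POST _ml/datafeeds/my_datafeed/_start
{
  "start": "2017-01-01T00:00:00Z"
}
```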
Also, modeling is done for all functions - including metric functions (min/max/sum/etc.) and even count functions (count, low_count, distinct_count, etc.)
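For reference, a minimal job configuration sketch mixing a metric detector and a count detector (job name, field names and the 15m bucket_span are made up for illustration):

```
PUT _ml/anomaly_detectors/my_job
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      { "function": "mean", "field_name": "responsetime" },
      { "function": "low_count" }
    ]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}
```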