Anomaly detection: minimum amount of base data required?

Hello,

I am planning to subscribe to an X-Pack Platinum license for a cluster that I currently manage. I will use the anomaly detection feature and have a couple of questions about the minimum requirements for source data.

Is there a minimum time period of source data required for pattern analysis / forecasting?

I tested it myself with only one week of sample data. It looked okay to me, but I would still like some advice. If there is any guidance on the minimum source data needed for better / more accurate results, please advise.

Thank you!

It depends on the data set, the volume (events per unit time) and the overall periodicity of the data. In general, as little as a day’s worth of data can be enough to establish a sufficient baseline, but the model improves as it sees more data. Often, the “sweet spot” is 3+ weeks’ worth of data if the data has time-of-day and day-of-week cycles. The cool thing, however, is that historical data can be processed in batch as fast as possible, so even if you start analyzing today, you can begin the analysis with data from a month ago. Within a couple of minutes or hours, that historical data will have been analyzed and you’ll be running in real time as if you had started ML a month ago!
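As a rough illustration of the "start from a month ago" idea, here is a minimal Python sketch that builds the request for starting a datafeed with a start time 30 days in the past. The datafeed ID and the helper name `build_datafeed_start_request` are made up for the example, and the exact endpoint path is an assumption that depends on your Elasticsearch version (older X-Pack releases used an `_xpack/ml` prefix), so please check the documentation for your release before using it.

```python
from datetime import datetime, timedelta, timezone


def build_datafeed_start_request(datafeed_id, now, lookback_days=30):
    """Build (method, path, body) for starting a datafeed in the past.

    Hypothetical helper: it only assembles the request, it does not send it.
    No "end" is set, so after catching up on the historical data the
    datafeed is expected to keep running in real time.
    """
    # Start of the historical window, e.g. one month before "now".
    start = now - timedelta(days=lookback_days)
    # Assumed path; older X-Pack versions prefix this with /_xpack.
    path = f"/_ml/datafeeds/{datafeed_id}/_start"
    # Timestamp formatted as ISO 8601 in UTC.
    body = {"start": start.strftime("%Y-%m-%dT%H:%M:%SZ")}
    return "POST", path, body
```

For example, `build_datafeed_start_request("datafeed-example", datetime.now(timezone.utc))` returns the method, path and JSON body that you could send from Kibana Dev Tools or any HTTP client. The anomaly detection job itself has to be opened before its datafeed is started.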

Thank you for the detailed explanation.

My data volume is a couple of terabytes per day, and weekdays and weekends show different trends.

I assume one month of data is enough for anomaly detection and forecasting purposes: the most recent week analyzed against a baseline of the preceding three weeks.

Thank you!
