Anomaly detection: minimum baseline data requirement?

machine-learning

(Jason J) #1

Hello,

I am planning to subscribe to an X-Pack Platinum license for a cluster I am currently managing. I will use the anomaly detection feature, and I have a couple of questions about the minimum source data requirement.

Is there a minimum source data time period for pattern analysis / forecasting?

I tested it myself with only one week of sample data. It looked okay to me, but I would still like some advice. If there is any guidance on the minimum source data required for better / more accurate results, please advise.

thank you!


(Rich Collier) #2

It depends on the data set: the volume (events per unit time) and its overall periodicity. In general, as little as a day’s worth of data can be enough to establish a sufficient baseline, but the results improve as more data is seen. Often the “sweet spot” is 3+ weeks’ worth of data if the data has time-of-day and day-of-week cycles. The cool thing, however, is that historical data can be run through in batch as fast as possible, so even if you were to start analyzing the data today, you could begin the analysis on data from a month ago. Within a couple of minutes or hours that historical data would be analyzed, and then you’re running in real time as if you had started ML a month ago!
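If it helps, here is a rough sketch of that “start from a month ago” setup, calling the ML REST APIs from Python. The cluster address, credentials, index pattern, job name, bucket span, and start timestamp are all placeholders you would adjust to your own data:

```python
import requests

ES = "http://localhost:9200"      # placeholder cluster address
AUTH = ("elastic", "changeme")    # placeholder credentials
JOB = "daily-volume"              # hypothetical job name

# 1. Create an anomaly detection job that counts events per 15-minute bucket.
requests.put(
    f"{ES}/_ml/anomaly_detectors/{JOB}",
    auth=AUTH,
    json={
        "analysis_config": {
            "bucket_span": "15m",
            "detectors": [{"function": "count"}],
        },
        "data_description": {"time_field": "@timestamp"},
    },
)

# 2. Create a datafeed that reads from the source indices.
requests.put(
    f"{ES}/_ml/datafeeds/datafeed-{JOB}",
    auth=AUTH,
    json={"job_id": JOB, "indices": ["my-logs-*"]},
)

# 3. Open the job, then start the datafeed with a start time a month in the
#    past. With no "end" time, the datafeed catches up on the historical data
#    as fast as it can and then keeps running in real time.
requests.post(f"{ES}/_ml/anomaly_detectors/{JOB}/_open", auth=AUTH)
requests.post(
    f"{ES}/_ml/datafeeds/datafeed-{JOB}/_start",
    auth=AUTH,
    json={"start": "2024-01-01T00:00:00Z"},  # placeholder: roughly a month ago
)
```

Once the datafeed has caught up, the job behaves the same as if it had been running in real time for that whole month.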


(Jason J) #3

Thank you for your detailed explanation.

My data volume is about a couple of terabytes per day, and weekdays and weekends show different trends.

I assume one month of data is okay for anomaly detection and forecasting purposes: the last week of data would be analyzed against a baseline learned from the older three weeks.
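For reference, this is roughly the forecast request I have in mind once the job has learned from those older three weeks (using the same placeholder cluster address, credentials, and job name as above):

```python
import requests

ES = "http://localhost:9200"      # placeholder cluster address
AUTH = ("elastic", "changeme")    # placeholder credentials
JOB = "daily-volume"              # hypothetical job name

# Ask the ML job for a 7-day forecast; the response includes a forecast_id
# that can be used to look up the forecast results later.
resp = requests.post(
    f"{ES}/_ml/anomaly_detectors/{JOB}/_forecast",
    auth=AUTH,
    json={"duration": "7d"},
)
print(resp.json())
```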

Thank you!


(system) closed #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.