I am planning to subscribe to an X-Pack Platinum license for a cluster I am currently managing. I will use the anomaly detection feature, and I have a couple of questions about the minimum requirements for source data.
Is there a minimum source data time period for pattern analysis / forecasting?
I tested with only one week of sample data. It looked okay to me, but I would still like some advice. If there is any guidance on minimum source data requirements for better / more accurate results, please advise.
It depends on the data set: its volume (events per unit time) and its overall periodicity. In general, as little as a day’s worth of data can be enough to establish a sufficient baseline, but the model improves as it sees more data. Often the “sweet spot” is 3+ weeks’ worth of data if the data has time-of-day and day-of-week cycles. The cool thing, however, is that historical data can be batch-analyzed as fast as possible, so even if you were to start analyzing the data today, you could begin the analysis on data from a month ago. Within a couple of minutes or hours, that historical data would be analyzed and you’d be running in real time as if you had started ML a month ago!
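To make the “start a month in the past” idea concrete, here is a minimal sketch of that workflow using the ML REST API from Python. It assumes a 7.x+ cluster reachable at localhost:9200 (older X-Pack releases used `_xpack/ml/...` paths instead of `_ml/...`) and hypothetical names throughout: a `web-traffic` job, a `web-logs-*` index, and an `@timestamp` time field; adjust these to your data.

```python
import time
import requests  # assumes a reachable cluster with ML enabled; all names below are hypothetical

ES = "http://localhost:9200"
HEADERS = {"Content-Type": "application/json"}

# 1. Create an anomaly detection job that models the overall event rate
#    in 15-minute buckets (pick a bucket_span suited to your data's granularity).
job = {
    "analysis_config": {
        "bucket_span": "15m",
        "detectors": [{"function": "count"}],
    },
    "data_description": {"time_field": "@timestamp"},
}
requests.put(f"{ES}/_ml/anomaly_detectors/web-traffic",
             headers=HEADERS, json=job).raise_for_status()

# 2. Create a datafeed that pulls the job's input from your indices.
feed = {"job_id": "web-traffic", "indices": ["web-logs-*"]}
requests.put(f"{ES}/_ml/datafeeds/datafeed-web-traffic",
             headers=HEADERS, json=feed).raise_for_status()

# 3. Open the job, then start the datafeed one month in the past.
#    With no "end" time, the datafeed batch-processes the historical month
#    as fast as it can and then keeps running in real time.
requests.post(f"{ES}/_ml/anomaly_detectors/web-traffic/_open").raise_for_status()
start_ms = int((time.time() - 30 * 24 * 3600) * 1000)  # epoch millis, 30 days ago
requests.post(f"{ES}/_ml/datafeeds/datafeed-web-traffic/_start",
              headers=HEADERS, json={"start": start_ms}).raise_for_status()
```

Once the datafeed catches up to now, it transitions to real-time operation automatically, so the month of backfill gives the model its baseline before live anomaly detection begins.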