I'm testing some ML jobs with a daily bucket span over my Elasticsearch data, using different datafeed periods, and I've noticed that the anomaly points change when I change the datafeed start time. For example, the job below is configured to query all data (the "Start at beginning of data" option) until now (the "No end time" option):
The first one probably wasn't flagged as an anomaly because of the past metric values (the datafeed is querying all data), but it should have been an anomaly compared to the last 10 days.
Is there a way to configure the start/end time of my ML Job datafeed in order to consider only "Last x days" in analysis (something like a "rolling" period: now-10d)?
Yes, you will obviously get different results depending on when you start your analysis and the characteristics of the data during that period of time.
ML is designed to be constantly evaluating your data, understanding and modeling its behaviors over all of the time it has seen your data. The main benefit is near-real-time alerting without having to set a static threshold alert.
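If you still want to experiment with a datafeed that begins 10 days back, here is a minimal sketch of how the request for the start datafeed API could be built. It only computes the `start` timestamp and the request path and does not send anything. The datafeed ID `datafeed-daily-metric` and the helper name `build_start_datafeed_request` are made up for illustration, and the `/_ml/datafeeds/<id>/_start` path is assumed from recent versions (older releases used an `_xpack` prefix). Note that this only controls where the datafeed begins reading. It does not make the model forget what it has already seen, so it is not a true rolling window.

```python
import json
from datetime import datetime, timedelta, timezone


def build_start_datafeed_request(datafeed_id, days_back, now=None):
    """Return (path, body) for a start-datafeed call beginning `days_back` days ago.

    `datafeed_id` is a hypothetical ID; replace it with your own.
    `now` can be injected for testing; it defaults to the current UTC time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Normalise to UTC so the trailing "Z" in the timestamp is accurate.
    start = now.astimezone(timezone.utc) - timedelta(days=days_back)
    # Assumed endpoint path; check it against your Elasticsearch version.
    path = f"/_ml/datafeeds/{datafeed_id}/_start"
    # No "end" is set, so the datafeed keeps running in real time.
    body = {"start": start.strftime("%Y-%m-%dT%H:%M:%SZ")}
    return path, body


if __name__ == "__main__":
    path, body = build_start_datafeed_request("datafeed-daily-metric", 10)
    print("POST", path)
    print(json.dumps(body))
```

The printed path and JSON body can be pasted into Kibana Dev Tools or sent with any HTTP client. Because the model keeps everything it has learned, comparing results from a fresh job started at this point against your existing job is probably the clearest way to see how much the history matters.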