I have defined a machine learning job with a high_mean detector that uses both by_field_name and partition_field_name. Over the last 3 weeks one of the dimensions started to behave differently and, as expected, the job created anomalies because the mean value was going high.
We added a skip_model_update rule with the condition actual > 10, because if the real-time mean value is above 10 the model should not be updated.
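For reference, a detector with the rule described above might look roughly like the Python sketch below, which only builds and prints the JSON body for the job's `analysis_config`. The field names (`response_time`, `endpoint`, `service`) and the `bucket_span` are hypothetical placeholders, and the `custom_rules` layout assumes a recent Elasticsearch version where detector rules are expressed as `actions` plus `conditions`.

```python
import json

# Hypothetical field names: substitute the fields used by your own job.
detector = {
    "function": "high_mean",
    "field_name": "response_time",
    "by_field_name": "endpoint",
    "partition_field_name": "service",
    "custom_rules": [
        {
            # Ask the job not to update the model for buckets matching the rule.
            "actions": ["skip_model_update"],
            "conditions": [
                # The rule matches when the actual (observed) mean is greater than 10.
                {"applies_to": "actual", "operator": "gt", "value": 10}
            ],
        }
    ],
}

analysis_config = {"bucket_span": "15m", "detectors": [detector]}

# Print the body fragment that would go into the anomaly detection job definition.
print(json.dumps({"analysis_config": analysis_config}, indent=2))
```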
But after a week or so the model triggered a model change, reporting "Detected model shift by X value", and the typical value changed to around 50 (the actual current rate for that specific dimension). I need to understand why ML changed the model when the rule says to skip updates if the actual is greater than 10.
Do I need to add additional rules to configure skip_model_update, e.g. a condition on typical?
Most puzzling for me was that a few days ago ML removed all the model bounds, reporting "removed all seasonality". I need to understand why ML did this in the first place and what will happen when the dimension returns to an acceptable level. Will it enable the model bounds again?
So we had problems in the past where we completely discarded updates for an extended period of time. This is particularly problematic if the model is initialised with a rule in place and is never able to learn anything. As a result we changed skipped updates to soft skips, which is how we currently handle anomalies: the model significantly reduces the attention it pays to those values rather than discarding them. This means that if a rule condition, such as actual > 10, persists for long enough (many days), the model can still be updated with those values. Arguably we should make more effort not to update the model in this situation, for example by also disabling change detection; this is something we'll discuss internally.
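To make the difference between a hard skip and a soft skip concrete, here is a toy Python sketch. It is not Elastic's modelling code: it assumes a simple exponentially weighted mean stands in for the model, and the down-weighting factor `skip_weight` is a hypothetical parameter. With `skip_weight=0.0`, values matching the rule are ignored entirely; with a small positive weight, a shift that persists long enough still pulls the model towards the new level.

```python
def update(mean, value, alpha):
    """One step of an exponentially weighted mean (the toy 'model')."""
    return mean + alpha * (value - mean)


def run(values, alpha=0.1, rule_threshold=10.0, skip_weight=0.0):
    """Feed values to the toy model, down-weighting those where actual > rule_threshold.

    skip_weight == 0.0     -> hard skip: matching values never change the model
    0 < skip_weight < 1    -> soft skip: matching values get a reduced weight
    """
    mean = values[0]
    for value in values[1:]:
        weight = skip_weight if value > rule_threshold else 1.0
        mean = update(mean, value, alpha * weight)
    return mean


# 100 buckets at the old rate, then a persistent shift to about 50.
history = [5.0] * 100 + [50.0] * 500

hard = run(history, skip_weight=0.0)
soft = run(history, skip_weight=0.05)
print(f"hard skip: {hard:.2f}, soft skip: {soft:.2f}")
```

With the shift lasting 500 buckets, the hard-skip model stays at 5 while the soft-skip model ends up around 46, which is loosely how a typical value can drift towards the new rate despite the rule.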
We repeatedly run tests to see whether seasonal patterns have changed. As part of this process we can remove seasonal modelling if we think it is no longer helping to make predictions; essentially, we test prediction accuracy with and without the components we're currently using. If the data obviously still has seasonal patterns, i.e. it looks like it was wrong to change the model in this respect, there may be some interaction between the statistical tests we use and soft-skipped values. This is an area where we might be missing some test coverage.
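The idea of testing prediction accuracy with and without seasonal components can be illustrated with a toy Python sketch. This is a simplified stand-in, not the statistical tests Elastic actually runs: it fits a constant level and a per-phase seasonal profile on a training window, compares mean squared prediction error on the following window, and keeps seasonality only if it improves the error by a hypothetical `min_improvement` margin.

```python
import math


def mse(errors):
    """Mean squared error of a list of prediction errors."""
    return sum(e * e for e in errors) / len(errors)


def compare_models(series, period, train_len):
    """Return (MSE without seasonality, MSE with seasonality) on the holdout window."""
    train, test = series[:train_len], series[train_len:]
    level = sum(train) / len(train)

    # Seasonal profile: average deviation from the level at each phase of the period.
    profile = []
    for phase in range(period):
        phase_values = train[phase::period]
        profile.append(sum(phase_values) / len(phase_values) - level)

    level_errors = [y - level for y in test]
    seasonal_errors = [
        y - (level + profile[(train_len + i) % period]) for i, y in enumerate(test)
    ]
    return mse(level_errors), mse(seasonal_errors)


def keep_seasonality(series, period, train_len, min_improvement=0.1):
    """Keep the seasonal component only if it cuts holdout error by min_improvement."""
    level_err, seasonal_err = compare_models(series, period, train_len)
    return seasonal_err < (1.0 - min_improvement) * level_err


# Hourly data with a daily cycle: 10 days of training, 4 days of holdout.
period, train_len = 24, 24 * 10
daily = [10 + 5 * math.sin(2 * math.pi * t / period) for t in range(24 * 14)]
# Same history, but the holdout window has shifted to a flat level of 50.
shifted = daily[:train_len] + [50.0] * (24 * 4)

print(keep_seasonality(daily, period, train_len))    # seasonal pattern still present
print(keep_seasonality(shifted, period, train_len))  # seasonality no longer helps
```

In the second series the data has moved to a flat level of 50, so the previously learned seasonal profile only adds error and the sketch decides to drop it, which is loosely analogous to the "removed all seasonality" message.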