How does an Elastic machine learning (anomaly detection) job depend on historical data?


Is it possible to redirect a live machine learning (anomaly detection) job to a new data stream that has the same set of fields as the old historical index?


We have 10 ML (anomaly detection) jobs currently running in production. We used the last year of data to build the models and then made the jobs live for real-time anomaly detection (bucket span 4h). The issue now is that the index is becoming too big (50 GB+), so we are thinking of closing the index, creating a data stream instead, and enabling ILM on it.

Now, can we redirect the datafeed to the new data stream without breaking the live job?

Will it affect the model?

Please let me know so that we can make the necessary changes to handle this large index.


Yes, you can do this. What you need to do is:

  1. Stop the datafeed: see the Stop datafeeds API (Elasticsearch Guide 8.6)
  2. While the datafeed is stopped, update it and change its indices setting: see the Update datafeeds API (Elasticsearch Guide 8.6)
  3. Start the datafeed again: see the Start datafeeds API (Elasticsearch Guide 8.6)

When you restart the datafeed you can specify start=0 and it will pick up from the time it was stopped.
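As a rough illustration, the stop / update / start sequence could be scripted along these lines. This is only a sketch: the datafeed ID `datafeed-my-job`, the data stream name `my-new-data-stream`, the `es_url` argument, and the helper names `build_redirect_requests` and `send` are all made up for the example, and authentication is left to whatever headers your cluster needs.

```python
import json
import urllib.request


def build_redirect_requests(datafeed_id, new_indices):
    """Return the three (method, path, body) calls, in the order they must run."""
    base = f"/_ml/datafeeds/{datafeed_id}"
    return [
        # 1. Stop the datafeed so that it can be updated.
        ("POST", f"{base}/_stop", None),
        # 2. Point the stopped datafeed at the new data stream.
        ("POST", f"{base}/_update", {"indices": list(new_indices)}),
        # 3. Restart with start=0; the datafeed resumes from where it stopped.
        ("POST", f"{base}/_start?start=0", None),
    ]


def send(es_url, method, path, body=None, headers=None):
    """Send one request to Elasticsearch and return the decoded JSON reply.

    es_url and headers (e.g. an Authorization header) are assumptions here;
    adjust them to your own cluster.
    """
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(es_url + path, data=data, method=method)
    req.add_header("Content-Type", "application/json")
    for name, value in (headers or {}).items():
        req.add_header(name, value)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Dry run: only print the planned calls, nothing is sent to a cluster.
    for method, path, body in build_redirect_requests(
        "datafeed-my-job", ["my-new-data-stream"]
    ):
        print(method, path, json.dumps(body) if body else "")
```

To actually apply the change, each tuple would be passed to `send` in order, checking every response before moving on to the next step. The same three calls can equally be run by hand from Kibana Dev Tools.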

Whether it affects the model depends on how different the data in the new data stream is. If the new data stream contains data identical to the old index, just stored in a data stream, then there should be no impact whatsoever on the model. If the new data stream turns out to contain different data (either deliberately or by mistake), then the model could change significantly as it learns from the different data.

Thanks for your response.
Let me try the approach you stated.