How does an Elastic machine learning (anomaly detection) job depend on historical data?

Hi,

Is it possible to redirect a machine learning (anomaly detection) job to a new data stream that has the same set of fields as the old historical index, while the job is live?

Background:

We have 10 ML jobs (anomaly detection) currently running in production. We used the last year of data to build the models and then made the jobs live for real-time anomaly detection (bucket span 4h). Now the issue is that the index is becoming too big (50 GB+), so we are thinking of closing the index, creating a data stream instead, and enabling ILM on it.
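For reference, this is roughly how we plan to set up the ILM policy and the data stream. This is only a sketch using the Python Elasticsearch client; the policy, template, and stream names ("metrics-ml-policy", "metrics-ml-template", "metrics-ml-stream") and the retention numbers are placeholders, not our real configuration:

```python
# Sketch: ILM policy + index template backing a data stream (Elasticsearch 8.x,
# Python client). All names and thresholds below are illustrative placeholders.
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="<api-key>")

# ILM policy: roll over at 50 GB per primary shard or 30 days, delete after 1 year.
es.ilm.put_lifecycle(
    name="metrics-ml-policy",
    policy={
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {"max_primary_shard_size": "50gb", "max_age": "30d"}
                }
            },
            "delete": {"min_age": "365d", "actions": {"delete": {}}},
        }
    },
)

# Index template that creates a data stream for matching names and attaches the policy.
es.indices.put_index_template(
    name="metrics-ml-template",
    index_patterns=["metrics-ml-stream*"],
    data_stream={},
    template={"settings": {"index.lifecycle.name": "metrics-ml-policy"}},
    priority=200,
)
```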

Now, can we redirect the datafeed to the new data stream without breaking the live job?

Will it affect the model?

Please let me know so that we can make the necessary changes to handle this large index.

Regards,
Souvik

Yes, you can do this. What you need to do is:

  1. Stop the datafeed: Stop datafeeds API | Elasticsearch Guide [8.11] | Elastic
  2. While the datafeed is stopped, update it and change its indices setting: Update datafeeds API | Elasticsearch Guide [8.11] | Elastic
  3. Start the datafeed again: Start datafeeds API | Elasticsearch Guide [8.11] | Elastic

When you restart the datafeed you can specify start=0 and it will pick up from the time it was stopped.
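If it helps, here is a minimal sketch of those three steps using the Python Elasticsearch client. The datafeed ID ("datafeed-my-job") and data stream name ("metrics-ml-stream") are placeholders for your own:

```python
# Sketch: repoint a live anomaly detection job's datafeed at a new data stream.
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="<api-key>")

datafeed_id = "datafeed-my-job"  # placeholder datafeed ID

# 1. Stop the datafeed (the anomaly detection job itself stays open).
es.ml.stop_datafeed(datafeed_id=datafeed_id)

# 2. While it is stopped, point it at the new data stream.
es.ml.update_datafeed(datafeed_id=datafeed_id, indices=["metrics-ml-stream"])

# 3. Start it again. With start=0 it resumes from where it left off, because
#    data earlier than the latest processed bucket is ignored.
es.ml.start_datafeed(datafeed_id=datafeed_id, start=0)
```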

That depends on how different the data in the new data stream is. If the new data stream contains identical data to the old index but just in a data stream then there should be no impact whatsoever on the model. If the new data stream turns out to contain different data (either deliberately or due to a mistake) then the model could change significantly as it learns from the different data.
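If you want extra confidence that the new data stream really does contain the same data, one quick sanity check is to compare document counts over an overlapping time window before switching the datafeed. A rough sketch; the index names and the "@timestamp" field are assumptions to adapt to your setup:

```python
# Sketch: compare document counts between the old index and the new data stream
# over the same 24-hour window.
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="<api-key>")

window = {"range": {"@timestamp": {"gte": "now-24h", "lt": "now"}}}

old_count = es.count(index="old-index", query=window)["count"]
new_count = es.count(index="metrics-ml-stream", query=window)["count"]

print(f"old index: {old_count} docs, new data stream: {new_count} docs")
```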

Thanks for your response. :slight_smile:
Let me try the approach you stated.

Hi @droberts195,

While I was discussing this implementation with my team, a question came up.

When we change the index in the datafeed as you suggested and make the job live, and the job later takes a new snapshot, will that snapshot carry model parameters from both data sources (the old index and the new data stream), or only parameters learned from the new source?

Our concern is: if we change the data source, will the model drop the historical parameters when a new snapshot is taken?

Let me give an example. Assume the model before the change is y = ax + b, where a and b are the parameters. If I then change the datafeed to a new index, and after that the ML job takes a new model snapshot, i.e. y = a1*x + b1, will a1 and b1 carry information from the historical a and b, or will they come purely from the data in the new index?

I hope that explains the problem statement; if not, please let me know.

This is important because we will be making these changes in our production Elastic Stack, so we need to be sure how this works.

Regards,
Souvik

It will snapshot a model that is a blend of the data behaviour from the old index and the new index. As @droberts195 said, if the data is mostly consistent, it will be as if no change happened. If the new data does behave significantly differently, then the blended model will, over time, drift away from the "old data" behaviour and towards the "new data" behaviour.
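If you want to watch this in practice, you can list the job's model snapshots before and after the switch and confirm the job simply keeps snapshotting on its normal schedule rather than starting over. A rough sketch with a placeholder job ID:

```python
# Sketch: list recent model snapshots for an anomaly detection job.
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="<api-key>")

resp = es.ml.get_model_snapshots(job_id="my-job", sort="timestamp", desc=True)
for snap in resp["model_snapshots"]:
    print(snap["snapshot_id"], snap["timestamp"], snap["description"])
```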

Thanks for the clarification. It helped us understand how ML works in Elastic.
