Hello,
I cannot find any relevant information in the docs, so I decided to ask here.
Situation:
We have a real-time ML job running on specific indices.
We use daily indices, and our oldest index is two years old.
What happens if we delete all indices from 2017? Will our ML model bounds become less accurate?
Are those old indices needed by ML jobs that are already set up?
I assume when you say "What if we delete all indices from 2017" you mean the indices that contain the raw data the ML job is analyzing? If that's the case, then yes - you are perfectly safe to delete those indices, as the ML job's model does not need to look at that data again. Data is analyzed by ML in chronological order and is only viewed once. The cumulative model of behavior for that data is stored in the .ml-state index.
The ONLY downside of deleting the old data is that any newly created jobs can only learn from the historical data you still have. But if this data is very old (as you say, from 2017), it is less useful for that purpose anyway.
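Since the indices are daily, one way to be careful is to select exactly which index names fall in 2017 before issuing any delete. Here is a minimal sketch of that selection step; the `myindex-YYYY.MM.dd` naming pattern is an assumption, so adjust it to match your own daily index names:

```python
# Sketch: pick out the daily indices from 2017 before deleting them.
# The "myindex-YYYY.MM.dd" naming pattern is an assumption - adapt the
# prefix/suffix check to your actual index naming scheme.

def indices_from_year(index_names, year):
    """Return the index names whose date suffix falls in the given year."""
    marker = f"-{year}."
    return [name for name in index_names if marker in name]

names = [
    "myindex-2017.12.30",
    "myindex-2017.12.31",
    "myindex-2018.01.01",
]
print(indices_from_year(names, 2017))
# prints ['myindex-2017.12.30', 'myindex-2017.12.31']
```

You could then pass the resulting names to whatever deletion mechanism you use (for example, the delete index API), confident that nothing from 2018 onward is touched.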