Preserving ML "Education" and Jobs

We have multiple clusters running autonomously and we've created ML jobs within a Dev environment. I am admittedly a real newbie with ML and AI, but my understanding is that the algorithms are continuously updated based on the analyzed data for better accuracy. I see the blue areas showing upper and lower expectations, so I believe it is learning more as time progresses, correct?

If that's the case, how can I migrate my "experienced" algorithms from the Dev environment to a Production one without restarting the training? In other words, where do the algorithms live? Along with that, I would hope the process would allow me to back up/archive existing, "educated" jobs so I don't have to start over in the event of a rebuild. The JSON for existing jobs has parameters and configuration info, but I don't see any algorithm information and certainly no history of what it has already learned.

I know I'm really new at ML so if anyone has suggestions, please let me know. Thank you!

Hi Michael,

I see the blue areas showing upper and lower expectations, so I believe it is learning more as time progresses, correct?

Correct. The model learns online, so it keeps adapting as new data arrives and time moves forward.
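If you want to see that concretely, the blue bands in the UI come from the job's model plot results. Here is a rough sketch (not an official recipe) of pulling those bounds out of the results index to watch them tighten over time. It assumes model plot is enabled for the job, a local unauthenticated cluster, and the standard `.ml-anomalies-*` result indices; exact field names can vary by version.

```python
import requests

ES = "http://localhost:9200"  # assumption: local, unauthenticated cluster
JOB_ID = "my-dev-job"         # hypothetical job id

# Model plot documents carry the expected-value bounds the UI draws in blue.
query = {
    "size": 10,
    "sort": [{"timestamp": "asc"}],
    "query": {"bool": {"filter": [
        {"term": {"job_id": JOB_ID}},
        {"term": {"result_type": "model_plot"}},
    ]}},
}

resp = requests.get(f"{ES}/.ml-anomalies-*/_search", json=query)
for hit in resp.json()["hits"]["hits"]:
    doc = hit["_source"]
    # The band between model_lower and model_upper generally tightens as the
    # model sees more data, i.e. as it "learns".
    print(doc["timestamp"], doc.get("model_lower"), doc.get("model_upper"))
```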

If that's the case, how can I migrate my "experienced" algorithms from the Dev environment to a Production one without restarting the training? In other words, where do the algorithms live?

ML jobs store their state in an index within the cluster. Unfortunately, migrating jobs to a different cluster is not possible; you will have to start over. Hopefully, it won't take long to process the historic data and catch up with real-time analysis.
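In practice that means re-creating the job from its config in the Production cluster and replaying the historic data through its datafeed. Below is a minimal sketch of what that could look like over the REST API, assuming a recent version where the ML endpoints live under `/_ml` (older releases used `/_xpack/ml`); the cluster URL, job id, datafeed id, and index pattern are placeholders.

```python
import json
import requests

PROD = "http://prod-cluster:9200"  # assumption: URL of the Production cluster
JOB_ID = "my-job"                  # hypothetical job id
FEED_ID = "datafeed-" + JOB_ID     # hypothetical datafeed id

# 1. Re-create the job in Production from the config JSON exported from Dev
#    (GET /_ml/anomaly_detectors/<job_id> on the Dev cluster). Strip read-only
#    fields such as job_id and create_time from the exported document first.
with open("dev_job_config.json") as f:
    job_config = json.load(f)
requests.put(f"{PROD}/_ml/anomaly_detectors/{JOB_ID}", json=job_config)

# 2. Create a datafeed pointing at the Production copy of the source data.
feed_config = {"job_id": JOB_ID, "indices": ["my-metrics-*"]}
requests.put(f"{PROD}/_ml/datafeeds/{FEED_ID}", json=feed_config)

# 3. Open the job and start the datafeed from the beginning of the historic
#    data; the model catches up on back data and then continues in real time.
requests.post(f"{PROD}/_ml/anomaly_detectors/{JOB_ID}/_open")
requests.post(f"{PROD}/_ml/datafeeds/{FEED_ID}/_start",
              json={"start": "2017-01-01T00:00:00Z"})
```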

Along with that, I would hope the process would allow me to backup/archive existing, "educated" jobs so I don't have to start over in the event of a rebuild. The JSON for existing jobs has parameters and configuration info, but I don't see any algorithm information and certainly no history of what it has already learned.

You will notice that a job may have a model_snapshot_id, which links the job to its current model state. We have a set of APIs that let you manage past model snapshots, including the ability to revert to a previous one. Model snapshots are taken periodically, based on the job config parameter background_persist_interval. Also, the parameter model_snapshot_retention_days dictates how old a snapshot can be before it is automatically removed.
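For illustration, here is a rough sketch of how those snapshot APIs could be used, again assuming the `/_ml` endpoint prefix of recent versions; the job id, snapshot id, and the interval/retention values are just placeholders.

```python
import requests

ES = "http://localhost:9200"  # assumption: local, unauthenticated cluster
JOB_ID = "my-job"             # hypothetical job id

# List the model snapshots the job has persisted so far.
snaps = requests.get(
    f"{ES}/_ml/anomaly_detectors/{JOB_ID}/model_snapshots").json()
for s in snaps.get("model_snapshots", []):
    print(s["snapshot_id"], s["timestamp"])

# Revert the job to an earlier snapshot (the job must be closed first).
SNAPSHOT_ID = "1575402237"    # hypothetical snapshot id
requests.post(f"{ES}/_ml/anomaly_detectors/{JOB_ID}/_close")
requests.post(
    f"{ES}/_ml/anomaly_detectors/{JOB_ID}/model_snapshots/{SNAPSHOT_ID}/_revert",
    json={"delete_intervening_results": True},
)

# Tune how often snapshots are taken and how long they are kept.
requests.post(
    f"{ES}/_ml/anomaly_detectors/{JOB_ID}/_update",
    json={"background_persist_interval": "3h",
          "model_snapshot_retention_days": 10},
)
```

Note that reverting requires the job to be closed, and delete_intervening_results removes the results generated after the snapshot so they can be recalculated when the job runs again.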

You can read more about job config options here.

You can read more about the model snapshot management APIs here.

dmitri,

Thank you very much for the detailed answer! That helps a lot. I'll work on tuning the model snapshots to help with the backup strategy.

Thanks again!
