Moving ML jobs and their data

machine-learning

(bluren) #1

I have a few ML jobs saved and running in my local build. I would like to export the same jobs to another build that is running on a different machine. Is this possible? If so, how?

Also, assuming that I've moved my indices to the new build as well, is it possible to move the ML-related data (the anomalies and models) onto the new build too?

Thanks.


(rich collier) #2

Hello,

There currently is no officially supported way to export ML job configurations, models, and results as a unit and move to a new system. There are documented enhancement requests to add this capability in the future.

The current recommended procedure is to

  1. manually or programmatically copy the ML job configurations from the old system to the new system
  2. re-train the ML models using historical data on the new system
  3. allow the new job(s) to keep running on the real-time data coming into the new system
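
Step 1 can be scripted against the ML job APIs. The following is a minimal sketch rather than a supported migration tool. It assumes the 6.x `_xpack/ml/anomaly_detectors` endpoints, no authentication, and a hypothetical list of server-generated fields (`ASSUMED_SERVER_FIELDS`) that have to be dropped before the configuration is accepted by the new cluster; check that list against your version. Datafeed configurations would need the same treatment.

```python
import json
import urllib.request

# Assumption: fields the server fills in and that should not be re-submitted.
# Verify this list against the job resource documentation for your version.
ASSUMED_SERVER_FIELDS = (
    "job_id", "job_type", "job_version",
    "create_time", "finished_time", "model_snapshot_id",
)


def strip_server_fields(job_config, fields=ASSUMED_SERVER_FIELDS):
    """Return a copy of the job config without server-generated fields."""
    return {k: v for k, v in job_config.items() if k not in fields}


def fetch_job_configs(old_base_url):
    """GET all job configurations from the old cluster."""
    url = "{}/_xpack/ml/anomaly_detectors".format(old_base_url.rstrip("/"))
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["jobs"]


def build_put_job_request(new_base_url, job_id, job_config):
    """Build (but do not send) the PUT request that creates the job."""
    url = "{}/_xpack/ml/anomaly_detectors/{}".format(
        new_base_url.rstrip("/"), job_id)
    body = json.dumps(strip_server_fields(job_config)).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "application/json"})


def copy_jobs(old_base_url, new_base_url):
    """Copy every job configuration from the old cluster to the new one."""
    for job in fetch_job_configs(old_base_url):
        request = build_put_job_request(new_base_url, job["job_id"], job)
        urllib.request.urlopen(request).close()
```

Calling `copy_jobs("http://old-host:9200", "http://new-host:9200")` (both host names are placeholders) only recreates the configurations. The models still have to be re-trained as in step 2.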

(bluren) #3

Thanks. I've moved some of my indices from the old system to the new one using a tool called elasticdump. I'm not sure which indices hold the ML data.
I see the indices below. Do I copy them over as well?

.ml-anomalies
.ml-notifications

I've been led to believe that both the training rules and the model are stored in ES. Where exactly can I find them?

On another note, if I were to retrain the model, will it be identical to what was existing in the old machine? Or will there be a variance?


(rich collier) #4
  • .ml-anomalies-* are the anomaly result documents (the output of analysis)
  • .ml-notifications are the "Job messages" seen on the Job Management page of the UI
  • .ml-state is the serialized version of the job's statistical models
  • .ml-meta contains meta information like Special Calendar days (introduced in v6.2)
  • The job configurations themselves are stored in the cluster state.
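
To tell which indices in a listing belong to ML, the names above can be matched as patterns. This is only an illustrative sketch: the pattern table mirrors the list in this post, and the helper name `describe_ml_index` is made up for the example.

```python
from fnmatch import fnmatchcase

# Pattern -> role, taken from the list in this post.
ML_INDEX_ROLES = [
    (".ml-anomalies-*", "anomaly result documents"),
    (".ml-notifications", "job messages"),
    (".ml-state", "serialized statistical models"),
    (".ml-meta", "meta information such as calendars"),
]


def describe_ml_index(index_name):
    """Return the role of an ML index, or None if it is not an ML index."""
    for pattern, role in ML_INDEX_ROLES:
        if fnmatchcase(index_name, pattern):
            return role
    return None


if __name__ == "__main__":
    for name in [".ml-anomalies-shared", ".ml-state", "logstash-2018.03.01"]:
        print(name, "->", describe_ml_index(name))
```

Note that the job configurations will not show up in such a listing, because they live in the cluster state and not in an index.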

Bottom line, it is not easy to just move everything over, hence my earlier comment about there not being an officially supported mechanism.


(bluren) #5

Thanks for the quick info @richcollier!


(bluren) #6

On the same subject, since I've been running the entire stack as separate Docker containers, I could also just tgz the entire Elasticsearch data directory and move it over to the new system. This has worked, but in addition to the actual data it also carries the licensing data, which I do not want moved over. Are there specific indices for licensing as well, so that I can delete them before the move? I am currently running on a trial licence.


(rich collier) #7

License information is stored in the cluster state (and is persisted to disk in the _state directory under path.data).

The safest way to manage licenses is with the API - https://www.elastic.co/guide/en/elasticsearch/reference/6.x/licensing-apis.html
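
As a rough illustration of checking the license through the API rather than on disk, the sketch below reads the current license with the 6.x `GET /_xpack/license` endpoint. It assumes an unauthenticated cluster and that the response has a top-level `license` object with `type` and `status` fields; the helper names are made up for the example.

```python
import json
import urllib.request


def build_get_license_request(base_url):
    """Build (but do not send) the request for the current license."""
    url = "{}/_xpack/license".format(base_url.rstrip("/"))
    return urllib.request.Request(url, method="GET")


def summarize_license(response_body):
    """Extract type and status from a parsed license response.

    Assumes the response looks like {"license": {"type": ..., "status": ...}}.
    """
    info = response_body.get("license", {})
    return info.get("type"), info.get("status")


def get_license_summary(base_url):
    """Fetch the license from the cluster and summarize it."""
    with urllib.request.urlopen(build_get_license_request(base_url)) as resp:
        return summarize_license(json.load(resp))
```

Running `get_license_summary("http://new-host:9200")` (placeholder host) on the new system after the move shows which license that cluster actually ended up with.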


(Mark Walkom) #8