I am using the ML feature and have configured 3 nodes as ML nodes. When I first created 3 ML jobs, the cluster spread them across all 3 nodes. After I restarted the cluster, however, all 3 jobs were allocated to a single node.
Is there a way to manually assign an ML job to a specific node?
When jobs first start, they are assigned to the ML-enabled node with the least load, where "load" means load related directly to ML (memory usage, number of jobs, etc.).
When you restarted your cluster, my guess is that at some point all the nodes running the ML jobs had stopped and only one ML node was available to run the tasks. Consequently, all the jobs were reallocated to that one node.
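To illustrate why this happens, here is a toy model of least-loaded assignment. It is only a sketch: it counts jobs per node and ignores memory, it is not the actual Elasticsearch allocator, and the node and job names are made up.

```python
def assign_jobs(jobs, available_nodes):
    """Toy model: give each job to the available node that has the fewest jobs."""
    load = {node: 0 for node in available_nodes}
    assignment = {}
    for job in jobs:
        # Pick the least-loaded node; break ties by node name so the result is deterministic.
        target = min(available_nodes, key=lambda n: (load[n], n))
        assignment[job] = target
        load[target] += 1
    return assignment


jobs = ["job-a", "job-b", "job-c"]  # hypothetical job IDs

# First start: all three ML nodes are up, so the jobs spread out.
first_start = assign_jobs(jobs, ["ml-1", "ml-2", "ml-3"])

# During a restart: only one ML node is up when the jobs are reassigned.
after_restart = assign_jobs(jobs, ["ml-1"])

print(first_start)
print(after_restart)
```

With all three nodes available, each job lands on a different node. With only one node available at reassignment time, every job ends up on it, and nothing moves the jobs back once the other nodes rejoin.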
We don't have any formal mechanism for re-assigning jobs. The best solution is to stop the jobs and start them again. They should pick up where they left off and be re-assigned to whichever node is least loaded at that point (one of the nodes without ML jobs assigned).
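The stop/start cycle could be scripted roughly as follows. This is a sketch under several assumptions: the cluster is reachable at http://localhost:9200 without authentication, each job has a datafeed named `datafeed-<job_id>`, and the ML endpoints live under `_ml` (older releases use `_xpack/ml` instead, so pass that as `prefix` if needed). The job ID is a placeholder.

```python
import json
import urllib.request


def restart_requests(job_id, datafeed_id=None, prefix="_ml"):
    """Build the ordered REST calls that stop and then restart one ML job."""
    # Assumption: the datafeed follows the "datafeed-<job_id>" naming pattern.
    datafeed_id = datafeed_id or f"datafeed-{job_id}"
    return [
        ("POST", f"/{prefix}/datafeeds/{datafeed_id}/_stop"),
        ("POST", f"/{prefix}/anomaly_detectors/{job_id}/_close"),
        ("POST", f"/{prefix}/anomaly_detectors/{job_id}/_open"),
        ("POST", f"/{prefix}/datafeeds/{datafeed_id}/_start"),
    ]


def send(method, path, base_url="http://localhost:9200"):
    """Send one request to the cluster (base_url is an assumption) and return the parsed JSON."""
    req = urllib.request.Request(
        base_url + path,
        data=b"{}",
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Dry run: print the calls for a hypothetical job instead of sending them.
for method, path in restart_requests("my-job"):
    print(method, path)
```

To actually run the cycle, call `send(method, path)` for each pair in order. Doing one job at a time, while all ML nodes are up, gives each reopened job the chance to land on a node that currently has no ML jobs. The job stats endpoint (`GET /_ml/anomaly_detectors/<job_id>/_stats`) reports which node a job is running on, so you can check the result afterwards.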