ML jobs are not distributed evenly across all ML nodes

jasony · July 10, 2019, 1:03am

Hello,

I am using the ML feature and have set up 3 nodes as ML nodes. When I first created 3 ML jobs, the cluster spread them across all 3 nodes. But after I restarted the cluster, all 3 jobs were allocated to a single node.

Is there a way to manually assign an ML job to a specific node?

Please advise. Thank you!

BenTrent · July 10, 2019, 3:51pm

Hey @jasony

When jobs first start, they are assigned to the ML-enabled node with the least load. Specifically, only load related directly to ML counts here (memory usage, number of jobs, etc.).

When you restarted your cluster, I am guessing at a certain point all the nodes running the ML jobs stopped and only one node was available for the tasks to run. Consequently, they all got reallocated to that one node.
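To make the restart scenario concrete, here is a toy model in Python. It is only an illustration, not Elasticsearch's actual allocation logic: it assumes "load" is just the number of jobs on a node, and the node and job names (`ml-1`, `job-a`, and so on) are made up.

```python
def assign_jobs(jobs, available_nodes):
    """Toy model: give each job to the available node with the fewest jobs.

    Assumption: load is simply a job count. The real allocator also
    considers things like memory usage.
    """
    load = {node: 0 for node in available_nodes}
    assignment = {}
    for job in jobs:
        # Pick the least-loaded node among those currently available.
        node = min(available_nodes, key=lambda n: load[n])
        assignment[job] = node
        load[node] += 1
    return assignment


jobs = ["job-a", "job-b", "job-c"]

# First start: all three ML nodes are up, so the jobs spread out.
first_start = assign_jobs(jobs, ["ml-1", "ml-2", "ml-3"])

# During a restart: only one ML node is up when the jobs get reallocated.
after_restart = assign_jobs(jobs, ["ml-1"])

print(first_start)
print(after_restart)
```

In this model the jobs stay where they were placed, so nodes that come back later remain empty until the jobs are stopped and started again.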

We don't have any formal mechanisms for re-assigning jobs. The best solution would be to stop the jobs, and start them again. They should pick up where they left off and get re-assigned to the currently least-loaded node (one of the nodes without ML jobs assigned).
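As a rough sketch, the stop/start cycle for one job could be scripted like this. It assumes the 7.x-style `_ml` REST paths (6.x used `_xpack/ml` instead), a cluster reachable without authentication, and hypothetical names `job-a` and `datafeed-job-a`; adjust all of these to your setup.

```python
import json
import urllib.request


def restart_plan(job_id, datafeed_id):
    """Return the ordered (method, path) calls to stop and restart one job."""
    return [
        ("POST", f"/_ml/datafeeds/{datafeed_id}/_stop"),      # stop feeding data
        ("POST", f"/_ml/anomaly_detectors/{job_id}/_close"),  # close the job
        ("POST", f"/_ml/anomaly_detectors/{job_id}/_open"),   # reopen; a node is picked again
        ("POST", f"/_ml/datafeeds/{datafeed_id}/_start"),     # resume the datafeed
        ("GET", f"/_ml/anomaly_detectors/{job_id}/_stats"),   # stats show the node of an open job
    ]


def run_plan(base_url, plan):
    """Send each call in order and print the JSON response.

    Assumption: no authentication or TLS setup is needed for base_url.
    """
    for method, path in plan:
        request = urllib.request.Request(base_url + path, method=method)
        with urllib.request.urlopen(request) as response:
            body = json.load(response)
        print(method, path, "->", body)


plan = restart_plan("job-a", "datafeed-job-a")  # hypothetical IDs
for method, path in plan:
    print(method, path)

# To actually send the requests (base URL is an assumption):
# run_plan("http://localhost:9200", plan)
```

Running the plan for one job at a time, and checking the `node` field in the stats response after each, makes it easy to confirm where each job ended up.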

jasony · July 11, 2019, 1:29am

Thank you for your advice. Following it, I stopped and started the jobs and saw that they were all re-assigned evenly across the nodes.

Thank you!

system · August 8, 2019, 1:29am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
