No ML nodes with sufficient capacity

Hello!

I am trying to create a job that handles a large amount of information, and the recommended memory is 19 GB. I currently have 2 ML nodes with 64 GB each. When I run the job with that memory, the following error appears:

Could not open job because no ML nodes with sufficient capacity were found

Any recommendation or solution to this? It would be a great help.

Thanks in advance!

First and foremost, an ML job runs on only one node at a time, so having multiple nodes does not help with a single, massive job. Secondly, by default an ML node allocates 30% of the node's memory to ML operations (see the docs). 30% of 64 GB is 19.2 GB, so a 19 GB job is right on the edge.
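To make the arithmetic above concrete, here is a minimal sketch of the capacity check. The function names and the `overhead_gb` parameter are hypothetical and only illustrate the reasoning; this is not how Elasticsearch itself computes node capacity, and the overhead figure used below is an invented number.

```python
def ml_memory_budget_gb(node_memory_gb: float, ml_percent: float = 30.0) -> float:
    """Memory available to ML on one node, given the percentage reserved for ML."""
    return node_memory_gb * ml_percent / 100.0


def fits_on_one_node(job_memory_gb: float,
                     node_memory_gb: float,
                     ml_percent: float = 30.0,
                     overhead_gb: float = 0.0) -> bool:
    """A job runs on a single node, so only one node's budget counts.

    overhead_gb is a hypothetical allowance for anything the job needs
    beyond its configured memory limit.
    """
    budget = ml_memory_budget_gb(node_memory_gb, ml_percent)
    return job_memory_gb + overhead_gb <= budget


budget = ml_memory_budget_gb(64)
print(f"ML budget per node: {budget:.1f} GB")

# A 19 GB job fits only if nothing else needs memory on that node.
print(fits_on_one_node(19, 64))
# With even a small (assumed) overhead it no longer fits.
print(fits_on_one_node(19, 64, overhead_gb=0.5))
```

Note that the second node never enters the calculation: adding nodes raises the number of jobs the cluster can host, not the size of the largest single job.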

If you self-manage your ML nodes, you can modify the above setting, but I must ask: what is your use case that results in such a large job memory demand? It sounds like you're attempting to individually model 600k+ entities in a single job. That is not a good idea; you should consider a different approach and/or breaking the data up into smaller, more manageable portions.
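For anyone wanting to try the setting change, the sketch below builds a request body for the cluster update settings API (`PUT _cluster/settings`). It assumes the setting being referred to is `xpack.ml.max_machine_memory_percent`, which the post above does not name explicitly, and the value 40 is only an illustration. The helper name is hypothetical, and the range check is a basic sanity check rather than the server's actual validation.

```python
import json


def ml_memory_percent_settings(percent: int) -> dict:
    """Build a body for PUT _cluster/settings (hypothetical helper).

    Assumes the relevant setting is xpack.ml.max_machine_memory_percent.
    """
    if not isinstance(percent, int) or percent <= 0:
        # Sanity check only; the real bounds are enforced by Elasticsearch.
        raise ValueError("percent must be a positive integer")
    return {"persistent": {"xpack.ml.max_machine_memory_percent": percent}}


body = ml_memory_percent_settings(40)
print(json.dumps(body, indent=2))
```

Raising the percentage leaves less memory for everything else on the node, so it is worth checking what else runs there before changing it.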


I was finally able to start the job with the changes you suggested, but now some jobs show the following warning:


I stopped, closed, and reset the jobs, but the warning keeps appearing.

Any solution?
