I am trying to create a job that handles a large amount of data, and the recommended memory is 19 GB. I currently have 2 ML nodes with 64 GB each. When executing the job with that memory limit, the following error appears:
Could not open job because no ML nodes with sufficient capacity were found
Any recommendation or solution for this? It would be a great help.
Thanks in advance!
First and foremost, an ML job only runs on one node at a time, so having multiple nodes is irrelevant in the case of a single, massive job. Secondly, by default an ML node allocates 30% of the node's memory to ML operations (see
xpack.ml.max_machine_memory_percent in the docs). 30% of 64 GB is exactly 19.2 GB, so you're right on the edge.
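For reference, a sketch of how that setting can be raised on a self-managed cluster via the cluster settings API (the endpoint address and the value of 40 are assumptions for illustration, not a recommendation for your cluster):

```shell
# Hypothetical example: raise the ML memory cap from the default 30%
# to 40% of node RAM. On a 64 GB node, 40% is ~25.6 GB, which leaves
# headroom above the job's 19 GB requirement.
curl -X PUT "http://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{
    "persistent": {
      "xpack.ml.max_machine_memory_percent": 40
    }
  }'
```

Note this is a cluster-wide setting, so it affects every ML node, and the ML nodes still need that memory to actually be free for the job to open.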
If you self-manage your ML nodes you can modify the setting above, but I have to ask what your use case is that it results in a job memory demand that large. It sounds like you're attempting to individually model 600k+ entities in a single job. That is not a good idea; you should consider a different approach and/or break the data up into smaller, more manageable portions.
I was finally able to start the job with the changes you suggested, but now I have some jobs with the following warning:
I stop, close, and reset the jobs, but it keeps appearing.
Any solution?