I am trying to create a job that handles a large amount of data, and its recommended memory requirement is 19 GB. I currently have two ML nodes with 64 GB each. When I open the job with that memory setting, the following error appears:
Could not open job because no ML nodes with sufficient capacity were found
Any recommendation or solution for this? It would be a great help.
First and foremost, an ML job only runs on one node at a time, so having multiple nodes is irrelevant in the case of a single, massive job. Secondly, by default an ML node allocates 30% of the node's memory to ML operations (see xpack.ml.max_machine_memory_percent in the docs). 30% of 64 GB is exactly 19.2 GB, so you're right on the hairy edge.
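A quick back-of-the-envelope check of that 30% default (the numbers here are just your node size and the default percentage, nothing cluster-specific):

```shell
# Default ML allocation: 30% of a node's machine memory.
# With a 64 GB node, that leaves barely more than the job's 19 GB requirement.
awk 'BEGIN { printf "%.1f GB\n", 64 * 30 / 100 }'
# prints: 19.2 GB
```

Since the node also needs headroom for the JVM and other overhead, a 19 GB job against a 19.2 GB ceiling can easily fail to find capacity.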
If you self-manage your ML nodes you can raise the above setting, but I have to ask what use case produces a job with a memory demand that big. This sounds like you're attempting to individually model 600k+ entities all in one job. That is not a good idea, and you should consider a different approach and/or break the data up into smaller, more manageable jobs.
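For reference, xpack.ml.max_machine_memory_percent is a dynamic cluster-wide setting, so on a self-managed cluster it can be changed via the cluster settings API without a restart. A sketch of what that request looks like (the value 40 is purely illustrative; pick a percentage that still leaves room for the JVM heap and OS):

```
PUT _cluster/settings
{
  "persistent": {
    "xpack.ml.max_machine_memory_percent": 40
  }
}
```

On a 64 GB node, 40% would allow roughly 25.6 GB for ML, comfortably above 19 GB, but again, resizing the ceiling doesn't address why a single job needs that much memory in the first place.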