Memory status when processing

(Daniel) #1


I have one node dedicated to the ML tasks. When I try to run a big job, after some time the memory status turns to soft_limit, and after some more time to hard_limit. The ML node has enough resources, and its RAM usage is under 40% when the hard limit is triggered.

What could I be missing?

(Dimitris Athanasiou) #2

Hi Daniel,

A machine learning job has a default memory limit of 4GB. You can read more about this and how to change it here.

So, in your case, since your job needs more memory and the machine you're running it on has the capacity for it, you can simply increase the memory limit for that job.
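For example, assuming an anomaly detection job named `my_job` (a hypothetical name) and that the job has been closed first, the limit could be raised with the update job API, something like:

```
POST _ml/anomaly_detectors/my_job/_update
{
  "analysis_limits": {
    "model_memory_limit": "8gb"
  }
}
```

The exact value to use depends on how much memory the job actually needs and how much RAM is available on the ML node.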

Kind regards,

(Daniel) #3

Thanks a lot!


(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.