No ML nodes with sufficient capacity for trained model deployment

Hi,
I'm new to Elasticsearch. I'm currently on the Platinum trial subscription, since I want to check whether Elasticsearch fits what I want to do.

I want to test the .elser_model_2 ML model. However, every time I try to deploy it I get the following error:

Could not start deployment because no ML nodes with sufficient capacity were found

{
  "statusCode": 429,
  "error": "Too Many Requests",
  "message": "[status_exception\n\tCaused by:\n\t\tillegal_state_exception: Could not start deployment because no suitable nodes were found, allocation explanation [Could not assign (more) allocations on node [iYF2E0RcS8S5D_rUdpSBXA]. Reason: This node has insufficient available memory. Available memory for ML [322122547 (307.1mb)], free memory [322122547 (307.1mb)], estimated memory required for this model [469581194 (447.8mb)].|Could not assign (more) allocations on node [rmiiO1-LTtaZVPon05N39Q]. Reason: This node has insufficient available memory. Available memory for ML [322122547 (307.1mb)], free memory [322122547 (307.1mb)], estimated memory required for this model [469581194 (447.8mb)].|Could not assign (more) allocations on node [z9GMuSTwRuGuDGtwQI8BGQ]. Reason: This node has insufficient available memory. Available memory for ML [322122547 (307.1mb)], free memory [322122547 (307.1mb)], estimated memory required for this model [469581194 (447.8mb)].]\n\tRoot causes:\n\t\tstatus_exception: Could not start deployment because no ML nodes with sufficient capacity were found]: Could not start deployment because no ML nodes with sufficient capacity were found",
  "attributes": {
    "body": {
      "error": {
        "root_cause": [
          {
            "type": "status_exception",
            "reason": "Could not start deployment because no ML nodes with sufficient capacity were found"
          }
        ],
        "type": "status_exception",
        "reason": "Could not start deployment because no ML nodes with sufficient capacity were found",
        "caused_by": {
          "type": "illegal_state_exception",
          "reason": "Could not start deployment because no suitable nodes were found, allocation explanation [Could not assign (more) allocations on node [iYF2E0RcS8S5D_rUdpSBXA]. Reason: This node has insufficient available memory. Available memory for ML [322122547 (307.1mb)], free memory [322122547 (307.1mb)], estimated memory required for this model [469581194 (447.8mb)].|Could not assign (more) allocations on node [rmiiO1-LTtaZVPon05N39Q]. Reason: This node has insufficient available memory. Available memory for ML [322122547 (307.1mb)], free memory [322122547 (307.1mb)], estimated memory required for this model [469581194 (447.8mb)].|Could not assign (more) allocations on node [z9GMuSTwRuGuDGtwQI8BGQ]. Reason: This node has insufficient available memory. Available memory for ML [322122547 (307.1mb)], free memory [322122547 (307.1mb)], estimated memory required for this model [469581194 (447.8mb)].]"
        }
      },
      "status": 429
    }
  }
}

I currently have 3 ML nodes with 1 GB of total memory each (512 MB estimated available memory). Am I missing something? How can I change this? Would having more RAM solve this issue (my machine currently has 32 GB)?

Thank you very much!

Hi @msola Welcome to the community!

In short, you don't have enough RAM on any one of the ML nodes to support the model.

In general, it's better to scale ML nodes vertically rather than horizontally, so that you have more RAM on any one node.

If you are RAM constrained, it's probably better to run one larger node than three smaller ones.

1 GB nodes are very small in the ML world, as each model is going to take several hundred MB.
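If you want to see how much memory is actually free for ML on each node before attempting a deployment, recent 8.x releases expose an ML memory stats API; from the Kibana console:

```
GET _ml/memory/_stats?human
```

The per-node output shows how much memory is reserved for ML, which you can compare against the model's estimated requirement (447.8 MB in the error above).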

Thanks! The nodes were automatically set to 1 GB, even when I chose only one node. How can I increase the size of a single node?

Does that mean that if I use a computer with more RAM, the nodes will automatically get more memory?

@msola How did you deploy your cluster?

On-prem self-managed nodes? Docker? Kubernetes? Are they in containers?

Yes, Elastic will set the RAM to 50% of the available system memory, assuming the configuration has not been changed.
Did you set up dedicated ML nodes?
You will need to share your configs; otherwise we will just be guessing and answering a lot of partial questions :slight_smile:
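For what it's worth, on Docker the 50% rule is driven by what the container reports, so the container memory limit is what you need to raise. A hypothetical docker-compose fragment for a dedicated 4 GB ML node (the service name and image version here are just placeholders):

```yaml
services:
  es-ml:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.15.0
    environment:
      # dedicated ML node
      - node.roles=ml,remote_cluster_client
      # size the ML memory from the container limit rather than the host
      - xpack.ml.use_auto_machine_memory_percent=true
    mem_limit: 4g
```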


Hello, I ran into the same situation, and my cluster is deployed in Docker containers constrained to 4 GB of RAM. I get the same error code when I try to test the built-in E5 model.
Do I have to redeploy the cluster with at least 16 GB?
Or, I found some documents mentioning autoscaling. Can it solve this problem? How do I enable it?

Depending on how you deployed your cluster, you may have to set xpack.ml.use_auto_machine_memory_percent to true.

See Machine learning settings in Elasticsearch | Elasticsearch Guide [8.15] | Elastic

Either set it in the elasticsearch.yml file or via the cluster settings API

PUT _cluster/settings
{
  "persistent": {
    "xpack.ml.use_auto_machine_memory_percent": "true"
  }
}
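To double-check that the setting is active, you can read it back from the cluster (the filter_path parameter just trims the response):

```
GET _cluster/settings?include_defaults=true&filter_path=*.xpack.ml.use_auto_machine_memory_percent
```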

Thanks, I ran the API call in the Kibana console and got this:

{
  "acknowledged": true,
  "persistent": {
    "xpack": {
      "ml": {
        "use_auto_machine_memory_percent": "true"
      }
    }
  },
  "transient": {}
}

But when I try to start the deployment of e5-small-multilingual, it still fails with status code 429.

Please can you paste the full error message here?

That's the full error I got:

{
  "statusCode": 429,
  "error": "Too Many Requests",
  "message": "[status_exception\n\tCaused by:\n\t\tillegal_state_exception: Could not start deployment because no suitable nodes were found, allocation explanation [none]\n\tRoot causes:\n\t\tstatus_exception: Could not start deployment because no ML nodes with sufficient capacity were found]: Could not start deployment because no ML nodes with sufficient capacity were found",
  "attributes": {
    "body": {
      "error": {
        "root_cause": [
          {
            "type": "status_exception",
            "reason": "Could not start deployment because no ML nodes with sufficient capacity were found"
          }
        ],
        "type": "status_exception",
        "reason": "Could not start deployment because no ML nodes with sufficient capacity were found",
        "caused_by": {
          "type": "illegal_state_exception",
          "reason": "Could not start deployment because no suitable nodes were found, allocation explanation [none]"
        }
      },
      "status": 429
    }
  }
}

I solved it by adding another node constrained to 16 GB of RAM.
https://www.elastic.co/guide/en/elasticsearch/reference/8.14/docker.html#_add_more_nodes
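For anyone who can't add RAM: recent releases also let you start a trained model deployment with low priority, which reserves less memory and is intended for testing rather than production traffic. Something like the following, where the model ID is a placeholder:

```
POST _ml/trained_models/<model_id>/deployment/_start?priority=low
```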

Thanks for the response @SlytherinWyne, I'm glad you got this working.

In the error message I was expecting an explanation for why the model did not deploy but in your example it is none. This shouldn't happen, it's something for us to look into. Thanks the feedback.