No ML nodes with sufficient capacity for trained model deployment

Hi,
I'm new to Elasticsearch. I'm currently on the Platinum trial subscription, since I want to check whether Elasticsearch fits what I want to do.

I want to test the .elser_model_2 ML model. However, every time I try to deploy it I get the following error:

Could not start deployment because no ML nodes with sufficient capacity were found

{
  "statusCode": 429,
  "error": "Too Many Requests",
  "message": "[status_exception\n\tCaused by:\n\t\tillegal_state_exception: Could not start deployment because no suitable nodes were found, allocation explanation [Could not assign (more) allocations on node [iYF2E0RcS8S5D_rUdpSBXA]. Reason: This node has insufficient available memory. Available memory for ML [322122547 (307.1mb)], free memory [322122547 (307.1mb)], estimated memory required for this model [469581194 (447.8mb)].|Could not assign (more) allocations on node [rmiiO1-LTtaZVPon05N39Q]. Reason: This node has insufficient available memory. Available memory for ML [322122547 (307.1mb)], free memory [322122547 (307.1mb)], estimated memory required for this model [469581194 (447.8mb)].|Could not assign (more) allocations on node [z9GMuSTwRuGuDGtwQI8BGQ]. Reason: This node has insufficient available memory. Available memory for ML [322122547 (307.1mb)], free memory [322122547 (307.1mb)], estimated memory required for this model [469581194 (447.8mb)].]\n\tRoot causes:\n\t\tstatus_exception: Could not start deployment because no ML nodes with sufficient capacity were found]: Could not start deployment because no ML nodes with sufficient capacity were found",
  "attributes": {
    "body": {
      "error": {
        "root_cause": [
          {
            "type": "status_exception",
            "reason": "Could not start deployment because no ML nodes with sufficient capacity were found"
          }
        ],
        "type": "status_exception",
        "reason": "Could not start deployment because no ML nodes with sufficient capacity were found",
        "caused_by": {
          "type": "illegal_state_exception",
          "reason": "Could not start deployment because no suitable nodes were found, allocation explanation [Could not assign (more) allocations on node [iYF2E0RcS8S5D_rUdpSBXA]. Reason: This node has insufficient available memory. Available memory for ML [322122547 (307.1mb)], free memory [322122547 (307.1mb)], estimated memory required for this model [469581194 (447.8mb)].|Could not assign (more) allocations on node [rmiiO1-LTtaZVPon05N39Q]. Reason: This node has insufficient available memory. Available memory for ML [322122547 (307.1mb)], free memory [322122547 (307.1mb)], estimated memory required for this model [469581194 (447.8mb)].|Could not assign (more) allocations on node [z9GMuSTwRuGuDGtwQI8BGQ]. Reason: This node has insufficient available memory. Available memory for ML [322122547 (307.1mb)], free memory [322122547 (307.1mb)], estimated memory required for this model [469581194 (447.8mb)].]"
        }
      },
      "status": 429
    }
  }
}

I currently have 3 ML nodes with 1 GB of total memory each (512 MB estimated available memory). Am I missing something? How can I change this? Would having more RAM solve this issue (my machine currently has 32 GB)?

Thank you very much!

Hi @msola Welcome to the community!

In short, you don't have enough RAM on any one of the ML nodes to support the model.

In general, it's better to scale ML nodes vertically rather than horizontally, so that you have more RAM on any one node.

If you are RAM constrained, it's probably better to run one larger node than three smaller ones.

1 GB nodes are very small in the ML world, as each model is going to take several hundred MB.
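If you want to see how much memory is actually free for ML on each node before attempting a deployment, recent 8.x releases expose an ML memory stats API; from the Kibana console:

```
GET _ml/memory/_stats?human
```

The per-node output shows how much memory is reserved for ML, which you can compare against the model's estimated requirement (447.8 MB in the error above).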

Thanks! The nodes were automatically set to 1 GB, even when I chose only one node. How can I increase the size of a single node?

Does that mean that if I use a computer with more RAM, the nodes will automatically get more memory?

@msola How did you deploy your cluster?

On-prem self-managed nodes? Docker? Kubernetes? Are they in containers?

Yes, Elastic will set the RAM to 50% of the available system memory, assuming the configuration has not been changed.
Did you set up dedicated ML nodes?
You will need to share your configs; otherwise we will just be guessing and answering a lot of partial questions :slight_smile:
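For what it's worth, on Docker the 50% rule is driven by what the container reports, so the container memory limit is what you need to raise. A hypothetical docker-compose fragment for a dedicated 4 GB ML node (the service name and image version here are just placeholders):

```yaml
services:
  es-ml:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.15.0
    environment:
      # dedicated ML node
      - node.roles=ml,remote_cluster_client
      # size the ML memory from the container limit rather than the host
      - xpack.ml.use_auto_machine_memory_percent=true
    mem_limit: 4g
```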


Hello, I ran into the same situation, and my cluster is deployed in Docker containers constrained to 4 GB of RAM. I get the same error code when I try to test the built-in E5 model.
Do I have to redeploy the cluster with at least 16 GB?
Or, I found some documents mentioning autoscaling. Can it solve this problem? How do I enable it?

Depending on how you deployed your cluster, you may have to set xpack.ml.use_auto_machine_memory_percent to true.

See Machine learning settings in Elasticsearch | Elasticsearch Guide [8.15] | Elastic

Either set it in the elasticsearch.yml file or via the cluster settings API

PUT _cluster/settings
{
  "persistent": {
    "xpack.ml.use_auto_machine_memory_percent": "true"
  }
}
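To double-check that the setting is active, you can read it back from the cluster (the filter_path parameter just trims the response):

```
GET _cluster/settings?include_defaults=true&filter_path=*.xpack.ml.use_auto_machine_memory_percent
```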

Thanks, I ran the API call in the Kibana console and got this:

{
  "acknowledged": true,
  "persistent": {
    "xpack": {
      "ml": {
        "use_auto_machine_memory_percent": "true"
      }
    }
  },
  "transient": {}
}

But when I try to start the deployment of e5-small-multilingual, it still fails with status code 429.

Please can you paste the full error message here?

That's the full error I got:

{
  "statusCode": 429,
  "error": "Too Many Requests",
  "message": "[status_exception\n\tCaused by:\n\t\tillegal_state_exception: Could not start deployment because no suitable nodes were found, allocation explanation [none]\n\tRoot causes:\n\t\tstatus_exception: Could not start deployment because no ML nodes with sufficient capacity were found]: Could not start deployment because no ML nodes with sufficient capacity were found",
  "attributes": {
    "body": {
      "error": {
        "root_cause": [
          {
            "type": "status_exception",
            "reason": "Could not start deployment because no ML nodes with sufficient capacity were found"
          }
        ],
        "type": "status_exception",
        "reason": "Could not start deployment because no ML nodes with sufficient capacity were found",
        "caused_by": {
          "type": "illegal_state_exception",
          "reason": "Could not start deployment because no suitable nodes were found, allocation explanation [none]"
        }
      },
      "status": 429
    }
  }
}

I solved it by adding another node constrained to 16 GB of RAM.
https://www.elastic.co/guide/en/elasticsearch/reference/8.14/docker.html#_add_more_nodes
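For anyone who can't add RAM: recent releases also let you start a trained model deployment with low priority, which reserves less memory and is intended for testing rather than production traffic. Something like the following, where the model ID is a placeholder:

```
POST _ml/trained_models/<model_id>/deployment/_start?priority=low
```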

Thanks for the response @SlytherinWyne, I'm glad you got this working.

In the error message I was expecting an explanation for why the model did not deploy but in your example it is none. This shouldn't happen, it's something for us to look into. Thanks the feedback.