Insufficient capacity to run baai_bge_m3 model

Hi,

My ML configuration is 2 nodes, each with the node attributes below (a way to read them via the API follows the list):

ml.allocated_processors_double 8.0

ml.machine_memory 62.4GB

ml.config_version 12.0.0

ml.max_jvm_size 4GB

ml.allocated_processors 8
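
(In case it helps, these attributes can be read from the nodes info API; the filter_path below is just one way to trim the response:)

GET _nodes?filter_path=nodes.*.attributes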

I already use the bge-large-en-v1.5 model for our RAG, but now I want to test baai_bge_m3 (BGE-M3, see the BGE documentation) because this model can handle more than 512 tokens.

Uploading the baai_bge_m3 model works fine.

I stopped all running models so that all resources are free and tried to start the baai_bge_m3 model.
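
The start call was something like this (the exact parameters shown here are illustrative):

POST _ml/trained_models/baai_bge_m3/deployment/_start?number_of_allocations=1&threads_per_allocation=1&wait_for=started

But it fails with this error: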

{
  "error": {
    "root_cause": [
      {
        "type": "status_exception",
        "reason": "Could not start deployment because no ML nodes with sufficient capacity were found"
      }
    ],
    "type": "status_exception",
    "reason": "Could not start deployment because no ML nodes with sufficient capacity were found",
    "caused_by": {
      "type": "illegal_state_exception",
      "reason": "Could not start deployment because no suitable nodes were found, allocation explanation [none]"
    }
  },
  "status": 429
}

I suspect “sufficient capacity” means not enough memory?

When I use bge-large-en-v1.5 with 4 allocations / 1 thread, the “Model size stats” are:

model_size_bytes 1.2GB
required_native_memory_bytes 10.2GB
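
(For anyone wanting to check the same numbers, they should be visible via the trained model stats API; the model id is just the one deployed on my cluster:)

GET _ml/trained_models/bge-large-en-v1.5/_stats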

The BGE-M3 documentation says the model size is 2.27 GB, but I have no idea how much memory is needed to run this model in Elasticsearch.

I appreciate any help 🙂

Regards,

I may have found a clue.

It seems that the memory an ML node can use for models is limited by default to 30% of the machine’s memory (the xpack.ml.max_machine_memory_percent setting). I will ask our Elastic team to check this parameter on our cluster…
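
If I understand correctly, the per-node limit and current usage should be visible in the ML memory stats API (available on recent 8.x clusters):

GET _ml/memory/_stats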

Most likely it is due to the ML settings.

Try setting xpack.ml.use_auto_machine_memory_percent: true
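
It is a dynamic cluster setting, so it can be applied without restarting the nodes, for example:

PUT _cluster/settings
{
  "persistent": {
    "xpack.ml.use_auto_machine_memory_percent": true
  }
}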

Yes David, you’re right, it is probably this setting.

I’m waiting for a response from our Elastic team that manages the cluster.

There you go, it’s done! The option has been changed on our cluster and I can now load the model. However, I see that this will raise another problem!

Our servers have 64 GB of memory. With a single allocation and a single thread, the model consumes practically all the memory of an ML node. In production I can’t be satisfied with a single allocation, so the memory of my ML nodes will be far from sufficient. I will need a less demanding model that still handles at least 2,000 tokens… but does such an embedding model exist?
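
For context, adding allocations to a running deployment is just an update call, but each additional allocation increases the required native memory on the node (this API exists from 8.6 on; the number here is purely illustrative):

POST _ml/trained_models/baai_bge_m3/deployment/_update
{
  "number_of_allocations": 2
}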