Insufficient capacity to run baai_bge_m3 model

Hi,

My ML configuration is 2 nodes, each with the node attributes below (a way to read them via the API follows the list):

ml.allocated_processors_double 8.0

ml.machine_memory 62.4GB

ml.config_version 12.0.0

ml.max_jvm_size 4GB

ml.allocated_processors 8
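
(In case it helps, these attributes can be read from the nodes info API; the filter_path below is just one way to trim the response:)

GET _nodes?filter_path=nodes.*.attributes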

I already use the bge-large-en-v1.5 model for our RAG, but now I want to test baai_bge_m3 (BGE-M3, see the BGE documentation) because this model can handle more than 512 tokens.

Uploading the baai_bge_m3 model works fine.

I stopped all running models so that all resources are free and tried to start the baai_bge_m3 model.
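
The start call was something like this (the exact parameters shown here are illustrative):

POST _ml/trained_models/baai_bge_m3/deployment/_start?number_of_allocations=1&threads_per_allocation=1&wait_for=started

But it fails with this error: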

{
  "error": {
    "root_cause": [
      {
        "type": "status_exception",
        "reason": "Could not start deployment because no ML nodes with sufficient capacity were found"
      }
    ],
    "type": "status_exception",
    "reason": "Could not start deployment because no ML nodes with sufficient capacity were found",
    "caused_by": {
      "type": "illegal_state_exception",
      "reason": "Could not start deployment because no suitable nodes were found, allocation explanation [none]"
    }
  },
  "status": 429
}

I suspect “sufficient capacity” means not enough memory?

When I use bge-large-en-v1.5 with 4 allocations / 1 thread, the “Model size stats” are:

model_size_bytes 1.2GB
required_native_memory_bytes 10.2GB
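
(For anyone wanting to check the same numbers, they should be visible via the trained model stats API; the model id is just the one deployed on my cluster:)

GET _ml/trained_models/bge-large-en-v1.5/_stats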

The BGE-M3 documentation says the model size is 2.27 GB, but I have no idea how much memory is needed to run this model in Elasticsearch.

I appreciate any help 🙂

Regards,

I may have found a clue.

It seems that the memory an ML node can use for models is limited by default to 30% of the machine’s memory (the xpack.ml.max_machine_memory_percent setting). I will ask our Elastic team to check this parameter on our cluster…
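
If I understand correctly, the per-node limit and current usage should be visible in the ML memory stats API (available on recent 8.x clusters):

GET _ml/memory/_stats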

Most likely it is due to the ML settings.

Try setting xpack.ml.use_auto_machine_memory_percent: true
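
It is a dynamic cluster setting, so it can be applied without restarting the nodes, for example:

PUT _cluster/settings
{
  "persistent": {
    "xpack.ml.use_auto_machine_memory_percent": true
  }
}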

Yes David, you’re right, it is probably this setting.

I’m waiting for a response from our Elastic team that manages the cluster.

There you go, it’s done! The option has been changed on our cluster and I can now load the model. However, I see that this will raise another problem!

Our servers have 64 GB of memory. With a single allocation and a single thread, the model consumes practically all the memory of an ML node. In production I can’t be satisfied with a single allocation, so the memory of my ML nodes will be far from sufficient. I will need a less demanding model that still handles at least 2,000 tokens… but does such an embedding model exist?
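
For context, adding allocations to a running deployment is just an update call, but each additional allocation increases the required native memory on the node (this API exists from 8.6 on; the number here is purely illustrative):

POST _ml/trained_models/baai_bge_m3/deployment/_update
{
  "number_of_allocations": 2
}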