Insufficient capacity to run the baai_bge_m3 model

Hi,

My ML configuration is 2 nodes, each with the following settings:

ml.allocated_processors_double 8.0

ml.machine_memory 62.4GB

ml.config_version 12.0.0

ml.max_jvm_size 4GB

ml.allocated_processors 8

I already use the bge-large-en-v1.5 model for our RAG, but now I want to test baai_bge_m3 (BGE-M3) because this model can handle more than 512 tokens.

The upload of the baai_bge_m3 model went fine.

I stopped all running models so that all resources are free, then tried to start baai_bge_m3, but I get this error:

{
  "error": {
    "root_cause": [
      {
        "type": "status_exception",
        "reason": "Could not start deployment because no ML nodes with sufficient capacity were found"
      }
    ],
    "type": "status_exception",
    "reason": "Could not start deployment because no ML nodes with sufficient capacity were found",
    "caused_by": {
      "type": "illegal_state_exception",
      "reason": "Could not start deployment because no suitable nodes were found, allocation explanation [none]"
    }
  },
  "status": 429
}
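To see how much memory the ML nodes can actually offer, one option is the ML info API, which reports the ML memory limits the cluster computed for each node (this is a standard Elasticsearch API; I have not run it against this particular cluster):

```console
GET _ml/info
```

The `limits` section of the response should show the effective per-node and total ML memory available for model deployments, which you can compare against the model's `required_native_memory_bytes`.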

I suspect “sufficient capacity” means not enough memory?

When I use bge-large-en-v1.5 with 4 allocations / 1 thread, the “Model size stats” are:

model_size_bytes 1.2GB
required_native_memory_bytes 10.2GB

The BGE-M3 documentation says the model size is 2.27 GB, but I have no idea how much memory is needed to run this model on Elasticsearch.
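As a very rough back-of-the-envelope check (this is just scaling the ratio observed above, not Elasticsearch's official sizing formula), one can extrapolate from the bge-large-en-v1.5 numbers:

```python
# Rough extrapolation of required native memory for BGE-M3,
# based only on the ratio observed for bge-large-en-v1.5 above.
# NOT Elasticsearch's actual sizing formula - just a sanity check.

bge_large_model_gb = 1.2      # model_size_bytes reported above
bge_large_required_gb = 10.2  # required_native_memory_bytes reported above
bge_m3_model_gb = 2.27        # size from the BGE-M3 documentation

ratio = bge_large_required_gb / bge_large_model_gb  # 8.5
bge_m3_estimate_gb = bge_m3_model_gb * ratio
print(f"estimated required native memory: {bge_m3_estimate_gb:.1f} GB")
```

If the same ratio held, that would be roughly 19.3 GB, slightly more than 30% of a 62.4 GB node (about 18.7 GB), which would be consistent with the "no sufficient capacity" error under the default ML memory limit.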

I appreciate any help :slight_smile:

Regards,

I think I found a clue.

It seems that the memory available to ML nodes is limited by default to 30% of the machine memory. I will ask our Elastic team to check this parameter on our cluster…
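For reference, assuming this is the standard `xpack.ml.max_machine_memory_percent` setting (default 30), it is a dynamic cluster setting, so it can be inspected and changed via the cluster settings API; the value 50 below is only an example:

```console
GET _cluster/settings?include_defaults=true&filter_path=**.xpack.ml.max_machine_memory_percent

PUT _cluster/settings
{
  "persistent": {
    "xpack.ml.max_machine_memory_percent": 50
  }
}
```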

Most likely it is due to the ML settings.

Try setting xpack.ml.use_auto_machine_memory_percent: true
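Assuming the nodes are dedicated ML nodes (per the Elasticsearch docs, auto memory percent is intended for that case), this is also a dynamic cluster setting:

```console
PUT _cluster/settings
{
  "persistent": {
    "xpack.ml.use_auto_machine_memory_percent": true
  }
}
```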

Yes David, you’re right, it is probably this setting.

I’m waiting for a response from our Elastic team, who manage the cluster.