ELSER service can't index large amounts of data

I am creating an index with semantic search using the semantic_text field type, following this tutorial here.

I've created the inference endpoint deployment with the following configurations:

{
  "inference_id": "my-elser-endpoint-v1",
  "task_type": "sparse_embedding",
  "service": "elser",
  "service_settings": {
    "num_allocations": 9,
    "num_threads": 4,
    "model_id": ".elser_model_2_linux-x86_64"
  },
  "task_settings": {}
}

My goal is to index 5 million records into my new index that supports semantic search. However, I keep receiving the following error, both from bulk inserts and from issuing the reindex command.

inference process queue is full. Unable to execute command

From what I can tell, one solution may be to increase the queue_capacity of the deployed model. Since I'm using the elser service, that configuration appears to be abstracted away from me. Is there any way to set this config on the service, or do I need to use a custom deployed model to achieve this level of configuration?

Hi @overflowalligator, welcome to the Elastic community!

  1. Indeed, increasing queue_capacity is one of the solutions. You can call the start trained model deployment API with a specific value.
  2. The default queue capacity is 1024, so you can send a first batch of 1024 documents and wait for completion before sending the next batch.
  3. To meet your numbers, you could scale vertically (or add more machine learning nodes) and tune number_of_allocations & threads_per_allocation accordingly.
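For reference, the first suggestion can be sketched with the start trained model deployment API, which accepts a queue_capacity query parameter. The queue_capacity value below is illustrative; the allocation and thread numbers are copied from the endpoint configuration earlier in this thread:

POST _ml/trained_models/.elser_model_2_linux-x86_64/deployment/_start?number_of_allocations=9&threads_per_allocation=4&queue_capacity=10000

Whether an inference endpoint created with the elser service can reuse a deployment started this way depends on your Elasticsearch version, so check the documentation for the release you are running.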

Hi ashishtiwari1993, thank you for your response. Ideally, I would like to increase the queue capacity of my inference endpoint. However, I do not see how I can do this when using the semantic_text feature and the elser service.

Below is the exact API request I am using, and I cannot find a property to set the queue capacity on the endpoint.

PUT _inference/sparse_embedding/my-elser-endpoint-v1
{
  "service": "elser", 
  "service_settings": {
    "num_allocations": 9,
    "num_threads": 4
  }
}

Could you please confirm that it is possible to change the queue_capacity on an inference endpoint that uses the elser service?