I am creating an index with semantic search using the semantic_text field type, following this tutorial here.
I've created the inference endpoint deployment with the following configuration:
{
  "inference_id": "my-elser-endpoint-v1",
  "task_type": "sparse_embedding",
  "service": "elser",
  "service_settings": {
    "num_allocations": 9,
    "num_threads": 4,
    "model_id": ".elser_model_2_linux-x86_64"
  },
  "task_settings": {}
}
My goal is to index 5 million records into my new index that supports semantic search. However, I keep receiving the following error, both from bulk inserts and from the reindex command:
inference process queue is full. Unable to execute command
From what I can tell, one solution may be to increase the queue_capacity of the deployed model. Since I'm using the elser service, it seems that this setting is abstracted away from me. Is there any way to set it on the service, or do I need to use a custom deployed model to get this level of configuration?
Hi @overflowalligator, welcome to the Elastic community.
- Indeed, increasing queue_capacity is one of the solutions. You can call the API to start a deployment with a specific value.
- The default queue capacity is 1024, so you can send a first batch of 1024 and wait for it to complete before sending the next batch.
- To meet your numbers, maybe you can scale vertically (or add more machine learning nodes) and tune number_of_allocations & threads_per_allocation accordingly.
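The second suggestion above (send a batch, wait for it to finish, then send the next) can be sketched roughly as follows. This is only a minimal illustration, not an Elastic-provided utility: `send_bulk` is a hypothetical callable that you would supply yourself (for example, a wrapper around a bulk request) which blocks until the request completes and returns False when it is rejected, e.g. with the "inference process queue is full" error. The batch size of 1024 just mirrors the default mentioned above and may need tuning, as do the retry count and delays, which are assumptions.

```python
import time
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def chunked(items: Iterable, size: int) -> Iterator[List]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def index_in_batches(
    docs: Iterable,
    send_bulk: Callable[[List], bool],  # hypothetical: True on success, False if rejected
    batch_size: int = 1024,             # assumption: mirrors the default queue capacity
    max_retries: int = 5,               # assumption: arbitrary retry budget
    base_delay: float = 1.0,            # assumption: seconds before the first retry
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send docs one batch at a time, waiting for each batch before the next.

    A rejected batch is retried with exponential backoff so the inference
    queue has time to drain. Returns the number of documents sent.
    """
    sent = 0
    for batch in chunked(docs, batch_size):
        for attempt in range(max_retries + 1):
            if send_bulk(batch):
                sent += len(batch)
                break
            if attempt == max_retries:
                raise RuntimeError("batch still rejected after all retries")
            sleep(base_delay * 2 ** attempt)
    return sent
```

Because each batch is sent only after the previous one has returned, at most one batch's worth of documents is waiting on inference at any time. The cost is throughput, so for 5 million records this would likely be combined with the scaling advice in the last bullet.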
Hi ashishtiwari1993, thank you for your response. Ideally I would like to increase the queue capacity of my inference endpoint. However, I do not see how I can do this when using the semantic_text feature and elser service.
Below is the exact API request I am using. I cannot find a property for setting the queue capacity on the endpoint.
PUT _inference/sparse_embedding/my-elser-endpoint-v1
{
  "service": "elser",
  "service_settings": {
    "num_allocations": 9,
    "num_threads": 4
  }
}
Could you please confirm that it is possible to change the queue_capacity on an inference endpoint that uses the elser service?