ELSER service can't index large amounts of data

I am creating an index with semantic search using the semantic_text field type, following this tutorial here.

I've created the inference endpoint deployment with the following configurations:

{
  "inference_id": "my-elser-endpoint-v1",
  "task_type": "sparse_embedding",
  "service": "elser",
  "service_settings": {
    "num_allocations": 9,
    "num_threads": 4,
    "model_id": ".elser_model_2_linux-x86_64"
  },
  "task_settings": {}
}

My goal is to index 5 million records into my new index that supports semantic search. However, I keep receiving the following error, both from bulk inserts and from issuing the reindex command.

inference process queue is full. Unable to execute command

From what I can tell, one solution may be to increase the queue_capacity of the deployed model. Since I'm using the elser service, that configuration appears to be abstracted away from me. Is there any way to set this config on the service, or do I need to use a custom deployed model to achieve this level of configuration?

Hi @overflowalligator, welcome to the Elastic community!

  1. Indeed, increasing queue_capacity is one of the solutions. You can call the start trained model deployment API with a specific value.
  2. The default queue capacity is 1024, so you can send a first batch of 1024 documents and wait for completion before sending the next batch.
  3. To meet your numbers, you could scale vertically (or add more machine learning nodes) and tune number_of_allocations & threads_per_allocation accordingly.
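For reference, the first suggestion can be sketched with the start trained model deployment API, which accepts a queue_capacity query parameter. The queue_capacity value below is illustrative; the allocation and thread numbers are copied from the endpoint configuration earlier in this thread:

POST _ml/trained_models/.elser_model_2_linux-x86_64/deployment/_start?number_of_allocations=9&threads_per_allocation=4&queue_capacity=10000

Whether an inference endpoint created with the elser service can reuse a deployment started this way depends on your Elasticsearch version, so check the documentation for the release you are running.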

Hi ashishtiwari1993, thank you for your response. Ideally, I would like to increase the queue capacity of my inference endpoint. However, I do not see how I can do this when using the semantic_text feature and the elser service.

Below is the exact API request I am using, and I cannot find a property to set the queue capacity on the endpoint.

PUT _inference/sparse_embedding/my-elser-endpoint-v1
{
  "service": "elser", 
  "service_settings": {
    "num_allocations": 9,
    "num_threads": 4
  }
}

Could you please confirm that it is possible to change the queue_capacity on an inference endpoint that uses the elser service?