My goal is to index 5 million records into my new index that supports semantic search. However, I am continuously receiving the following error, either from bulk inserts or from issuing the reindex command:
inference process queue is full. Unable to execute command
From what I can tell, one solution may be to increase the queue_capacity of the deployed model. Since I'm using the elser service, that configuration seems to be abstracted away from me. Is there any way to set this config on the service, or do I need to use a custom deployed model to achieve this level of configuration?
Indeed, increasing queue_capacity is one solution. You can call the start trained model deployment API with a specific value.
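For example, a start-deployment request of roughly this shape sets the queue capacity (this is a sketch; the model id .elser_model_2 and the value 4096 are assumptions, adjust them for your deployment):

```
POST _ml/trained_models/.elser_model_2/deployment/_start?queue_capacity=4096
```

Note this applies when you start the deployment yourself rather than letting the inference service manage it.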
The default queue capacity is 1024, so you can send a first batch of 1024 documents and wait for completion before sending the next batch.
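A minimal client-side batching sketch (pure Python; the bulk call in the usage comment is a placeholder for whatever Elasticsearch client helper you use):

```python
def batches(records, size=1024):
    """Yield successive chunks no larger than the deployment's queue capacity."""
    for start in range(0, len(records), size):
        yield records[start:start + size]

# Usage sketch: submit one batch at a time and wait for each bulk request
# to return before sending the next, so the inference queue never overflows.
# for batch in batches(all_records):
#     es.bulk(operations=batch, index="my-semantic-index")  # placeholder client call
```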
To meet your numbers, you could also scale vertically (or add more machine learning nodes) and tune number_of_allocations and threads_per_allocation accordingly.
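Those parameters can be passed on the same start-deployment call; a sketch (the specific values here are illustrative, size them to your ML node capacity):

```
POST _ml/trained_models/.elser_model_2/deployment/_start?number_of_allocations=2&threads_per_allocation=4&queue_capacity=4096
```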
Hi ashishtiwari1993, thank you for your response. Ideally I would like to increase the queue capacity of my inference endpoint. However, I do not see how I can do this when using the semantic_text feature with the elser service.
Below is the exact API request I am using for this and I cannot find a property to set the queue capacity on the endpoint.
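The request itself did not come through in this thread. For reference, a create-inference-endpoint call using the elser service typically looks like the following (the endpoint name is illustrative): its service_settings expose num_allocations and num_threads, but no queue-capacity option.

```
PUT _inference/sparse_embedding/my-elser-endpoint
{
  "service": "elser",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 4
  }
}
```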