Timeout issue when using inference

Hi,
I am following the Tutorial: semantic search with the inference API | Elasticsearch Guide [8.17] | Elastic page to set up semantic search.
My setup uses Elasticsearch v8.16.1 and the all-mpnet-base-v2 model from Hugging Face.
The issue is that when I run the search from the tutorial, it throws this error:
{"took":10001,"responses":[{"error":{"root_cause":[{"type":"status_exception","reason":"timeout [10s] waiting for inference result"}],"type":"status_exception","reason":"timeout [10s] waiting for inference result"},"status":408}]}

If I change the model to all-MiniLM-L6-v2, I don't have the timeout issue.
I cannot find a way to increase the timeout.
Can you show me how to change the timeout settings?

Thanks

Hi @lostinroom

There isn't a way to adjust the 10-second timeout during search; the problem is tracked in this issue: [ML] Associate text_expansion subsearch with parent search request · Issue #107077 · elastic/elasticsearch · GitHub

10 seconds is a long time; the timeout is probably occurring because something else has gone wrong with the model deployment. I'm not sure increasing the timeout would help in this case.

Do you see any errors in the Elasticsearch logs relating to the all-mpnet-base-v2 model? Can you share the logs, please?

Hi David,
Thanks for the quick response.
After further investigation, it seems this happens when the reindexing has not completed.
When the reindex is running in the background and I query the destination index, it times out after 10s.
Is this expected behavior?
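For reference, I start the reindex in the background and check whether it is still running roughly like this (the index and pipeline names here are just placeholders for mine):

```
POST _reindex?wait_for_completion=false
{
  "source": { "index": "source-index" },
  "dest": { "index": "dest-index", "pipeline": "text-embedding-pipeline" }
}

GET _tasks?detailed=true&actions=*reindex
```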

Does that mean that if, say, the reindex is completed and I query the index while inserting new records, it will time out as well? Can I not query while inserting?

Thanks

Take a look at the Trained Models UI in Kibana; it displays a number of stats that will help you understand what is going on, particularly the inference count field, which you will see increasing as documents are ingested. Average inference time is another useful stat, as it will help you understand the throughput.
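If you prefer the API over the UI, the same numbers are in the trained model stats response; something like this, assuming the default model ID that eland gives all-mpnet-base-v2 on import:

```
# deployment_stats in the response includes inference_count and average_inference_time_ms
GET _ml/trained_models/sentence-transformers__all-mpnet-base-v2/_stats
```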

When inserting records and querying at the same time, the query requests have a higher priority and will execute before the reindex requests (see Deploy the model in your cluster | Machine Learning in the Elastic Stack [master] | Elastic). If you are still getting timeouts, consider creating two model deployments: one for ingest and another for search. That will ensure your search requests are not affected by ingest work.
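As a rough sketch (the deployment IDs below are names I've made up, and the model ID assumes the default eland naming), you can start two deployments of the same model:

```
# Deployment reserved for search traffic
POST _ml/trained_models/sentence-transformers__all-mpnet-base-v2/deployment/_start?deployment_id=mpnet-search&number_of_allocations=1&threads_per_allocation=2

# Separate deployment reserved for ingest, e.g. the reindex pipeline
POST _ml/trained_models/sentence-transformers__all-mpnet-base-v2/deployment/_start?deployment_id=mpnet-ingest&number_of_allocations=1&threads_per_allocation=2
```

The idea is to reference the ingest deployment from your inference pipeline and the search deployment from your queries, so heavy ingest work cannot queue ahead of searches.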