ELSER v2 model inference getting stuck

Hello Team,

We need your expert advice on the issue of ELSER v2 inference getting stuck.

Elasticsearch version: 8.11.1
ML nodes: 2
Allocations are attached below

We have two separate model deployments, one for queries and one for ingestion. But we have noticed inference getting stuck: the pending request count keeps increasing and inference requests stop completing.
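For context, this is roughly how we check the queue when it happens. A sketch, assuming the built-in `.elser_model_2` model id (the counters named in the comments are the ones we watch):

```
GET _ml/trained_models/.elser_model_2/_stats

// The response lists every deployment of the model with per-node counters.
// The ones that matter when inference stalls:
//   deployment_stats.nodes[].number_of_pending_requests  (grows while stuck)
//   deployment_stats.nodes[].error_count
//   deployment_stats.nodes[].timeout_count
//   deployment_stats.nodes[].rejected_execution_count
```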

At the time of the issue, ML node CPU and memory utilisation were normal, and we could not find any logs explaining the cause. As an interim fix we restarted the deployment and it started working again. But with the issue recurring frequently in production, we want to find the root cause.
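By "restarted the deployment" we mean the standard stop/start cycle. A sketch, assuming a hypothetical deployment id `elser-ingest` and a sizing of two allocations (your ids and sizes will differ):

```
// Stop the stuck deployment; force because requests are hung
POST _ml/trained_models/elser-ingest/deployment/_stop?force=true

// Start it again under the same deployment id and sizing
POST _ml/trained_models/.elser_model_2/deployment/_start?deployment_id=elser-ingest&number_of_allocations=2&threads_per_allocation=2
```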

Hence, we need your guidance on how to observe this. If an error occurs, where could we find its root cause? Your suggestions around this would help us.

note:

Ingestion goes through an ingest pipeline that creates the embeddings, and the pipeline has an on_failure step. Even so, documents are not getting ingested, which leaves our ingestion in a hung state.
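To make the setup concrete, a sketch of the pipeline shape (the pipeline name, field names, and dead-letter index here are placeholders, not our real configuration; `elser-ingest` is the hypothetical ingest deployment id):

```
PUT _ingest/pipeline/elser-embeddings
{
  "processors": [
    {
      "inference": {
        "model_id": "elser-ingest",
        "input_output": [
          {
            "input_field": "content",
            "output_field": "content_embedding"
          }
        ]
      }
    }
  ],
  "on_failure": [
    {
      "set": {
        "description": "Route failed documents to a dead-letter index",
        "field": "_index",
        "value": "failed-{{{_index}}}"
      }
    },
    {
      "set": {
        "description": "Record why the pipeline failed",
        "field": "ingest.failure",
        "value": "{{_ingest.on_failure_message}}"
      }
    }
  ]
}
```

The on_failure handler only fires when the inference processor actually returns an error; if the request simply hangs, nothing fails, which would match the hung state we see.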

On the query side, we are using a text_expansion query, which is timing out after 10 s.
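The query is roughly shaped like this (the index and field names are placeholders; `elser-search` is the hypothetical id of the search deployment). The 10 s we see also happens to match the default ML inference timeout, though we have not confirmed that is what trips:

```
GET my-index/_search
{
  "timeout": "10s",
  "query": {
    "text_expansion": {
      "content_embedding": {
        "model_id": "elser-search",
        "model_text": "the end-user query text"
      }
    }
  }
}
```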

There is a known issue with Elasticsearch 8.11 where the model will freeze or stop working. The cause appeared to be the IPEX library, which was added in 8.11 to enhance model inference speed on Intel hardware. Unfortunately there were side effects, and the library was removed in 8.12.

I recommend upgrading to the latest version, or 8.12 at a minimum; that should fix the timeouts.

Thanks @dkyle for your suggestion. Could you please advise us on what aspects to observe and how to implement observability for ELSER inference?