Trying to test ELSER following this tutorial, but I cannot get past the very first step of deploying the ELSER model.
I have a 4GB ML node, as specified. Every time I try to deploy the ELSER model, my Kibana node crashes. This is one of the errors I see:
"Could not start deployment because no suitable nodes were found, allocation explanation [Could not assign (more) allocations on node [bWVMft10ROSAO_MInwTMPg]. Reason: This node has insufficient available memory. Available memory for ML [2369781760 (2.2gb)], free memory [236978176 (226mb)], estimated memory required for this model [2101346304 (1.9gb)].]"
Again, I'm using the recommended 4GB ML node, and 2.2GB available memory looks reasonable given that ~50% is given to the JVM. I'm not sure why free memory is only 226mb, given that this is a brand-new cluster with literally nothing on it.
Also, why does this cause my Kibana node to go down? I've tried this with a 4GB Kibana node as well, and have ensured that the ML and Kibana instances are on different nodes.
Hi @cvarano, I'm sorry you are having trouble with the ELSER tutorial.
Let's break down that error message:
"Available memory for ML [2369781760 (2.2gb)]": as you stated, the JVM takes ~50% of the 4GB, so this is the memory available outside the JVM for the ML processes.
"free memory [236978176 (226mb)]": of that 2.2gb, only 226mb is free to use. Something else is using ~2gb of memory.
"estimated memory required for this model [2101346304 (1.9gb)]": ELSER needs 1.9gb, but only 226mb is free, so the model can't be deployed.
The question is: of the 2.2gb available to ML, what is using it? Is there another model already deployed on the node? Are you using another ML feature such as Anomaly Detection or Data Frame Analytics? The ML Get Memory Usage API can help explain where your memory is being used.
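As a quick check in Kibana Dev Tools (the memory stats endpoint is available on recent 8.x versions; the `human` flag just makes the byte counts readable):

```
GET _ml/memory/_stats?human
```

The response breaks down each node's memory into JVM, anomaly detection jobs, data frame analytics, and native inference, which should show what is holding the ~2gb.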
In the Trained Models UI, do you see any deployed models? If so, stop them. The GET _ml/trained_models/_stats API will give you more detailed information about currently deployed models.
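For example, in Dev Tools (the model id below is illustrative; use the ids reported by the stats call):

```
# List trained models and their deployment memory usage
GET _ml/trained_models/_stats

# Stop a deployment you no longer need
POST _ml/trained_models/.elser_model_1/deployment/_stop
```

Once any stale deployments are stopped, the freed memory should show up in the memory stats and the ELSER deployment can be retried.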
Regarding the Kibana crash: what version are you using? There is a known issue where Kibana allows an Elasticsearch CircuitBreakerException to propagate; it is fixed by "Migrations: dynamically adjust batchSize when reading by rudolf · Pull Request #157494 · elastic/kibana · GitHub" in 8.8.1. If that is the problem, then increasing the size of the Kibana node will not help; if anything, it is better to increase the size of the ML node.