ELSER deployments crash Kibana and fail to deploy

I'm trying to test ELSER by following this tutorial, but I can't get past the very first step: deploying the ELSER model.

I have a 4GB ML node, as specified. Every time I try to deploy the ELSER model, my Kibana node crashes. This is one of the errors I see:

"Could not start deployment because no suitable nodes were found, allocation explanation [Could not assign (more) allocations on node [bWVMft10ROSAO_MInwTMPg]. Reason: This node has insufficient available memory. Available memory for ML [2369781760 (2.2gb)], free memory [236978176 (226mb)], estimated memory required for this model [2101346304 (1.9gb)].]"

Again, I'm using the recommended 4GB ML node, and 2.2GB of available memory looks reasonable given that ~50% goes to the JVM. I'm not sure why free memory is only 226mb, given that this is a brand new cluster with literally nothing on it.

Also, why does this cause my Kibana node to go down? I've tried this with a 4GB Kibana node as well, and have ensured that the ML and Kibana instances are on different nodes.

Hi @cvarano, I'm sorry you are having trouble with the ELSER tutorial.

Let's break down that error message:
Available memory for ML [2369781760 (2.2gb)]: as you stated, the JVM has ~50% of the 4GB of memory. This figure is the memory available outside the JVM for the ML processes.

free memory [236978176 (226mb)]: of that 2.2gb, only 226mb is free to use, so something else is using ~2gb of memory.

estimated memory required for this model [2101346304 (1.9gb)]: the ELSER model needs 1.9gb, but only 226mb is free, so the model can't be deployed.
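To make that arithmetic explicit, here is a small sketch that pulls the three byte counts out of the allocation explanation quoted earlier in the thread and checks whether the model fits. `parse_ml_memory` is a hypothetical helper, and its regular expressions are written against the wording of this one message. Other Elasticsearch versions may phrase the explanation differently.

```python
import re


def parse_ml_memory(explanation: str) -> dict:
    """Extract the byte counts from an ML allocation explanation string.

    The patterns assume the wording of the message quoted in this thread.
    """
    patterns = {
        "available": r"Available memory for ML \[(\d+)",
        "free": r"free memory \[(\d+)",
        "required": r"estimated memory required for this model \[(\d+)",
    }
    values = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, explanation)
        if match is None:
            raise ValueError(f"could not find {key!r} in explanation")
        values[key] = int(match.group(1))
    return values


def to_gib(n_bytes: int) -> float:
    """Convert bytes to GiB (1024**3 bytes)."""
    return n_bytes / 1024 ** 3


explanation = (
    "Available memory for ML [2369781760 (2.2gb)], free memory [236978176 (226mb)], "
    "estimated memory required for this model [2101346304 (1.9gb)]."
)
mem = parse_ml_memory(explanation)
in_use = mem["available"] - mem["free"]      # memory already claimed by something else
shortfall = mem["required"] - mem["free"]    # how much more would need to be freed

print(f"available to ML: {to_gib(mem['available']):.2f} GiB")
print(f"already in use:  {to_gib(in_use):.2f} GiB")
print(f"shortfall:       {to_gib(shortfall):.2f} GiB")
print("fits" if mem["required"] <= mem["free"] else "does not fit")
```

With these numbers, roughly 2 GiB of the ML allowance is already taken, which matches the conclusion above that something else on the node is holding that memory.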

The question is: of the 2.2gb available to ML, what is using it? Has another model been deployed on the node? Are you using another ML feature, such as Anomaly Detection or Data Frame Analytics? The ML Get Memory Usage API can help explain where your memory is going.
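As a rough aid for reading that API's output, the sketch below summarizes the parsed JSON body of `GET _ml/memory/_stats` per node. The field names under `nodes.<id>.mem.ml` are recalled from the API reference and are an assumption, so check them against your own cluster's response. `ml_memory_breakdown` and `print_breakdown` are hypothetical helpers, and the "headroom" figure is only the ML limit minus the listed consumers. It is not guaranteed to equal the "free memory" value in the allocation error.

```python
import json

# Sub-fields of nodes.<id>.mem.ml in the GET _ml/memory/_stats response
# (assumed names; verify against your Elasticsearch version).
ML_CONSUMERS = (
    "native_code_overhead_in_bytes",
    "anomaly_detectors_in_bytes",
    "data_frame_analytics_in_bytes",
    "native_inference_in_bytes",
)


def ml_memory_breakdown(stats: dict) -> dict:
    """Map each node name to its ML memory consumers and a rough headroom."""
    breakdown = {}
    for node_id, node in stats.get("nodes", {}).items():
        ml = node.get("mem", {}).get("ml", {})
        usage = {field: ml.get(field, 0) for field in ML_CONSUMERS}
        usage["max_in_bytes"] = ml.get("max_in_bytes", 0)
        # Rough headroom: the ML limit minus the consumers listed above.
        usage["headroom_in_bytes"] = usage["max_in_bytes"] - sum(
            usage[field] for field in ML_CONSUMERS
        )
        breakdown[node.get("name", node_id)] = usage
    return breakdown


def print_breakdown(response_text: str) -> None:
    """Print the breakdown for a response body saved from Kibana Dev Tools."""
    for name, usage in ml_memory_breakdown(json.loads(response_text)).items():
        print(name)
        for field, value in usage.items():
            print(f"  {field}: {value / 1024 ** 2:.0f} MiB")
```

A large `native_inference_in_bytes` value would point to an already-deployed model. Large anomaly detection or data frame analytics figures would point to those jobs instead.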

In the Trained Models UI, do you see any deployed models? If so, stop them. The GET _ml/trained_models/_stats API gives more detailed information about currently deployed models.
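If you prefer the API to the UI, the sketch below picks active deployments out of the parsed `GET _ml/trained_models/_stats` response and builds the matching stop requests. It assumes each entry in `trained_model_stats` carries a `deployment_stats` object with `state` and (from 8.8) `deployment_id`, falling back to `model_id` for older versions. It also treats "starting" as well as "started" as holding memory. Both are assumptions to check against your version's API reference. The helper names are hypothetical.

```python
# Deployment states treated as holding native memory (assumption).
ACTIVE_STATES = {"starting", "started"}


def active_deployments(stats: dict) -> list:
    """Return (model_id, deployment_id) pairs for deployments that hold memory."""
    active = []
    for entry in stats.get("trained_model_stats", []):
        deployment = entry.get("deployment_stats")
        if deployment and deployment.get("state") in ACTIVE_STATES:
            # Older versions have no deployment_id; the model_id is used instead.
            deployment_id = deployment.get("deployment_id", entry["model_id"])
            active.append((entry["model_id"], deployment_id))
    return active


def stop_requests(stats: dict) -> list:
    """Build Dev Tools requests that stop each active deployment."""
    return [
        f"POST _ml/trained_models/{deployment_id}/deployment/_stop"
        for _, deployment_id in active_deployments(stats)
    ]
```

Pasting the generated lines into Kibana Dev Tools should have the same effect as stopping the deployments in the Trained Models UI.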

Regarding the Kibana crash, what version are you using? There is a known issue where Kibana allows an Elasticsearch CircuitBreakerException to propagate. It is fixed in 8.8.1 by elastic/kibana pull request #157494 on GitHub ("Migrations: dynamically adjust batchSize when reading", by rudolf). If that is the problem, increasing the size of the Kibana node will not help. If anything, it is better to increase the size of the ML node.
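A quick way to tell whether a given Kibana version predates that fix is a plain version comparison against 8.8.1, the release named in the reply. This minimal sketch only compares version numbers. It does not confirm that the crash is actually caused by the CircuitBreakerException issue. `includes_fix` is a hypothetical helper.

```python
FIX_VERSION = (8, 8, 1)  # release said to contain the fix, per the reply


def parse_version(version: str) -> tuple:
    """Turn "8.7.1" or "8.8.1-SNAPSHOT" into a 3-tuple like (8, 7, 1)."""
    core = version.split("-", 1)[0]
    parts = [int(part) for part in core.split(".")]
    parts += [0] * (3 - len(parts))  # pad "8.8" to (8, 8, 0)
    return tuple(parts[:3])


def includes_fix(kibana_version: str) -> bool:
    """True if the version is at or after the release that ships the fix."""
    return parse_version(kibana_version) >= FIX_VERSION
```

Comparing integer tuples rather than strings keeps "8.10.0" ordered after "8.8.1".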
