The linked chart shows how ingest speed changes with the number of allocations on a 16 vCPU machine. The figure of 26 docs per second was observed in that particular setup for a single ML node; it is not an absolute limit. In that example, adding another 16 vCPU ML node and doubling the number of allocations would increase the ingest rate to 52 docs per second.
Are you limited to 26 docs per second even if you add more ML nodes to the cluster and more allocations to the deployment?
To optimise ingest throughput, use as many allocations with 1 thread each as you can. In your example, where your ML node has 32 processors, deploy 31 * 1 allocations, saving one processor core to run Elasticsearch.
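As a minimal sketch of what that could look like with the elasticsearch-py 8.x client (the cluster URL and the ELSER model ID are placeholder assumptions, not values from this thread):

```python
from elasticsearch import Elasticsearch

# Placeholder cluster URL; point this at your own deployment.
client = Elasticsearch("http://localhost:9200")

# Many single-threaded allocations: 31 allocations x 1 thread on a
# 32-processor ML node, leaving one core free for Elasticsearch itself.
client.ml.start_trained_model_deployment(
    model_id=".elser_model_2",   # placeholder model ID
    number_of_allocations=31,
    threads_per_allocation=1,
    wait_for="started",
)
```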
If ingest is too slow, scale up by adding more ML nodes; you can remove them later once your documents have been ingested. If you double the number of nodes, you can double the ingest rate.
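Scaling out in that way only changes the allocation count, so it can be done without restarting the deployment. A sketch, again with the Python client and placeholder model ID and URL:

```python
from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200")  # placeholder cluster URL

# After adding a second 32-processor ML node, raise the allocation count
# so the new node is used; lower it again before removing the node once
# the documents have been ingested.
client.ml.update_trained_model_deployment(
    model_id=".elser_model_2",   # placeholder model ID
    number_of_allocations=62,    # 31 x 1 on each of the two ML nodes
)
```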
Problem: Our ingest time has more than doubled; it now takes around 35 mins compared to the earlier 15 mins.
What is the baseline measure? Did ingest take 15 mins without the ELSER model and 35 mins with ELSER inference?
We use 2 ML nodes with 28 * 2 allocations (split as 14 * 2 on each node).
Try using 56 * 1 allocations (28 * 1 on each node).
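Because threads per allocation is fixed when a deployment starts, switching from 28 * 2 to 56 * 1 means stopping and restarting the deployment. A sketch with the Python client, using the same placeholder model ID and URL as above:

```python
from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200")  # placeholder cluster URL
MODEL_ID = ".elser_model_2"                      # placeholder model ID

# threads_per_allocation cannot be changed on a running deployment,
# so restart it to move from 28 x 2 to 56 x 1.
client.ml.stop_trained_model_deployment(model_id=MODEL_ID)
client.ml.start_trained_model_deployment(
    model_id=MODEL_ID,
    number_of_allocations=56,    # spread as 28 x 1 across the two ML nodes
    threads_per_allocation=1,
    wait_for="fully_allocated",
)
```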
It looks like the deployment is stuck: it appears to have stopped processing after a small number of docs (28 inference requests on the first node, 18 on the second, 33 on the third). Can you reproduce the failure with a small dataset, and is it possible for you to share a sample of your data with me?
I cannot reproduce the failure in Elastic Cloud. I suspect the error may be data dependent, so it would be very helpful if you could share a small reproducible dataset.
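For reference, the per-node inference counts mentioned above come from the trained model deployment stats. A minimal sketch of reading them with the Python client (model ID and cluster URL are placeholder assumptions):

```python
from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200")  # placeholder cluster URL

# Deployment stats include a per-node inference count and routing state,
# which is how a stalled deployment like the one above shows up.
stats = client.ml.get_trained_models_stats(model_id=".elser_model_2")
for model in stats["trained_model_stats"]:
    deployment = model.get("deployment_stats")
    if not deployment:
        continue
    for node in deployment.get("nodes", []):
        print(node.get("inference_count", 0), node.get("routing_state"))
```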