How to improve ELSERv2 ingest throughput?

Hello Team,

We need your guidance on "Improving ELSERv2 optimized model ingest throughput".

As per the link, we can't ingest more than 26 docs per second, no matter how many allocations we use.

A little background on our ingest pipeline,

  • Dedicated deployment for ingest pipeline
  • Pipeline has 6 inference processors that create 6 different embeddings from fields of each document (see the sketch below).
  • Each field contains fewer than 256 tokens.
  • We use 2 ML nodes with 28 allocations * 2 threads per allocation (split as 14 * 2 on each node).

No. of documents: ~40K
Elastic Version: 8.11.1
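
For context, here is a rough sketch of the shape of our pipeline using the Python client. The pipeline ID, field names, and connection details are placeholders rather than our exact configuration, and only 2 of the 6 inference processors are shown:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="...")

# We embed 6 fields in total; only 2 are shown here as an illustration.
fields = ["title", "summary"]

es.ingest.put_pipeline(
    id="elser-ingest-pipeline",  # placeholder pipeline ID
    processors=[
        {
            "inference": {
                "model_id": ".elser_model_2",  # ELSER v2
                "input_output": [
                    {"input_field": field, "output_field": f"{field}_embedding"}
                ],
            }
        }
        for field in fields
    ],
)
```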

Problem: Our ingest time has more than doubled and now takes ~35 mins compared to the earlier 15 mins.

Could you please recommend what we can do to bring back the ingest time close to 15 minutes?

ML node configuration

The linked chart shows how ingest speed changes with the number of allocations on a 16 vCPU machine. The figure of 26 docs per sec was observed in that particular setup for a single ML node; it is not an absolute limit. In that example, adding another 16 vCPU ML node and doubling the number of allocations would increase the ingest rate to 52 docs per sec.
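
As a back-of-the-envelope sketch of that scaling (assuming roughly linear scaling and taking the 26 docs/sec benchmark figure at face value; your pipeline runs 6 inference calls per document, so your effective rate will differ):

```python
# Rough scaling estimate only; 26 docs/sec is the benchmark figure for one
# 16 vCPU ML node, not a hard limit of ELSER.
DOCS_PER_SEC_PER_NODE = 26   # benchmark figure for a single 16 vCPU ML node
TOTAL_DOCS = 40_000          # from the original post

for nodes in (1, 2, 3):
    rate = DOCS_PER_SEC_PER_NODE * nodes
    minutes = TOTAL_DOCS / rate / 60
    print(f"{nodes} node(s): ~{rate} docs/sec -> ~{minutes:.0f} min for {TOTAL_DOCS:,} docs")
```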

Are you limited to 26 docs per sec even if you add more ML nodes to the cluster and more allocations to the deployment?

To optimise ingest throughput, use as many allocations with 1 thread each as you can. In your example, where your ML node has 32 processors, deploy 31 * 1 allocations, leaving one processor core free to run Elasticsearch.
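
As an example, this is roughly how that maps onto the start-deployment API with the Python client; the model ID and connection details are placeholders, and you should adjust the allocation count to your actual core count:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="...")

# One thread per allocation, as many allocations as spare cores allow.
es.ml.start_trained_model_deployment(
    model_id=".elser_model_2",   # ELSER v2 model ID (placeholder if you use another)
    number_of_allocations=31,    # 31 * 1 on a 32-processor ML node
    threads_per_allocation=1,
    queue_capacity=10_000,       # optional; a larger queue absorbs bursts of bulk requests
)
```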

If ingest is too slow, scale up by adding more ML nodes; you can remove them later once your documents have been ingested. If you double the number of nodes, you can double the ingest rate.
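
Once the extra nodes have joined, you can change the number of allocations on the running deployment and drop it back down after the documents are in. A sketch, using the same placeholder model ID as above and example allocation counts:

```python
# Scale the running deployment up after adding ML nodes...
es.ml.update_trained_model_deployment(
    model_id=".elser_model_2",
    number_of_allocations=62,   # e.g. 31 * 1 on each of two 32-processor nodes
)

# ...and back down once the documents have been ingested.
es.ml.update_trained_model_deployment(
    model_id=".elser_model_2",
    number_of_allocations=2,    # example steady-state value
)
```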

Problem: Our ingest time has more than doubled and now takes ~35 mins compared to the earlier 15 mins.

What is the baseline measure? Did ingest take 15 mins without the ELSER model and 35 mins with ELSER inference?

  • We use 2 ML nodes with 28 allocations * 2 threads per allocation (split as 14 * 2 on each node).

Try using 56 * 1 allocations (28 * 1 on each node)

Thanks @dkyle for the suggestion.

We have tried the suggestion with 3 nodes and a new allocation of 48 * 1. We see that more docs remain pending (even after refreshing the page).

I have restarted the deployment, and it's the same issue.

Could you please suggest what could cause this? Also, could you explain how we can add observability into what's going on inside the ingest pipeline?

It looks like the deployment is stuck: it appears to have stopped processing after a small number of docs (28 inference requests on the first node, 18 on the second, 33 on the third). Can you reproduce the failure with a small dataset, and would it be possible for you to share a sample of your data with me?

I cannot reproduce the failure in Elastic Cloud. I suspect the error may be data-dependent, so it would be very helpful if you could share a small, reproducible dataset.
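
On adding observability: the node ingest stats and the trained model deployment stats expose per-pipeline and per-node counters (per-node inference request counts like the ones above are visible there). A rough sketch with the Python client, using the placeholder pipeline and model IDs from earlier in this thread:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="...")

# Per-node ingest stats: document counts, time spent, and failures for each
# pipeline (and each processor inside it).
ingest_stats = es.nodes.stats(metric="ingest")
for node_id, node in ingest_stats["nodes"].items():
    pipeline = node["ingest"]["pipelines"].get("elser-ingest-pipeline", {})
    print(node_id, "count:", pipeline.get("count"), "failed:", pipeline.get("failed"))

# Per-node deployment stats for the model: inference counts, errors, and
# routing/allocation state.
model_stats = es.ml.get_trained_models_stats(model_id=".elser_model_2")
for stats in model_stats["trained_model_stats"]:
    for node in stats.get("deployment_stats", {}).get("nodes", []):
        print(node.get("inference_count"), node.get("error_count"), node.get("routing_state"))
```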

Thank you
