ML Inference speeds

Hello,

I have a question regarding the speed at which I embed my documents.

I currently have an index with 10,000 documents.
My ML node looks like this:


As can be seen, I currently have 4 GB of RAM and 2 vCPUs.

With this setup, I embedded the 10,000 documents in 6.5 hours.

My question is: what if I increase the RAM to 16 GB, which would also increase the vCPUs to 8?
How much faster would it be?
Is it possible to calculate this?
Is it possible to use the same calculation for an even higher upgrade?

Kr, Chenko

Hi!
There are a few variables you can adjust to influence speed and performance.
A general rule of thumb is to first scale your ML node vertically and make sure it has enough RAM for your task (or set up autoscaling).

You can then look at the number of threads and allocations for your model deployment and how changing those influences performance within your available infrastructure. Try this with a small dataset and monitor your CPU usage and processing speed until you find the best settings for your use case; see the sketch below.
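For example, here is a minimal sketch of starting a trained model deployment with explicit allocation and thread settings via the official Python client (elasticsearch-py). The connection details and `my-embedding-model` ID are placeholders, and exact parameter names can differ between client versions, so check the docs for your release:

```python
from elasticsearch import Elasticsearch

# Connect to the cluster (URL and API key are placeholders).
es = Elasticsearch("https://localhost:9200", api_key="<api-key>")

MODEL_ID = "my-embedding-model"  # hypothetical model ID

# Start the deployment with explicit scaling settings:
# - number_of_allocations: independent copies of the model (scales throughput)
# - threads_per_allocation: CPU threads each copy uses (speeds up a single inference)
es.ml.start_trained_model_deployment(
    model_id=MODEL_ID,
    number_of_allocations=2,
    threads_per_allocation=1,
    wait_for="started",
)

# Later, the number of allocations can be changed without restarting the deployment:
es.ml.update_trained_model_deployment(
    model_id=MODEL_ID,
    number_of_allocations=4,
)
```

Keeping `number_of_allocations * threads_per_allocation` within the vCPUs actually available on the ML node is the usual constraint to experiment against.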

This blog has a small example of how allocation strategies can influence the inference time for a model (and what commands you can use to observe this).
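As a rough illustration of that kind of observation, here is a sketch of pulling deployment stats with the Python client. The stats field names (`inference_count`, `average_inference_time_ms`) are what I would expect in the response, but verify them against the output of your own cluster version:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="<api-key>")  # placeholders
MODEL_ID = "my-embedding-model"  # hypothetical model ID

# Fetch stats for the deployed model to see how each allocation is performing.
stats = es.ml.get_trained_models_stats(model_id=MODEL_ID)

for model in stats["trained_model_stats"]:
    deployment = model.get("deployment_stats", {})
    print("allocation status:", deployment.get("allocation_status"))
    for node in deployment.get("nodes", []):
        print(
            "inferences:", node.get("inference_count"),
            "avg inference ms:", node.get("average_inference_time_ms"),
        )
```

Watching the average inference time while you change threads and allocations (and comparing it against CPU usage on the ML node) is a practical way to find the sweet spot before running the full 10,000-document job.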

Hope this helps as a starting point!
