Hardware recommendation for vector search

Hi,
At the moment I have a cluster with 8 nodes, 2TB of RAM and 5TB of SSD storage.
After indexing vectors (1.2TB for now, and it will grow over time), a search takes 30 seconds.
I've tried quantization, adding more nodes and adding more shards, which reduced the response time to 10 seconds.
I also increased the JVM heap, but afterwards I realized I shouldn't have done that.
Now I've come across the Hot-Warm-Cold architecture. I don't think it's a good idea to change my data nodes to hot nodes, but it seems to be the only option I have left to try.
Are there any recommendations? Is increasing the JVM heap right in this situation? Will adding hardware resources help, and how should I manage them? Or is a hot architecture a good choice?

Which version of Elasticsearch are you using? If you are not on the latest, I would recommend upgrading, as this area moves fast and is continuously being improved.

How many indices and shards is your data spread across? What is the average shard size?
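If it helps, here is a minimal Python sketch of one way to get those numbers from the output of `GET _cat/shards?format=json&bytes=b`. The function name and the sample rows are made up for illustration; only the `index`, `prirep` and `store` columns of the cat shards response are relied on.

```python
def summarize_primary_shards(cat_shards_rows):
    """Count primary shards and compute their average size.

    Expects the parsed JSON list returned by
    GET _cat/shards?format=json&bytes=b (store is a byte count as a string).
    """
    # Keep only primaries that report a store size (unassigned shards have none).
    primaries = [r for r in cat_shards_rows
                 if r.get("prirep") == "p" and r.get("store")]
    sizes = [int(r["store"]) for r in primaries]
    total = sum(sizes)
    return {
        "indices": len({r["index"] for r in primaries}),
        "primary_shards": len(sizes),
        "total_bytes": total,
        "avg_shard_gib": (total / len(sizes) / 1024**3) if sizes else 0.0,
    }


# Hypothetical sample: two primaries (40 GiB and 60 GiB) and one replica.
sample_rows = [
    {"index": "vectors", "shard": "0", "prirep": "p", "store": str(40 * 1024**3)},
    {"index": "vectors", "shard": "1", "prirep": "p", "store": str(60 * 1024**3)},
    {"index": "vectors", "shard": "0", "prirep": "r", "store": str(40 * 1024**3)},
]
summary = summarize_primary_shards(sample_rows)
print(summary)
```

Replicas are excluded on purpose so that the average reflects primary shard size only.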

Is that the RAM and storage per host or in total? Is the 1.2TB just the primary shard size or does it include replicas? If so, how many replicas?
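The reason for asking is that kNN search reads vector data through the operating system's filesystem cache rather than the JVM heap, so what matters is how much RAM is left on each node once the heap is taken out. A rough back-of-the-envelope sketch in Python, using the 8 nodes, 2TB of RAM and 1.2TB of data from the original post. The 31GB heap per node is purely an assumption, as are the even spread of data across nodes and the helper name.

```python
def page_cache_budget(total_ram_gb, nodes, heap_per_node_gb,
                      primary_data_gb, replicas=0):
    """Compare per-node filesystem cache with per-node index data.

    Assumes RAM and shards are spread evenly across nodes, and treats the
    whole on-disk index size as an upper bound on what must be cached.
    """
    ram_per_node = total_ram_gb / nodes
    # RAM not given to the JVM heap is what the OS can use for caching.
    cache_per_node = ram_per_node - heap_per_node_gb
    data_per_node = primary_data_gb * (1 + replicas) / nodes
    return {
        "cache_per_node_gb": cache_per_node,
        "data_per_node_gb": data_per_node,
        "fits_in_cache": data_per_node <= cache_per_node,
    }


# Figures from the original post; heap_per_node_gb=31 is an assumed value.
budget = page_cache_budget(total_ram_gb=2000, nodes=8,
                           heap_per_node_gb=31, primary_data_gb=1200)
with_replica = page_cache_budget(2000, 8, 31, 1200, replicas=1)
print(budget)
print(with_replica)
```

This also shows why a larger heap can hurt here: every extra GB of heap is a GB taken away from the cache that serves the vectors, and adding a replica doubles the data each node has to keep warm.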

A hot-warm-cold architecture is generally built on the assumption that newer data is queried more frequently and that longer latencies are acceptable when querying older data. If the query requirements for your use case do not vary with the age of the data, I would stay away from this type of architecture.

@Christian_Dahlqvist thank you for your response.

It's 8.15.3

It's just one index with 25 shards and an average shard size of 40.

In total I have 2TB of RAM. At the moment it's around 200 GB per host.
The 1.2TB is just the primary shards. I don't use replicas.

So it's not a good idea because my data is not time-based.

Have you looked at these guidelines?

@Christian_Dahlqvist

Thank you for your guidance. I reviewed these guidelines and implemented some of them.

From your last reply, I deduce that "partitioning" would be helpful. Is that correct? In general, is partitioning a good idea for large indices?

What do you mean by this?

Splitting my vector index into two indices, for example. Is it wise to do this for my vector index?

There are optimisations you can make based on the structure of your data and how you query it, but as you have not described anything about your data or query patterns it is impossible to say.
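As one illustration of a query-side knob, assuming the data has some field that can be filtered on: the `knn` section of a search request accepts `k`, `num_candidates` and an optional `filter`, and lowering `num_candidates` trades recall for speed. The sketch below only builds the request body. The field names `embedding` and `category`, the values, and the helper itself are hypothetical, since nothing is known about the actual mapping.

```python
def build_knn_search(field, query_vector, k=10, num_candidates=100,
                     filter_query=None):
    """Build a request body for an approximate kNN search.

    num_candidates is the number of candidates considered per shard;
    smaller values are faster but may reduce recall.
    """
    if num_candidates < k:
        raise ValueError("num_candidates must be at least k")
    knn = {
        "field": field,
        "query_vector": list(query_vector),
        "k": k,
        "num_candidates": num_candidates,
    }
    if filter_query is not None:
        # The filter is applied during the vector search itself.
        knn["filter"] = filter_query
    return {
        "knn": knn,
        "size": k,
        # Avoid returning the large vector field with every hit.
        "_source": {"excludes": [field]},
    }


# Hypothetical usage: field names and values are placeholders.
body = build_knn_search(
    field="embedding",
    query_vector=[0.1, 0.2, 0.3],
    k=10,
    num_candidates=50,
    filter_query={"term": {"category": "books"}},
)
print(body)
```

Whether a filter like this is available, and whether lower recall is acceptable, depends entirely on the data and the requirements.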

Why would this improve performance? How would the data be split?

As I know nothing about the structure of the data or how it is queried, I cannot tell whether this makes sense or not.

You have also not told us anything about what your performance target is, just that 30 second latency is too high.
