Approximate KNN, Preloading & Performance

Hi there!

I'm very excited to explore Elasticsearch as an option for both ANN and hybrid search. The current guides for tuning kNN performance have been very helpful. However, I'm encountering some unexpected behaviour with preloading, and we're also not seeing numbers that resemble the published benchmarks.

Here's the setup at the moment and the numbers we're seeing:

3 master nodes
20 data nodes with:

  • 29 vCPUs: Intel Xeon Ice Lake, 2.4 GHz base with up to roughly 3.5 GHz boost. We're using Google Cloud's N2s.
  • 110GiB of memory
  • 30 GB of heap (-Xms30g -Xmx30g)
  • 500GiB of disk space

The index we've set up contains ~350M+ documents, each with a 384-dimensional dense vector mapped as follows:

      "text_embedding": {
        "type": "dense_vector",
        "dims": 384,
        "index": true,
        "similarity": "dot_product"
      }

According to the disk usage analyzer, we're seeing:


So, supposedly, we'd need at least 8 nodes, each with 70 GB of RAM available for the page cache, to hold this in memory. Resource allocation shouldn't be an issue.
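A similar figure falls out of the rule of thumb in Elastic's "Tune approximate kNN search" guide, which estimates the RAM needed for float vectors as `num_vectors * 4 * (num_dimensions + 12)` bytes. The sketch below assumes that formula, decimal gigabytes and 70 GB of page cache per node. It also shows that replica copies, if they serve searches too, need their own page cache, which roughly doubles the total.

```python
import math

GB = 10**9

def hnsw_float_ram_bytes(num_vectors: int, dims: int) -> int:
    """Rule-of-thumb RAM for float32 HNSW vectors: 4 bytes per dimension
    plus a small per-vector allowance (the +12) for the graph."""
    return num_vectors * 4 * (dims + 12)

def nodes_needed(total_bytes: int, cache_bytes_per_node: int) -> int:
    # Smallest number of nodes whose combined page cache covers total_bytes.
    return math.ceil(total_bytes / cache_bytes_per_node)

primaries = hnsw_float_ram_bytes(350_000_000, 384)  # ~554 GB
with_replica = 2 * primaries  # each replica copy is cached separately

print(primaries / GB)
print(nodes_needed(primaries, 70 * GB))
print(nodes_needed(with_replica, 70 * GB))
```

On paper, 20 data nodes that each leave roughly 80 GiB outside the 30 GiB heap should hold even the doubled figure, so the conclusion that memory isn't the bottleneck still looks reasonable.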

The kNN benchmarks run on a single shard with 2M vectors of 768 dimensions. Since our vectors are half that size, we assumed that 4M vectors per shard would give a roughly equivalent setup.
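The equivalence can be checked with a bit of arithmetic. This sketch assumes the same rule-of-thumb RAM formula as Elastic's kNN tuning guide (`num_vectors * 4 * (num_dimensions + 12)`). The raw vector values match exactly, but the graph holds twice as many entries, so the two shards are only approximately equivalent. The last line uses the 100 primary shards from the index settings to get the actual per-shard count.

```python
def hnsw_float_ram_bytes(num_vectors: int, dims: int) -> int:
    # Rule-of-thumb estimate for float vectors: num_vectors * 4 * (dims + 12) bytes.
    return num_vectors * 4 * (dims + 12)

# Benchmark shard vs. the shard size we assumed would be equivalent.
bench_values = 2_000_000 * 768
assumed_values = 4_000_000 * 384
assert bench_values == assumed_values  # same number of float32 values

bench_ram = hnsw_float_ram_bytes(2_000_000, 768)    # 6.24e9 bytes
assumed_ram = hnsw_float_ram_bytes(4_000_000, 384)  # 6.336e9 bytes: 2x the graph entries

# Actual layout: ~350M docs spread over 100 primary shards.
docs_per_shard = 350_000_000 // 100  # 3.5M, a little under the 4M target
print(bench_ram, assumed_ram, docs_per_shard)
```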

Our current index settings are:

  "settings": {
    "index": {
      "number_of_shards": "100",
      "number_of_replicas": "1",
      "refresh_interval": "-1",
      "store.preload": ["vec", "vex", "vem", "vemf", "veq", "vemq"]
    }
  }

This gives us decent results once we refresh and force-merge to a single segment:

P50: ~50 ms
P95: ~77 ms
P99: ~100 ms

(These are measured on the client side, so they include both Elasticsearch and network time.)
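One way to turn raw client-side timings into P50/P95/P99 is Python's `statistics.quantiles`. This sketch assumes the load-test tool records one latency in milliseconds per request. The sample data is synthetic, not taken from this cluster. `method="inclusive"` treats the recorded samples as the whole population rather than extrapolating beyond the observed minimum and maximum.

```python
import statistics

def latency_percentiles(samples_ms):
    """Return (p50, p95, p99) for a list of client-side latencies in milliseconds."""
    # quantiles(..., n=100) returns 99 cut points; index i holds percentile i + 1.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return cuts[49], cuts[94], cuts[98]

# Synthetic timings for illustration only.
samples = [float(ms) for ms in range(1, 101)]
p50, p95, p99 = latency_percentiles(samples)
print(f"P50 {p50:.2f} ms, P95 {p95:.2f} ms, P99 {p99:.2f} ms")
```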

Since we're on 8.10, we've also enabled this setting:

    "search": {
      "query_phase_parallel_collection_enabled": "true"
    }

Our total index size is 1.8 TB, plus one replica.

That puts each shard, and its single force-merged segment, at around 18 GB.
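A rough per-node view helps interpret the warm-up behaviour. This sketch assumes shard copies are spread evenly (100 primaries plus 1 replica each over 20 data nodes), that the vector data follows the rule-of-thumb formula `num_vectors * 4 * (num_dimensions + 12)`, and that the page cache can use at most the node RAM minus the 30 GiB heap. The OS and other processes take some of that too, so it is an upper bound.

```python
GB = 10**9
GIB = 2**30

# Cluster layout from the post: 100 primaries, 1 replica each, 20 data nodes.
copies_per_node = (100 * (1 + 1)) // 20  # 10 shard copies per node

shard_bytes = 18 * GB  # on-disk size of one shard (all Lucene files)
# Rule-of-thumb HNSW footprint for float vectors, split evenly over 100 primaries.
vector_bytes_per_shard = 350_000_000 * 4 * (384 + 12) // 100

# Upper bound for the page cache: node RAM minus the JVM heap.
page_cache_bytes = 110 * GIB - 30 * GIB

disk_per_node = copies_per_node * shard_bytes                # ~180 GB
vectors_per_node = copies_per_node * vector_bytes_per_shard  # ~55 GB

print(f"whole shards per node: {disk_per_node / GB:.1f} GB")
print(f"vector data per node:  {vectors_per_node / GB:.1f} GB")
print(f"page cache (max):      {page_cache_bytes / GB:.1f} GB")
```

Under these assumptions, the vector files on each node fit in the page cache, but the full shards do not. That is consistent with I/O dropping to almost zero once the files a query actually touches (vectors, graph and the postings used for filtering) have been read at least once.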

The ANN query we're running also sets _source: false and only filters on fields backed by inverted indices.
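For reference, a query of this shape might look like the request body below. It is built in Python so it could be passed to a client. The field name `text_embedding` comes from the mapping above. The filter field `category` and the values `k=10` and `num_candidates=100` are hypothetical placeholders. With `dot_product` similarity, Elasticsearch expects float query vectors to be unit length, so the example uses a unit vector.

```python
import json

def build_knn_body(query_vector, filter_clause=None, k=10, num_candidates=100):
    """Build a kNN search body that returns no _source and optionally filters."""
    knn = {
        "field": "text_embedding",
        "query_vector": query_vector,
        "k": k,
        "num_candidates": num_candidates,
    }
    if filter_clause is not None:
        # Filter on a regular (inverted-index) field, e.g. with a term query.
        knn["filter"] = filter_clause
    return {"knn": knn, "_source": False}

# dot_product similarity expects unit-length float vectors.
query_vector = [1.0] + [0.0] * 383
# "category" is a hypothetical keyword field used only for illustration.
body = build_knn_body(query_vector, filter_clause={"term": {"category": "news"}})
print(json.dumps(body)[:120])
```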

  • Given this information, is there anything else we could tune here?
  • Plain force-merging and force-merging down to a single segment give similar performance.
  • To reach the performance above, we need to warm up the cluster quite a bit, and we see a lot of I/O in the meantime; preloading doesn't seem to help here. Is there anything we could do about this?
  • Once the cluster is warmed up, I/O is almost zero during load tests. I'm assuming this is a good indicator that everything we need is in memory?

Hi @robertasg! Elastic recently released a quantized HNSW graph. I've seen a few places where people reported reduced latencies because of it (Elastic Stack 8.12: Enhanced vector search with improvements to ES|QL and more | Elastic Blog, Introducing Scalar Quantization in Lucene — Elastic Search Labs). Besides that, it can help a lot with RAM usage. I haven't been able to test it myself yet, though.
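As a rough illustration of why quantization helps with RAM, scalar quantization stores each float32 dimension as a single byte. The sketch below is a deliberately simplified linear (min/max) version with hand-picked bounds. Lucene's actual implementation chooses the bounds from quantiles of the data (exposed as `confidence_interval` on the `int8_hnsw` index option in 8.12) and handles details this sketch ignores, so treat it only as an illustration of the idea. The original float vectors are still kept on disk; the saving is in what needs to stay in memory for search.

```python
def quantize_int8(vector, lo, hi):
    """Linearly map floats in [lo, hi] to integers in [0, 127] (1 byte each)."""
    scale = 127 / (hi - lo)
    # Clamp first so values outside the chosen bounds saturate instead of overflowing.
    return [round((min(max(x, lo), hi) - lo) * scale) for x in vector]

def dequantize(codes, lo, hi):
    """Approximate reconstruction of the original floats."""
    step = (hi - lo) / 127
    return [lo + c * step for c in codes]

vec = [-0.5, 0.0, 0.25, 0.5]
codes = quantize_int8(vec, -0.5, 0.5)  # [0, 64, 95, 127]
restored = dequantize(codes, -0.5, 0.5)
max_error = max(abs(a - b) for a, b in zip(vec, restored))  # at most half a step

# Raw vector payload for 350M x 384 dims: float32 vs. one byte per dimension.
float_bytes = 350_000_000 * 384 * 4  # 537.6 GB
int8_bytes = 350_000_000 * 384       # 134.4 GB
print(codes, max_error, float_bytes // int8_bytes)
```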

Regarding the query_phase_parallel_collection_enabled setting that you set to true: I haven't seen any documentation about it apart from a mention in the release notes (Elasticsearch version 8.10.0 | Elasticsearch Guide [8.12] | Elastic). What has your experience with it been?

Oh cool. I'm very excited to see significant vector search improvements in almost every minor release! We'll definitely look into testing this on 8.12.

Regarding 8.10 and query_phase_parallel_collection_enabled: if I'm reading our data correctly, the ANN latency improvement we saw was around 3x.

Where and how do you set the query_phase_parallel_collection_enabled setting? I could not find it in the documentation. It would be great to try it out.

PUT /_cluster/settings
{
  "persistent" : {
    "search.query_phase_parallel_collection_enabled" : true
  }
}
This is what we've done. I believe it's enabled by default in later ES versions.
