ANN Search is super slow

Hello there,

I have a question regarding Elasticsearch vector search. Our vector field has 768 dimensions, and the data set contains 25,000,000 documents split into two indices. Here are the results from /_cat/indices:

health status index        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   kr-documents  8bCjsGAxQfC8CPPwnTKBCg  32   0    9587847       698158    182.8gb        182.8gb
green  open   us-documents  Es86NloYSVK7xMN7XyzPZg  32   0   17183616      3021606    680.7gb        680.7gb

Here is the vector field mapping:

"vector": {
    "type": "dense_vector",
    "dims": 768,
    "index": true,
    "similarity": "cosine"
}

I've executed the Python code below for vector search, but even after waiting for more than a minute, the search results are not returned and I receive a ConnectionTimeout error. I need to retry 6-8 times to get the search results.

from elastic_transport import ConnectionTimeout  # exception raised by the 8.x Python client on timeouts

# Retry the kNN search, because it regularly times out.
for retry_count in range(1, 11):
    try:
        es_resp: dict = ES_MODULE.search(
            index=target_index,
            query=query,
            knn={
                "field": "vector",
                "query_vector": embedding,
                "k": kwargs.get("k", 200),  # It should be larger than 200
                "num_candidates": 200,      # must be >= k
                "similarity": 0.8,
            },
            size=size,
            source=["patent_number", "country", "vector"],
        )
    except ConnectionTimeout:
        LOGGER.warning(msg={"message": f"Connection timed out (TRIED {retry_count} / 10)"})
    else:
        break

I will also provide additional information related to the nodes. Each node has 32GB of RAM.

  • Server 1
    • master
    • data_content
    • data_content
    • coordinating_only
  • Server 2
    • data_content

The reason I mention that each node has 32GB of RAM is that, according to the formula below from the documentation on tuning for kNN performance, the required RAM capacity is about 72GB, and there are three data_content nodes. Therefore, I thought that each node would need about 24GB.

num_vectors * 4 * (num_dimensions + 12)
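
To show where the 72GB figure comes from, here is the quick arithmetic behind that formula (just a sanity check using the document count and dimensions above):

# Estimate of off-heap RAM needed for ~25M float32 vectors of 768 dims.
num_vectors = 25_000_000
num_dimensions = 768

required_bytes = num_vectors * 4 * (num_dimensions + 12)
print(required_bytes / 1024**3)  # ~72.6 GiB, i.e. the ~72GB mentioned above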

If you could assist me with this issue, it would be greatly appreciated.

Oh, and each node is running in a Docker container.

Which version is it?

It's a very basic thing, but I forgot to include it. My apologies.
It's 8.8.1

I have been doing something similar lately. I faced the same issues you are facing and found a solution by following the documentation on tuning for kNN performance.

A quick summary of what I am doing:

  • Roughly 90,000,000 docs,
  • Each doc has a dense vector of 768 dimensions,
  • 5 nodes (32 vCPU, 128GB RAM each)

A couple of things that really made a difference:

  • Make sure you have sufficient RAM (Analyze index disk usage API | Elasticsearch Guide [8.10] | Elastic), and leave some spare RAM for other processes. Also, by default Elasticsearch uses 50% of the node's RAM for heap, I believe, so my guess is that your 24GB of vector data per node exceeds what is left for the file system cache.
  • Use force merge to reduce the number of segments (I used 2 per shard). Merging segments really helped query speed! There are downsides to having very few segments, though, so maybe finding some balance would be good here (see the sketch after this list).
  • Preload the kNN index into memory (Preloading data into the file system cache | Elasticsearch Guide [8.9] | Elastic). This really helped a lot as well! But make sure that when you do this, the index can fit into your RAM.
  • When you have sufficient RAM and have set up preloading of the kNN index into memory, restart the cluster and rerun your experiment. Check the IO on your cluster while doing so: when the kNN index does NOT fit into memory, you will see high read IO.
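
For anyone following along, here is a minimal sketch of the force merge and preload steps with the Python client. The cluster URL is an assumption, the index name is taken from earlier in this thread, and the preloaded file extensions follow the preloading docs, so check them against your version:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumption: adjust to your cluster

# Force merge down to a small number of segments per shard (can take a long time on large indices).
es.indices.forcemerge(index="us-documents", max_num_segments=2)

# index.store.preload is a static setting, so the index must be closed to change it.
es.indices.close(index="us-documents")
es.indices.put_settings(
    index="us-documents",
    settings={
        # "vex" = HNSW graph, "vec" = raw vector values (per the preloading docs);
        # other extensions may apply depending on your version and quantization.
        "index.store.preload": ["vex", "vec"]
    },
)
es.indices.open(index="us-documents")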

There are a couple of challenges that lie ahead when you do these things, I noticed:

  • (Heavy) indexing into this index will increase the number of segments again, making your queries slow again! I have not found a proper way to deal with this.
  • Heavy operations on the cluster, like expensive queries or heavy indexing (even in another index), may push the kNN index out of memory, making it terribly slow again! Yesterday I started a thread on this: Manage heavy indexing in kNN indexes

Hope this helps a bit!

7 Likes

As there are a lot of performance improvements, could you upgrade to 8.10.3 and test that again?

@dadoonet Thank you for your reply. I tried it, but it's still slow.

However, increasing the RAM capacity to 64GB significantly alleviated the symptoms. It seems like @Thijsvdp's hypothesis was correct! I will give it a try.
Appreciate that!

We are on 8.9.1. Has there been a significant improvement in 8.10.3 vs 8.9.1 in terms of vector search?

What is the number of shards and the number of segments currently in your setup?

I specified 32 primary shards per index, and I didn't configure segments separately. I just checked, and we have 657 segments in "us-documents" and 73 segments in "kr-documents."
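
For reference, this is how I checked those counts with the Python client (a quick sketch assuming the client instance and index names from earlier in the thread):

# Show shard allocation and per-shard segment counts for the two indices.
print(es.cat.shards(index="us-documents,kr-documents", v=True))
print(es.cat.segments(index="us-documents,kr-documents", v=True))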

I saw that in the release notes: What’s new in 8.10 | Elasticsearch Guide [8.10] | Elastic

1 Like

Alright, thanks. So you can still try to preload the kNN index; that may help a lot, I think. And if you want to speed things up further, you may decrease the number of segments, but this can have negative effects as described above.

That said, I also noticed you use cosine similarity. I would recommend using dot product instead: if you make sure you index normalized vectors, the two are equivalent, and you then avoid doing the normalization on each search request over and over again.
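
Here is a minimal sketch of what that looks like; the mapping mirrors the one at the top of the thread with only the similarity changed, and the normalization helper is my own (assumes numpy):

import numpy as np

# Same field mapping as above, but with dot_product instead of cosine.
# dot_product requires that every indexed vector (and query vector) has unit length.
mapping = {
    "properties": {
        "vector": {
            "type": "dense_vector",
            "dims": 768,
            "index": True,
            "similarity": "dot_product",
        }
    }
}

def normalize(embedding):
    """Scale a vector to unit length so dot_product gives the same ranking as cosine."""
    v = np.asarray(embedding, dtype=np.float32)
    return (v / np.linalg.norm(v)).tolist()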

Oh that is great! I was actually waiting for this feature, but must have missed it. Thanks a lot!

It seems like you're encountering ConnectionTimeout errors in your Elasticsearch vector search. Given your data size and setup, optimizing the Elasticsearch cluster for improved performance, possibly increasing the RAM per node, and tweaking the timeout settings might help resolve this issue.

Setting up preloading as you suggested was incredibly helpful. A query that used to take 2-3 minutes now executes in under 1 second. You saved my ass. Thank you!

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.