Hi,
I'm currently using kNN search in Elasticsearch (version 8.12) with the following setup:
- Query vector dimension: 384
- Index size: 200M+ documents
- k = 100, num_candidates = 200
- Quantized vectors using 'int8_hnsw' instead of float
- Preloaded vector fields: 'vex' and 'veq'
I referred to this documentation for optimization:
Tune approximate kNN search | Elastic Docs
The main issue arises when I apply filters in the kNN query: performance degrades significantly, and in some cases I get response timeout errors (>60 seconds).
Is there any way to improve the performance of KNN search when using filters on such a large dataset? Any tuning recommendations, settings, or known limitations I should consider?
Thanks in advance!
Hey there @Saleh_AbuAli ,
It would be interesting to see more about specifically what filters you are applying, and how many documents they are matching. Also profiling the query could be interesting.
Without more information about your queries, I can offer some general advice:
- Is the size of your results page 100? If not, consider decreasing `k` to something smaller.
- Consider upgrading to a newer version of Elasticsearch. Since 8.12 there have been several optimizations that speed up vector search.
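To illustrate the first point: if the application only ever shows 10 results, a smaller `k` (with a proportionally smaller `num_candidates`) reduces the per-shard graph search. A sketch only; the index name and vector are placeholders, and `num_candidates` is a starting point to tune against your recall target:

```json
POST /my-index/_search
{
  "knn": {
    "field": "@vector",
    "query_vector": "<384-dim query vector>",
    "k": 10,
    "num_candidates": 50
  },
  "size": 10
}
```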
Hi @Kathleen_DeRusso, thank you for your response.
The filters are restrictive, but the filtered documents still number more than 500,000.
I apply some custom filtering on the returned results based on the score, which is why I set k to 100. After applying my custom filtering, I usually keep only 10 documents, so reducing k might lower the number of relevant results returned.
Number of shards = 40 and total number of segments = 181.
Could you please let me know what optimizations in the newer version help improve the performance of vector search?
Example filter:
"filter": [ {
"range": {
"publicationYear": {
"gte": "2010",
"lte": "2022"
}
}
},
{
"bool": {
"should": [
{
"terms": {
"HostedInVenue.Venue.PublishedByInstitution.Institution.@id.pkg": [
"112321dssa"
]
}
}
]
}
} ```
This blog is a bit outdated (it covers 8.15.0, and we just released 8.18.0/9.0.0), but it describes some of the enhancements we made. Since then we have also introduced and GA'd BBQ quantization, along with several other optimizations in Lucene and Elasticsearch.
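For context, on versions where BBQ is available, opting in is mostly a mapping change. A sketch based on the field discussed in this thread; check the `dense_vector` docs for your exact version before relying on it:

```json
"@vector": {
  "type": "dense_vector",
  "dims": 384,
  "index": true,
  "similarity": "cosine",
  "index_options": {
    "type": "bbq_hnsw"
  }
}
```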
Thanks for your help. I'm currently using int8_hnsw quantization. Are there any recommended steps or any advice on what to add or change in the mapping or anything else?
The mapping for the vector is:

```json
"@vector": {
  "type": "dense_vector",
  "dims": 384,
  "index": true,
  "similarity": "cosine",
  "index_options": {
    "type": "int8_hnsw",
    "m": 16,
    "ef_construction": 100
  }
}
```
Actually, a colleague pointed out to me that 40 shards is likely a culprit. For that many vectors you probably need a lot of RAM (maybe 80 GB)? How much RAM do you have?
Also, if you need to save space on your existing deployment, you could consider switching to `int4_hnsw`, which will halve the RAM requirements.
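Relative to the int8 mapping shown earlier, that would only change the `index_options` type; a sketch (note that switching quantization type likely means reindexing into a new index):

```json
"index_options": {
  "type": "int4_hnsw",
  "m": 16,
  "ef_construction": 100
}
```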
These are the details for the nodes I have, with information for each of them. Please note that I have another index on the same nodes. Can you please let me know if this amount of RAM is sufficient, or if I need to increase it?
| Node Name | Role | Total RAM (GB) | Used RAM (GB) | Used % | Heap Used % | JVM Heap (GB) |
|---|---|---|---|---|---|---|
| es-data-2 | Data | 66.57 | 60.12 | 90% | 36% | 34.36 |
| es-master-0 | Master | 4.00 | 2.99 | 70% | 45% | 2.15 |
| es-master-1 | Master | 4.00 | 2.85 | 66% | 57% | 2.15 |
| es-data-5 | Data | 66.57 | 60.31 | 91% | 70% | 34.36 |
| es-data-3 | Data | 66.57 | 60.37 | 91% | 75% | 34.36 |
| es-data-0 | Data | 66.57 | 60.27 | 91% | 43% | 34.36 |
| es-master-2 | Master | 4.00 | 2.79 | 65% | 35% | 2.15 |
| es-data-1 | Data | 66.57 | 60.37 | 91% | 57% | 34.36 |
| es-data-4 | Data | 66.57 | 60.36 | 91% | 39% | 34.36 |
Document distribution:

| Node Name | Total Documents |
|---|---|
| es-data-0 | 38,274,701 |
| es-data-1 | 0 |
| es-data-2 | 38,457,734 |
| es-data-3 | 38,351,625 |
| es-data-4 | 38,342,771 |
| es-data-5 | 38,401,115 |
Here's a good reference on sizing.
Remember that for HNSW we have to store everything in memory, so the back-of-the-envelope formula (for int8 quantization) is `num_vectors * (num_dimensions + 4)` bytes. That's without replicas, and by itself, without other indices on the same nodes. So by itself, ballpark around 80 GB. Assuming your data nodes hold only 1 replica, that might be sufficient on its own, but it really depends on what else is on the nodes besides the vectors. The analyze disk usage API could also be helpful here.
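To make that concrete, here is a quick back-of-the-envelope check for the numbers in this thread (200M vectors, 384 dimensions), using the int8 estimate quoted above with no replicas:

```python
# Back-of-the-envelope RAM estimate for int8-quantized HNSW vectors:
# num_vectors * (num_dimensions + 4) bytes
# (1 byte per quantized dimension + 4 bytes of correction data per vector;
# single copy of the data, i.e. no replicas.)

def int8_hnsw_ram_bytes(num_vectors: int, num_dims: int) -> int:
    return num_vectors * (num_dims + 4)

bytes_needed = int8_hnsw_ram_bytes(200_000_000, 384)
print(f"{bytes_needed / 1e9:.1f} GB")  # 77.6 GB, close to the 80 GB ballpark
```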
Beyond that it would be interesting to know what type of filters you're sending in that result in timeouts.
Thanks @Kathleen_DeRusso. I tried to run the `disk_usage` query, but I got a timeout error every time, and there's no way to control the timeout value.
Here is a sample of the filters I applied:

```json
{
  "knn": [
    {
      "field": "@vector",
      "query_vector": [queryvector],
      "k": 100,
      "num_candidates": 200,
      "filter": [
        { "range": { "publicationYear": { "gte": "2010", "lte": "2022" } } },
        {
          "bool": {
            "must_not": [
              { "terms": { "HostedIn.Venue.@id": ["data1", "data2"] } }
            ],
            "should": [
              { "terms": { "HostedIn.Venue.Publish.Institution.@id": ["data3", "data4"] } },
              { "terms": { "HostedIn.Venue.@id": ["data6"] } }
            ]
          }
        }
      ]
    }
  ]
}
```
Nothing stands out to me as extremely expensive in the filters. If I had seen something like `function_score` in the filter I would say it needs to be optimized, but you're only doing `terms` and `range` filters. I suppose you could experiment and see if one of them is the root cause of the timeout (maybe caching is at play if your documents change frequently and these filters hit a ton of documents).
You're not doing other expensive query operations like function score queries or aggs when you're timing out, are you?
You could also look at how many segments you're searching and whether a force merge helps.
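For example (assuming an index named `my-index`; force merging is expensive and best done on an index that is no longer receiving writes):

```
POST /my-index/_forcemerge?max_num_segments=1
```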
You could try either increasing memory/nodes or going down to int4 quantization if you feel memory might be an issue based on the above information.
Also, going back to the smaller `k`: that would definitely be faster, especially if your application didn't have to do custom post-filtering.
Just a lurker on this thread, with a smaller data set. If RAM weren't such a constraint, what sort of performance benefit might we expect from using `dot_product` instead of `cosine` similarity, provided we normalize our vectors?
Newer versions of Elasticsearch (8.12+) will do that normalization for you under the hood! You can find out more in the PR.
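The practical upshot: on unit-length vectors, the dot product and cosine similarity coincide, so `dot_product` can skip the per-comparison magnitude computation. A tiny illustration in plain Python (the math only, not Elasticsearch internals):

```python
import math

def normalize(v):
    # Scale the vector to unit length.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Cosine similarity pays for two extra magnitude computations.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a, b = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
an, bn = normalize(a), normalize(b)
# After normalization, the cheap dot product equals cosine similarity,
# so rankings are identical.
print(abs(dot(an, bn) - cosine(a, b)) < 1e-12)  # True
```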