Hi,
I'm currently using kNN search in Elasticsearch (version 8.12) with the following setup:
- Query vector dimension: 384
- Index size: 200M+ documents
- k = 100, num_candidates = 200
- Quantized vectors using 'int8_hnsw' instead of float
- Preloaded vector fields: 'vex' and 'veq'
I referred to this documentation for optimization:
Tune approximate kNN search | Elastic Docs
The main issue arises when I apply filters in the kNN query: performance degrades significantly, and in some cases I get response timeout errors (>60 seconds).
Is there any way to improve the performance of KNN search when using filters on such a large dataset? Any tuning recommendations, settings, or known limitations I should consider?
Thanks in advance!
Hey there @Saleh_AbuAli ,
It would be interesting to see more about specifically what filters you are applying, and how many documents they are matching. Also profiling the query could be interesting.
Without more information about your queries, I can offer some general advice:
- Is the size of your results page 100? If not, consider decreasing `k` to something smaller.
- Consider upgrading to a newer version of Elasticsearch. Since 8.12 there have been several optimizations that speed up vector search.
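To illustrate the first point: if the application only ever shows 10 results, a smaller `k` (with a proportionally smaller `num_candidates`) reduces the per-shard graph search. A sketch only; the index name and vector are placeholders, and `num_candidates` is a starting point to tune against your recall target:

```json
POST /my-index/_search
{
  "knn": {
    "field": "@vector",
    "query_vector": "<384-dim query vector>",
    "k": 10,
    "num_candidates": 50
  },
  "size": 10
}
```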
Hi @Kathleen_DeRusso, thank you for your response.
The filters are restrictive, but the filtered documents still number more than 500,000.
I apply some custom filtering on the returned results based on the score, which is why I set k to 100. After applying my custom filtering, I usually keep only 10 documents, so reducing k might lower the number of relevant results returned.
Number of shards = 40 and total number of segments = 181.
Could you please let me know what optimizations in the newer version help improve the performance of vector search?
Example filter:
"filter": [ {
"range": {
"publicationYear": {
"gte": "2010",
"lte": "2022"
}
}
},
{
"bool": {
"should": [
{
"terms": {
"HostedInVenue.Venue.PublishedByInstitution.Institution.@id.pkg": [
"112321dssa"
]
}
}
]
}
} ```
This blog is a bit outdated (it covers 8.15.0, and we just released 8.18.0/9.0.0), but it describes some of the enhancements we made. Since then we have also introduced and GA'd BBQ quantization, along with several other optimizations in Lucene and Elasticsearch.
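For context, on versions where BBQ is available, opting in is mostly a mapping change. A sketch based on the field discussed in this thread; check the `dense_vector` docs for your exact version before relying on it:

```json
"@vector": {
  "type": "dense_vector",
  "dims": 384,
  "index": true,
  "similarity": "cosine",
  "index_options": {
    "type": "bbq_hnsw"
  }
}
```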
Thanks for your help. I'm currently using int8_hnsw quantization. Are there any recommended steps or any advice on what to add or change in the mapping or anything else?
The mapping for the vector is:

```json
"@vector": {
  "type": "dense_vector",
  "dims": 384,
  "index": true,
  "similarity": "cosine",
  "index_options": {
    "type": "int8_hnsw",
    "m": 16,
    "ef_construction": 100
  }
}
```
Actually, a colleague pointed out to me that 40 shards is likely a culprit. For that many vectors you probably need a lot of RAM (maybe 80 GB)? How much RAM do you have?
Also, if you need to save space on your existing deployment, you could consider switching to `int4_hnsw`, which will halve the RAM requirements.
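Relative to the int8 mapping shown earlier, that would only change the `index_options` type; a sketch (note that switching quantization type likely means reindexing into a new index):

```json
"index_options": {
  "type": "int4_hnsw",
  "m": 16,
  "ef_construction": 100
}
```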
These are the details for the nodes I have, with information for each of them. Please note that I have another index on the same nodes. Can you please let me know if this amount of RAM is sufficient, or if I need to increase it?
| Node Name | Role | Total RAM (GB) | Used RAM (GB) | Used % | Heap Used % | JVM Heap (GB) |
|---|---|---|---|---|---|---|
| es-data-2 | Data | 66.57 | 60.12 | 90% | 36% | 34.36 |
| es-master-0 | Master | 4.00 | 2.99 | 70% | 45% | 2.15 |
| es-master-1 | Master | 4.00 | 2.85 | 66% | 57% | 2.15 |
| es-data-5 | Data | 66.57 | 60.31 | 91% | 70% | 34.36 |
| es-data-3 | Data | 66.57 | 60.37 | 91% | 75% | 34.36 |
| es-data-0 | Data | 66.57 | 60.27 | 91% | 43% | 34.36 |
| es-master-2 | Master | 4.00 | 2.79 | 65% | 35% | 2.15 |
| es-data-1 | Data | 66.57 | 60.37 | 91% | 57% | 34.36 |
| es-data-4 | Data | 66.57 | 60.36 | 91% | 39% | 34.36 |
Document distribution:

| Node Name | Total Documents |
|---|---|
| es-data-0 | 38,274,701 |
| es-data-1 | 0 |
| es-data-2 | 38,457,734 |
| es-data-3 | 38,351,625 |
| es-data-4 | 38,342,771 |
| es-data-5 | 38,401,115 |
Here's a good reference on sizing.
Remember that for HNSW we have to store everything in memory, so the back-of-the-envelope formula (for int8 quantization) is `num_vectors * (num_dimensions + 4)` bytes. That's without replicas, and by itself, without other indices on the same nodes. So by itself, ballpark around 80 GB. Assuming your data nodes hold only 1 replica, that might be sufficient on its own, but it really depends on what else is on the nodes besides the vectors. The analyze disk usage API could also be helpful here.
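To make that concrete, here is a quick back-of-the-envelope check for the numbers in this thread (200M vectors, 384 dimensions), using the int8 estimate quoted above with no replicas:

```python
# Back-of-the-envelope RAM estimate for int8-quantized HNSW vectors:
# num_vectors * (num_dimensions + 4) bytes
# (1 byte per quantized dimension + 4 bytes of correction data per vector;
# single copy of the data, i.e. no replicas.)

def int8_hnsw_ram_bytes(num_vectors: int, num_dims: int) -> int:
    return num_vectors * (num_dims + 4)

bytes_needed = int8_hnsw_ram_bytes(200_000_000, 384)
print(f"{bytes_needed / 1e9:.1f} GB")  # 77.6 GB, close to the 80 GB ballpark
```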
Beyond that it would be interesting to know what type of filters you're sending in that result in timeouts.
Thanks @Kathleen_DeRusso. I tried to run the `disk_usage` query, but I got a timeout error every time, and there's no way to control the timeout value.
Here is a sample of the filters I applied:

```json
{
  "knn": [
    {
      "field": "@vector",
      "query_vector": [queryvector],
      "k": 100,
      "num_candidates": 200,
      "filter": [
        { "range": { "publicationYear": { "gte": "2010", "lte": "2022" } } },
        {
          "bool": {
            "must_not": [
              { "terms": { "HostedIn.Venue.@id": ["data1", "data2"] } }
            ],
            "should": [
              { "terms": { "HostedIn.Venue.Publish.Institution.@id": ["data3", "data4"] } },
              { "terms": { "HostedIn.Venue.@id": ["data6"] } }
            ]
          }
        }
      ]
    }
  ]
}
```
Nothing stands out to me as extremely expensive in the filters. If I had seen something like `function_score` in the filter I would say it needs to be optimized, but you're only doing `terms` and `range` filters. I suppose you could experiment and see if one of them is the root cause of the timeout (maybe caching is at play if your documents change frequently and these filters hit a ton of documents).
You're not doing other expensive query operations like function score queries or aggs when you're timing out, are you?
You could also look at how many segments you're searching and whether a force merge helps.
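For example (assuming an index named `my-index`; force merging is expensive and best done on an index that is no longer receiving writes):

```
POST /my-index/_forcemerge?max_num_segments=1
```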
You could try either increasing memory/nodes or going down to int4 quantization if you feel memory might be an issue based on the above information.
Also, going back to the smaller `k`: that would definitely be faster, especially if your application didn't have to do custom post-filtering.
Just a lurker on this thread, with a smaller data set. If RAM weren't such a constraint, what sort of performance benefit might we expect from using `dot_product` instead of `cosine` similarity, provided we normalize our vectors?
Newer versions of Elasticsearch (8.12+) will do that normalization for you under the hood! You can find out more in the PR.
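The practical upshot: on unit-length vectors, the dot product and cosine similarity coincide, so `dot_product` can skip the per-comparison magnitude computation. A tiny illustration in plain Python (the math only, not Elasticsearch internals):

```python
import math

def normalize(v):
    # Scale the vector to unit length.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Cosine similarity pays for two extra magnitude computations.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a, b = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
an, bn = normalize(a), normalize(b)
# After normalization, the cheap dot product equals cosine similarity,
# so rankings are identical.
print(abs(dot(an, bn) - cosine(a, b)) < 1e-12)  # True
```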