Slow aKNN search

We have implemented vector similarity search using the ES dense_vector field and the kNN option in the search API.

We are using 1024-dimensional embeddings and our index size is about 60 GB for approximately 11,000,000 documents. So our primary_store_size is 60.3 GB and store_size is 179.3 GB.

During testing we found out that searches are taking quite a lot of time, 20-30 seconds, when we set num_candidates to something like 400-500.
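
For context, the request is roughly of this shape (a sketch only: the query vector is truncated and would actually contain all 1024 values; the index and field names are the ones that appear further down in this thread):

GET /proposals.proposals.vector_20221216/_search
{
	"knn": {
		"field": "title_vector",
		"query_vector": [0.12, -0.34, 0.56],
		"k": 10,
		"num_candidates": 500
	},
	"_source": false
}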

Our original thinking was that the whole index was not fitting into memory, so we built a test server with 128 GB RAM and cloned the drive onto it. Now both primary_store_size and store_size are 60.3 GB and the index should definitely fit in RAM (there is nothing else running on this server). But we did not really see any improvement in search time; sometimes it actually takes longer.

How can I troubleshoot this problem? Why did we not see any improvement in search speed even though the whole index fits into RAM, which by itself should be a major boost to search performance?

PS: I did read the article here and implemented some of the suggestions (dot_product similarity, enough RAM, excluding vector fields from _source, avoiding heavy indexing during searches, avoiding page cache thrashing by using modest readahead values on Linux).
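
For illustration, a minimal mapping along those lines (only the vector field is shown; dot_product similarity requires the vectors to be normalized to unit length) looks roughly like this:

PUT /proposals.proposals.vector_20221216
{
	"mappings": {
		"_source": {
			"excludes": ["title_vector"]
		},
		"properties": {
			"title_vector": {
				"type": "dense_vector",
				"dims": 1024,
				"index": true,
				"similarity": "dot_product"
			}
		}
	}
}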

Although I do not have experience with dense_vector fields, I know this is an area where a lot of improvements have been made lately. It is always recommended that you specify which version of Elasticsearch you are using as that can matter greatly.

Ah, yes, we have "number": "8.4.1".

We try to keep up with the latest version of ES specifically because it keeps improving in this area of searching.

@ruslaniv to see the size of the vector field, use the Analyze index disk usage API | Elasticsearch Guide [master] | Elastic. That will show the usage of the vector index.
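
For example (run_expensive_tasks must be set to true, since the analysis is costly; substitute your own index name):

POST /my-index/_disk_usage?run_expensive_tasks=true

The response contains a per-field breakdown, including a knn_vectors entry for each dense_vector field.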

Keep in mind that the JVM will take some RAM, and the vector index requires RAM that is NOT on the JVM heap.
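
To get a rough sense of how much RAM is left for the OS page cache after the JVM heap, the node stats API can be filtered down to the relevant fields, e.g. (a sketch; filter_path just trims the response):

GET _nodes/stats/jvm,os?filter_path=nodes.*.jvm.mem.heap_max_in_bytes,nodes.*.os.mem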

How many segments are you searching? It seems like you have 11 million docs on the same server?

To see the number of segments: Index segments API | Elasticsearch Guide [8.6] | Elastic.
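
For example (again, substitute your own index name):

GET /my-index/_segments

The num_search_segments value reported per shard copy is the number of HNSW graphs a single kNN query has to visit on that copy.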


Ben, thank you for your answer!
Here is our configuration:

  1. Production
"_nodes": {
	"total": 3,
	"successful": 3,
	"failed": 0
    },

Each node has 32 GB RAM.
This is our dense_vector field as stored in ES:

"title_vector": {
	"total": "46.7gb",
	"total_in_bytes": 50237080296,
	"inverted_index": {
		"total": "0b",
		"total_in_bytes": 0
	},
	"stored_fields": "0b",
	"stored_fields_in_bytes": 0,
	"doc_values": "0b",
	"doc_values_in_bytes": 0,
	"points": "0b",
	"points_in_bytes": 0,
	"norms": "0b",
	"norms_in_bytes": 0,
	"term_vectors": "0b",
	"term_vectors_in_bytes": 0,
	"knn_vectors": "46.7gb",
	"knn_vectors_in_bytes": 50237080296

and here is the segments information:

"proposals.proposals.vector_20221216": {
	"shards": {
		"0": [
				{
				"routing": {
					"state": "STARTED",
					"primary": true,
					"node": "3fQTuJNpQOeCg9k2zQI6Rg"
				},
				"num_committed_segments": 49,
				"num_search_segments": 49,
				"segments": {...}
				},
				"routing": {
					"state": "STARTED",
					"primary": false,
					"node": "23uocumgThOiaraPXP_JNA"
				},
				"num_committed_segments": 48,
				"num_search_segments": 48,
				"segments": {...},
				"routing": {
					"state": "STARTED",
					"primary": false,
					"node": "vGiOGPHoQnSNmndhy2Np1A"
				},
				"num_committed_segments": 49,
				"num_search_segments": 49,
				"segments": {...}
			]
	}
}

So my initial thinking was that the biggest culprit was that the index did not fit into RAM, which is why I set up a test server where I just cloned a drive from the primary production ES node.

  2. Test server
"_nodes": {
	"total": 1,
	"successful": 1,
	"failed": 0
    },

The test server has 128 GB RAM.
The dense_vector field is the same as on the production server:

"title_vector": {
	"total": "46.7gb",

and the segments:

"proposals.proposals.vector_20221216": {
	"shards": {
		"0": [
			{
				"routing": {
					"state": "STARTED",
					"primary": true,
					"node": "VyKbexZxTP24nHpcsNtS6w"
				},
				"num_committed_segments": 49,
				"num_search_segments": 49,
				"segments": {...}
			}
		]
	}
}

With all that, the production cluster is still somewhat faster at kNN search than the test server, although the test server has a lot more RAM. So I am not sure fitting the whole index into RAM resulted in a performance increase.

Answering here, since I think it is more relevant in this thread than in the other one: https://discuss.elastic.co/t/profiling-knn-search/327065/5

Just to quote Ben's answer in the other thread:

KNN spends most of its time in the rewrite. So, that is indeed the hot spot. It's where the KNN search occurs.
We do segment searches serially. So, comparing your two open KNN search tickets, this is what I think is happening.
You are on a single node with a single shard. That single shard has 49 segments, each of which seems to be an OK size (at least a GB or so).
But this then means that, on a single node, you are exploring 49 different HNSW graphs.
In the future, we want to make KNN work in parallel on the same shard but with different segments, but right now, that doesn't happen.
I think you should try force-merging your test node to fewer segments. It doesn't have to be 1; 1 would be best, but it could take a while to complete.

I have finally managed to run a force merge on the test server (copy of the production ES server, 11 million documents, 49 segments) and merged the index to 1 segment. It took about 3 days to run (I ran it asynchronously), but the results were astounding!
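
For reference, the merge itself is a single call (it keeps running in the background even if the client disconnects), and its progress can be followed with the task management API:

POST /proposals.proposals.vector_20221216/_forcemerge?max_num_segments=1

GET _tasks?actions=*forcemerge*&detailed=true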

On average, the improvement in search speed was close to an order of magnitude - from about 30 seconds down to 2-3 seconds.
So yes, it would be terrific if ES could parallelize segment search.

Do you think it makes sense to force merge the production index to 1 segment?
We can fall back to our old full-text TF-IDF search index, which we keep around just in case, for the three days over the weekend when the load is not high, and then switch back once our dense vector index is merged.
But will the number of segments eventually grow as new documents are indexed and old ones are deleted?

@ruslaniv thank you for copying my answer here. I got all sorts of mixed up between different threads and forgot to answer similarly here.

The good news is that we are actively working on parallelizing KNN across multiple segments. I don't have a timeline, but we are aware and it's something we want to do as soon as possible.

It took about 3 days to run (I ran it asynchronously), but the results were astounding!

There are recent improvements to iterative merging speed. Also, we want to improve the "merge policy" (the underlying logic for when segments get merged, to what size, how often, etc.) when it comes to vector search use cases. This and inter-segment parallelism may obviate the need to force merge to a single segment.

Do you think it makes sense to force merge the production index to 1 segment?

It doesn't have to be a single segment. Having a single large segment can be costly (depending on size), especially if you are actively indexing. The key thing is to avoid having many small segments (< 1 GB). So, merging down to 10 could be good enough.
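
In API terms, using the index from this thread, that would be something like:

POST /proposals.proposals.vector_20221216/_forcemerge?max_num_segments=10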

But will the number of segments eventually grow as new documents are indexed and old ones are deleted?

Yes, Lucene segments are "read only". Meaning, new documents (or updates) are written to NEW segments. In the case of updates/deletes, the older documents are flagged as deleted and ignored at search time. Those deleted documents are then removed when a merge occurs for that segment.
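
One way to keep an eye on this over time (segment growth and how many deleted documents are waiting to be merged away) is the cat segments API, for example:

GET _cat/segments/proposals.proposals.vector_20221216?v&h=segment,docs.count,docs.deleted,size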

So, as new data comes in, new segments are created. We are additionally looking into how we can update the merge policies to be better optimized for KNN search. This calculus may change as we add inter-segment parallelism.

