Slow aKNN search

We have implemented vector similarity search using the ES dense_vector field and the kNN option in the search API.

We are using 1024-dimensional embeddings and our index size is about 60 GB for approximately 11,000,000 documents, so our primary_store_size is 60.3 GB and store_size is 179.3 GB.
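
These numbers can be confirmed with a _cat request along these lines (standard _cat/indices columns):

GET _cat/indices/proposals.proposals.vector_20221216?v&h=index,docs.count,pri.store.size,store.size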

During testing we found out that searches are taking quite a lot of time, 20-30 seconds, when we set num_candidates to something like 400-500.
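
For illustration, our search requests look roughly like this (query vector shortened, k and num_candidates vary between tests):

POST /proposals.proposals.vector_20221216/_search
{
	"knn": {
		"field": "title_vector",
		"query_vector": [0.12, -0.08, ...],
		"k": 10,
		"num_candidates": 500
	},
	"_source": false
}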

Our original thinking was that the whole index was not fitting into memory, so we built a test server with 128 GB of RAM and cloned the drive. Now both primary_store_size and store_size are 60.3 GB, and the index should definitely fit into RAM (there is nothing else running on this server). But we did not really see any improvement in search time; sometimes it actually takes more time.

How can I troubleshoot this problem? Why didn't we see any improvement in search speed even though the whole index fits into RAM (which by itself should be a major boost to search performance)?

PS: I did read the article here and implemented some of the suggestions (dot_product similarity, enough RAM, excluding vector fields from _source, avoiding heavy indexing during searches, and avoiding page cache thrashing by using modest readahead values on Linux).
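
For completeness, the relevant part of our mapping looks roughly like this (simplified):

PUT /proposals.proposals.vector_20221216
{
	"mappings": {
		"_source": {
			"excludes": ["title_vector"]
		},
		"properties": {
			"title_vector": {
				"type": "dense_vector",
				"dims": 1024,
				"index": true,
				"similarity": "dot_product"
			}
		}
	}
}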

Although I do not have experience with dense_vector fields, I know this is an area where a lot of improvements have been made lately. It is always recommended that you specify which version of Elasticsearch you are using, as that can matter greatly.

Ah, yes, we have "number": "8.4.1".

We try to keep up with the latest version of ES specifically because it keeps improving in this area of searching.

@ruslaniv, to see the size of the vectors, use Analyze index disk usage API | Elasticsearch Guide [master] | Elastic. That will show the disk usage of the vector index.
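
For example (substitute your own index name):

POST /my-index/_disk_usage?run_expensive_tasks=true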

Keep in mind that the JVM will take some RAM, and the vector index requires RAM that is NOT on the JVM heap.
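
You can compare heap size to total RAM per node with something like:

GET _cat/nodes?v&h=name,heap.max,heap.percent,ram.max,ram.percent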

How many segments are you searching? It seems like you have 11 million docs on the same server?

To see the number of segments: Index segments API | Elasticsearch Guide [8.6] | Elastic.
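
For example:

GET /my-index/_segments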

Ben, thank you for your answer!
Here is our configuration:

  1. Production
"_nodes": {
	"total": 3,
	"successful": 3,
	"failed": 0
    },

Each node has 32 GB RAM.
This is our dense_vector field as stored in ES:

"title_vector": {
	"total": "46.7gb",
	"total_in_bytes": 50237080296,
	"inverted_index": {
		"total": "0b",
		"total_in_bytes": 0
	},
	"stored_fields": "0b",
	"stored_fields_in_bytes": 0,
	"doc_values": "0b",
	"doc_values_in_bytes": 0,
	"points": "0b",
	"points_in_bytes": 0,
	"norms": "0b",
	"norms_in_bytes": 0,
	"term_vectors": "0b",
	"term_vectors_in_bytes": 0,
	"knn_vectors": "46.7gb",
	"knn_vectors_in_bytes": 50237080296

and here is the segments information:

"proposals.proposals.vector_20221216": {
	"shards": {
		"0": [
				{
				"routing": {
					"state": "STARTED",
					"primary": true,
					"node": "3fQTuJNpQOeCg9k2zQI6Rg"
				},
				"num_committed_segments": 49,
				"num_search_segments": 49,
				"segments": {...}
				},
				"routing": {
					"state": "STARTED",
					"primary": false,
					"node": "23uocumgThOiaraPXP_JNA"
				},
				"num_committed_segments": 48,
				"num_search_segments": 48,
				"segments": {...},
				"routing": {
					"state": "STARTED",
					"primary": false,
					"node": "vGiOGPHoQnSNmndhy2Np1A"
				},
				"num_committed_segments": 49,
				"num_search_segments": 49,
				"segments": {...}
			]
	}
}

So my initial thinking was that the biggest culprit was that the index did not fit into RAM, which is why I set up a test server where I simply cloned the drive from the primary production ES node.

  2. Test server
"_nodes": {
	"total": 1,
	"successful": 1,
	"failed": 0
    },

The test server has 128 GB RAM.
The dense_vector field is the same as on the production server:

"title_vector": {
	"total": "46.7gb",

and the segments:

"proposals.proposals.vector_20221216": {
	"shards": {
		"0": [
			{
				"routing": {
					"state": "STARTED",
					"primary": true,
					"node": "VyKbexZxTP24nHpcsNtS6w"
				},
				"num_committed_segments": 49,
				"num_search_segments": 49,
				"segments": {...}
			}
		]
	}
}

With all that, the production server is still somewhat faster at kNN search than the test server, even though the test server has a lot more RAM. So I'm not sure fitting the whole index into RAM resulted in a performance increase.