Vector search large dense vectors performance issues

I'm experiencing inconsistent results (sometimes slow, sometimes fast) with vector search. I currently have 50 million documents in an index, and the vectors are stored in a field with the following mapping:

"large_1536_embedding": {
  "type": "dense_vector",
  "dims": 1536,
  "index": true,
  "similarity": "dot_product",
  "index_options": {
      "type": "bbq_hnsw",
      "m": 16,
      "ef_construction": 100
  }
}

I get very inconsistent results when querying: sometimes a query takes less than a second, and sometimes the same query takes more than a minute. I have tried multiple shard configurations, multiple machines (small to large in RAM, CPU, etc.), and different versions of Elasticsearch (even 9.0.3), but the results are still inconsistent. Can anyone give me a clue on how to work out what the problem might be?

The type of query I run contains no filters:

"knn": {
    "field": "large_1536_embedding",
    "k": 25,
    "num_candidates": 250,
    "query_vector": [0,1,2,etc]
}

Hi @Pablo_Delgado,

Welcome! Can you share the cluster configuration you are using (number of nodes and shards)? I'm aware you've said you've tried multiple configurations, but it would be useful to know.

Can you also share the number of segments, the disk usage stats, and the output of the profile API? Those stats have given good pointers to people in the past, and since you're on 9.0.3 per your message, all of them will be available to you.
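
For reference, these are roughly the calls that cover those three things (the index name is just a placeholder, and I've elided the query vector):

GET /your-index/_segments

POST /your-index/_disk_usage?run_expensive_tasks=true

POST /your-index/_search
{
  "profile": true,
  "knn": {
    "field": "large_1536_embedding",
    "k": 25,
    "num_candidates": 250,
    "query_vector": [ ... ]
  }
}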

Let us know!

I tried multiple shard configurations:

shards tested: 1, 4, 8, 16, 24, 48
machine types tested: 4, 8, 22 cores
but always a single node

Segments: I see many segments per shard and a very uneven data distribution here. Should I merge them all into one?

{
    "_shards": {
        "total": 48,
        "successful": 24,
        "failed": 0
    },
    "indices": {
        "test_bbq_hnsw_24_shards": {
            "shards": {
                "0": [
                    {
                        "routing": {
                            "state": "STARTED",
                            "primary": true,
                            "node": "pKKRmbY4QtieihRRKN9CtQ"
                        },
                        "num_committed_segments": 37,
                        "num_search_segments": 37,
                        
                    }
                ],
                "1": [
                    {
                        "routing": {
                            "state": "STARTED",
                            "primary": true,
                            "node": "pKKRmbY4QtieihRRKN9CtQ"
                        },
                        "num_committed_segments": 15,
                        "num_search_segments": 15,
                        "segments": 
                    }
                ],
                "2": [
                    {
                        "routing": {
                            "state": "STARTED",
                            "primary": true,
                            "node": "pKKRmbY4QtieihRRKN9CtQ"
                        },
                        "num_committed_segments": 30,
                        "num_search_segments": 30,
                        
                    }
                ],

Unfortunately this forum doesn't allow me to paste the full results because of their size, nor to share links.
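
On the merge question above: if merging does turn out to be the answer, my understanding is that a single-segment force merge would look roughly like this (an expensive operation on an index this size, so I'd treat it as an experiment):

POST /test_bbq_hnsw_24_shards/_forcemerge?max_num_segments=1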

Disk stats

{
   "_shards":{
      "total":24,
      "successful":24,
      "failed":0
   },
   "test_bbq_hnsw_24_shards":{
      "store_size":"834.4gb",
      "store_size_in_bytes":895941598686,
      "all_fields":{
         "total":"833.8gb",
         "total_in_bytes":895359193132,
         "inverted_index":{
            "total":"53.6gb",
            "total_in_bytes":57554575162
         },
         "stored_fields":"531.6gb",
         "stored_fields_in_bytes":570838351832,
         "doc_values":"16gb",
         "doc_values_in_bytes":17282665200,
         "points":"422.2mb",
         "points_in_bytes":442724692,
         "norms":"718.2mb",
         "norms_in_bytes":753190311,
         "term_vectors":"0b",
         "term_vectors_in_bytes":0,
         "knn_vectors":"231.4gb",
         "knn_vectors_in_bytes":248487685935
      },
      "fields":{
         "_id":{
            "total":"609.7mb",
            "total_in_bytes":639358058,
            "inverted_index":{
               "total":"204mb",
               "total_in_bytes":213971963
            },
            "stored_fields":"405.6mb",
            "stored_fields_in_bytes":425386095,
            "doc_values":"0b",
            "doc_values_in_bytes":0,
            "points":"0b",
            "points_in_bytes":0,
            "norms":"0b",
            "norms_in_bytes":0,
            "term_vectors":"0b",
            "term_vectors_in_bytes":0,
            "knn_vectors":"0b",
            "knn_vectors_in_bytes":0
         },
         "_ignored":{
            "total":"121.9mb",
            "total_in_bytes":127880979,
            "inverted_index":{
               "total":"35.4mb",
               "total_in_bytes":37139482
            },
            "stored_fields":"0b",
            "stored_fields_in_bytes":0,
            "doc_values":"86.5mb",
            "doc_values_in_bytes":90741497,
            "points":"0b",
            "points_in_bytes":0,
            "norms":"0b",
            "norms_in_bytes":0,
            "term_vectors":"0b",
            "term_vectors_in_bytes":0,
            "knn_vectors":"0b",
            "knn_vectors_in_bytes":0
         },
         "_primary_term":{
            "total":"0b",
            "total_in_bytes":0,
            "inverted_index":{
               "total":"0b",
               "total_in_bytes":0
            },
            "stored_fields":"0b",
            "stored_fields_in_bytes":0,
            "doc_values":"0b",
            "doc_values_in_bytes":0,
            "points":"0b",
            "points_in_bytes":0,
            "norms":"0b",
            "norms_in_bytes":0,
            "term_vectors":"0b",
            "term_vectors_in_bytes":0,
            "knn_vectors":"0b",
            "knn_vectors_in_bytes":0
         },
         "_seq_no":{
            "total":"143.4mb",
            "total_in_bytes":150412981,
            "inverted_index":{
               "total":"0b",
               "total_in_bytes":0
            },
            "stored_fields":"0b",
            "stored_fields_in_bytes":0,
            "doc_values":"89.3mb",
            "doc_values_in_bytes":93680345,
            "points":"54.1mb",
            "points_in_bytes":56732636,
            "norms":"0b",
            "norms_in_bytes":0,
            "term_vectors":"0b",
            "term_vectors_in_bytes":0,
            "knn_vectors":"0b",
            "knn_vectors_in_bytes":0
         },
         "_source":{
            "total":"531.2gb",
            "total_in_bytes":570412965737,
            "inverted_index":{
               "total":"0b",
               "total_in_bytes":0
            },
            "stored_fields":"531.2gb",
            "stored_fields_in_bytes":570412965737,
            "doc_values":"0b",
            "doc_values_in_bytes":0,
            "points":"0b",
            "points_in_bytes":0,
            "norms":"0b",
            "norms_in_bytes":0,
            "term_vectors":"0b",
            "term_vectors_in_bytes":0,
            "knn_vectors":"0b",
            "knn_vectors_in_bytes":0
         },
         "_version":{
            "total":"0b",
            "total_in_bytes":0,
            "inverted_index":{
               "total":"0b",
               "total_in_bytes":0
            },
            "stored_fields":"0b",
            "stored_fields_in_bytes":0,
            "doc_values":"0b",
            "doc_values_in_bytes":0,
            "points":"0b",
            "points_in_bytes":0,
            "norms":"0b",
            "norms_in_bytes":0,
            "term_vectors":"0b",
            "term_vectors_in_bytes":0,
            "knn_vectors":"0b",
            "knn_vectors_in_bytes":0
         },
         "oai_large_1536_embedding":{
            "total":"231.4gb",
            "total_in_bytes":248487685935,
            "inverted_index":{
               "total":"0b",
               "total_in_bytes":0
            },
            "stored_fields":"0b",
            "stored_fields_in_bytes":0,
            "doc_values":"0b",
            "doc_values_in_bytes":0,
            "points":"0b",
            "points_in_bytes":0,
            "norms":"0b",
            "norms_in_bytes":0,
            "term_vectors":"0b",
            "term_vectors_in_bytes":0,
            "knn_vectors":"231.4gb",
            "knn_vectors_in_bytes":248487685935
         },
         
         "title":{
            "total":"1.3gb",
            "total_in_bytes":1496600783,
            "inverted_index":{
               "total":"1.3gb",
               "total_in_bytes":1457766907
            },
            "stored_fields":"0b",
            "stored_fields_in_bytes":0,
            "doc_values":"0b",
            "doc_values_in_bytes":0,
            "points":"0b",
            "points_in_bytes":0,
            "norms":"37mb",
            "norms_in_bytes":38833876,
            "term_vectors":"0b",
            "term_vectors_in_bytes":0,
            "knn_vectors":"0b",
            "knn_vectors_in_bytes":0
         },
         "title.keyword":{
            "total":"5.5gb",
            "total_in_bytes":5909919989,
            "inverted_index":{
               "total":"2.9gb",
               "total_in_bytes":3165320239
            },
            "stored_fields":"0b",
            "stored_fields_in_bytes":0,
            "doc_values":"2.5gb",
            "doc_values_in_bytes":2744599750,
            "points":"0b",
            "points_in_bytes":0,
            "norms":"0b",
            "norms_in_bytes":0,
            "term_vectors":"0b",
            "term_vectors_in_bytes":0,
            "knn_vectors":"0b",
            "knn_vectors_in_bytes":0
         }
      }
   }
}

I think I found the issue:
After the index is created, and even though I can already query it, it is still processing the quantization in the background. I noticed some small CPU consumption related to this:
ES818BinaryQuantizedVectorsWriter

Since I had 3 servers with different specs running in parallel, I noticed that on 2 of them the queries suddenly started running fast without me changing anything, while the smallest server still seems to be processing the quantization. So if I'm right, after a few days, once that process is finished, I should see the speed improve a lot there as well.
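
In case it helps anyone else, these are the two places where I'd expect that background work to show up, assuming it happens as part of flushes and merges (index name is from my test, adjust as needed):

GET /_nodes/hot_threads

GET /test_bbq_hnsw_24_shards/_stats/merge?human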

I think I can close this, since the other 2 servers now serve super fast vector search.