Vector search large dense vectors performance issues

I'm experiencing inconsistent results (sometimes slow, sometimes fast) with vector search. I currently have 50 million documents in an index, and the vectors are stored in a field with the following mapping:

"large_1536_embedding": {
  "type": "dense_vector",
  "dims": 1536,
  "index": true,
  "similarity": "dot_product",
  "index_options": {
      "type": "bbq_hnsw",
      "m": 16,
      "ef_construction": 100
  }
}

I get very inconsistent results when querying: sometimes a query takes less than a second, and sometimes the same query takes more than a minute. I have tried multiple shard configurations, multiple machines (small to large in RAM, CPU, etc.), and different versions of Elasticsearch (even 9.0.3), but the results are still inconsistent. Can anyone give me a clue on how to work out what the problem might be?

The type of query I run contains no filters:

"knn": {
    "field": "large_1536_embedding",
    "k": 25,
    "num_candidates": 250,
    "query_vector": [0,1,2,etc]
}

Hi @Pablo_Delgado,

Welcome! Can you share the cluster configuration you are using (number of nodes and shards)? I'm aware you've said you've tried multiple configurations, but it would be useful to know.

Can you also share the number of segments, the disk usage stats, and the output of the profile API? Those stats have given good pointers to people in the past, and since you're on 9.0.3 per your message, all of them will be available to you.
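
For reference, these are roughly the calls that cover those three things (the index name is just a placeholder, and I've elided the query vector):

GET /your-index/_segments

POST /your-index/_disk_usage?run_expensive_tasks=true

POST /your-index/_search
{
  "profile": true,
  "knn": {
    "field": "large_1536_embedding",
    "k": 25,
    "num_candidates": 250,
    "query_vector": [ ... ]
  }
}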

Let us know!

I tried multiple shard configurations:

shards tested: 1, 4, 8, 16, 24, 48
machine types tested: 4, 8, 22 cores
but always a single node

Segments: I see many segments per shard and a very uneven data distribution here. Should I merge them all into one?

{
    "_shards": {
        "total": 48,
        "successful": 24,
        "failed": 0
    },
    "indices": {
        "test_bbq_hnsw_24_shards": {
            "shards": {
                "0": [
                    {
                        "routing": {
                            "state": "STARTED",
                            "primary": true,
                            "node": "pKKRmbY4QtieihRRKN9CtQ"
                        },
                        "num_committed_segments": 37,
                        "num_search_segments": 37,
                        
                    }
                ],
                "1": [
                    {
                        "routing": {
                            "state": "STARTED",
                            "primary": true,
                            "node": "pKKRmbY4QtieihRRKN9CtQ"
                        },
                        "num_committed_segments": 15,
                        "num_search_segments": 15,
                        "segments": 
                    }
                ],
                "2": [
                    {
                        "routing": {
                            "state": "STARTED",
                            "primary": true,
                            "node": "pKKRmbY4QtieihRRKN9CtQ"
                        },
                        "num_committed_segments": 30,
                        "num_search_segments": 30,
                        
                    }
                ],

Unfortunately this forum doesn't allow me to paste the full results because of their size, nor to share links.
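
On the merge question above: if merging does turn out to be the answer, my understanding is that a single-segment force merge would look roughly like this (an expensive operation on an index this size, so I'd treat it as an experiment):

POST /test_bbq_hnsw_24_shards/_forcemerge?max_num_segments=1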

Disk stats

{
   "_shards":{
      "total":24,
      "successful":24,
      "failed":0
   },
   "test_bbq_hnsw_24_shards":{
      "store_size":"834.4gb",
      "store_size_in_bytes":895941598686,
      "all_fields":{
         "total":"833.8gb",
         "total_in_bytes":895359193132,
         "inverted_index":{
            "total":"53.6gb",
            "total_in_bytes":57554575162
         },
         "stored_fields":"531.6gb",
         "stored_fields_in_bytes":570838351832,
         "doc_values":"16gb",
         "doc_values_in_bytes":17282665200,
         "points":"422.2mb",
         "points_in_bytes":442724692,
         "norms":"718.2mb",
         "norms_in_bytes":753190311,
         "term_vectors":"0b",
         "term_vectors_in_bytes":0,
         "knn_vectors":"231.4gb",
         "knn_vectors_in_bytes":248487685935
      },
      "fields":{
         "_id":{
            "total":"609.7mb",
            "total_in_bytes":639358058,
            "inverted_index":{
               "total":"204mb",
               "total_in_bytes":213971963
            },
            "stored_fields":"405.6mb",
            "stored_fields_in_bytes":425386095,
            "doc_values":"0b",
            "doc_values_in_bytes":0,
            "points":"0b",
            "points_in_bytes":0,
            "norms":"0b",
            "norms_in_bytes":0,
            "term_vectors":"0b",
            "term_vectors_in_bytes":0,
            "knn_vectors":"0b",
            "knn_vectors_in_bytes":0
         },
         "_ignored":{
            "total":"121.9mb",
            "total_in_bytes":127880979,
            "inverted_index":{
               "total":"35.4mb",
               "total_in_bytes":37139482
            },
            "stored_fields":"0b",
            "stored_fields_in_bytes":0,
            "doc_values":"86.5mb",
            "doc_values_in_bytes":90741497,
            "points":"0b",
            "points_in_bytes":0,
            "norms":"0b",
            "norms_in_bytes":0,
            "term_vectors":"0b",
            "term_vectors_in_bytes":0,
            "knn_vectors":"0b",
            "knn_vectors_in_bytes":0
         },
         "_primary_term":{
            "total":"0b",
            "total_in_bytes":0,
            "inverted_index":{
               "total":"0b",
               "total_in_bytes":0
            },
            "stored_fields":"0b",
            "stored_fields_in_bytes":0,
            "doc_values":"0b",
            "doc_values_in_bytes":0,
            "points":"0b",
            "points_in_bytes":0,
            "norms":"0b",
            "norms_in_bytes":0,
            "term_vectors":"0b",
            "term_vectors_in_bytes":0,
            "knn_vectors":"0b",
            "knn_vectors_in_bytes":0
         },
         "_seq_no":{
            "total":"143.4mb",
            "total_in_bytes":150412981,
            "inverted_index":{
               "total":"0b",
               "total_in_bytes":0
            },
            "stored_fields":"0b",
            "stored_fields_in_bytes":0,
            "doc_values":"89.3mb",
            "doc_values_in_bytes":93680345,
            "points":"54.1mb",
            "points_in_bytes":56732636,
            "norms":"0b",
            "norms_in_bytes":0,
            "term_vectors":"0b",
            "term_vectors_in_bytes":0,
            "knn_vectors":"0b",
            "knn_vectors_in_bytes":0
         },
         "_source":{
            "total":"531.2gb",
            "total_in_bytes":570412965737,
            "inverted_index":{
               "total":"0b",
               "total_in_bytes":0
            },
            "stored_fields":"531.2gb",
            "stored_fields_in_bytes":570412965737,
            "doc_values":"0b",
            "doc_values_in_bytes":0,
            "points":"0b",
            "points_in_bytes":0,
            "norms":"0b",
            "norms_in_bytes":0,
            "term_vectors":"0b",
            "term_vectors_in_bytes":0,
            "knn_vectors":"0b",
            "knn_vectors_in_bytes":0
         },
         "_version":{
            "total":"0b",
            "total_in_bytes":0,
            "inverted_index":{
               "total":"0b",
               "total_in_bytes":0
            },
            "stored_fields":"0b",
            "stored_fields_in_bytes":0,
            "doc_values":"0b",
            "doc_values_in_bytes":0,
            "points":"0b",
            "points_in_bytes":0,
            "norms":"0b",
            "norms_in_bytes":0,
            "term_vectors":"0b",
            "term_vectors_in_bytes":0,
            "knn_vectors":"0b",
            "knn_vectors_in_bytes":0
         },
         "oai_large_1536_embedding":{
            "total":"231.4gb",
            "total_in_bytes":248487685935,
            "inverted_index":{
               "total":"0b",
               "total_in_bytes":0
            },
            "stored_fields":"0b",
            "stored_fields_in_bytes":0,
            "doc_values":"0b",
            "doc_values_in_bytes":0,
            "points":"0b",
            "points_in_bytes":0,
            "norms":"0b",
            "norms_in_bytes":0,
            "term_vectors":"0b",
            "term_vectors_in_bytes":0,
            "knn_vectors":"231.4gb",
            "knn_vectors_in_bytes":248487685935
         },
         
         "title":{
            "total":"1.3gb",
            "total_in_bytes":1496600783,
            "inverted_index":{
               "total":"1.3gb",
               "total_in_bytes":1457766907
            },
            "stored_fields":"0b",
            "stored_fields_in_bytes":0,
            "doc_values":"0b",
            "doc_values_in_bytes":0,
            "points":"0b",
            "points_in_bytes":0,
            "norms":"37mb",
            "norms_in_bytes":38833876,
            "term_vectors":"0b",
            "term_vectors_in_bytes":0,
            "knn_vectors":"0b",
            "knn_vectors_in_bytes":0
         },
         "title.keyword":{
            "total":"5.5gb",
            "total_in_bytes":5909919989,
            "inverted_index":{
               "total":"2.9gb",
               "total_in_bytes":3165320239
            },
            "stored_fields":"0b",
            "stored_fields_in_bytes":0,
            "doc_values":"2.5gb",
            "doc_values_in_bytes":2744599750,
            "points":"0b",
            "points_in_bytes":0,
            "norms":"0b",
            "norms_in_bytes":0,
            "term_vectors":"0b",
            "term_vectors_in_bytes":0,
            "knn_vectors":"0b",
            "knn_vectors_in_bytes":0
         }
      }
   }
}

I think I found the issue:
After the index is created, and even though I can already query it, it is still processing the quantization in the background. I noticed some small CPU consumption related to this:
ES818BinaryQuantizedVectorsWriter

Since I had 3 servers with different specs running in parallel, I noticed that on 2 of them the queries suddenly started running fast without me changing anything, while the smallest server still seems to be processing the quantization. So if I'm right, after a few days, once that process is finished, I should see the speed improve a lot there as well.
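
In case it helps anyone else, these are the two places where I'd expect that background work to show up, assuming it happens as part of flushes and merges (index name is from my test, adjust as needed):

GET /_nodes/hot_threads

GET /test_bbq_hnsw_24_shards/_stats/merge?human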

I think I can close this, since the other 2 servers now serve super fast vector search.