Hi everyone,
I’m running an Elasticsearch cluster (version 8.17) with the basic license. The cluster has 3 nodes, each a Linux server with 62 GB of RAM and 1 TB of disk space. I’ve set up an index for vector search using the following mappings and settings:
Mappings and settings:
{
  "mappings": {
    "_source": {
      "excludes": [
        "vector"
      ]
    },
    "properties": {
      "vector": {
        "type": "dense_vector",
        "index": true,
        "index_options": {
          "type": "int8_hnsw"
        },
        "dims": 256,
        "similarity": "cosine"
      }
    }
  },
  "settings": {
    "index": {
      "refresh_interval": "60s",
      "number_of_shards": "3",
      "number_of_replicas": "2"
    }
  }
}
I’ve indexed 25 million vectors into this index. After that, I updated the index settings as follows:
"merge": {
  "policy": {
    "max_merged_segment": "20g"
  }
},
"store": {
  "preload": [
    "vex",
    "veq"
  ]
}
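For context, here is a rough back-of-the-envelope estimate of how much vector data each node has to bring into memory. It is only a sketch and rests on several assumptions: the sizing formulas from the Elasticsearch kNN tuning guide (num_vectors * (dims + 4) bytes for int8-quantized vectors, num_vectors * 4 * m bytes for the HNSW graph), the default HNSW m of 16, and that with 3 shards and 2 replicas each of the 3 nodes holds a full copy of the data. The function name is made up for illustration and is not part of any Elasticsearch API.

```python
# Rough per-node memory estimate for an int8_hnsw dense_vector index.
# Assumptions: formulas as given in the Elasticsearch kNN tuning guide,
# default HNSW m = 16, and every node holding one copy of all shards.

GIB = 1024 ** 3


def estimate_knn_bytes(num_vectors, dims, hnsw_m=16):
    """Return estimated sizes in bytes (hypothetical helper)."""
    # int8 quantization: one byte per dimension plus 4 bytes of overhead per vector
    quantized = num_vectors * (dims + 4)
    # HNSW graph: roughly 4 bytes per link, m links per vector
    graph = num_vectors * 4 * hnsw_m
    # original float32 vectors, kept on disk alongside the quantized ones
    raw_float = num_vectors * dims * 4
    return {"quantized": quantized, "graph": graph, "raw_float": raw_float}


est = estimate_knn_bytes(num_vectors=25_000_000, dims=256)
for name, size in est.items():
    print(f"{name}: {size / GIB:.1f} GiB")
```

Under these assumptions, the quantized vectors (veq) plus the graph (vex) come to roughly 7.5 GiB per node, which should fit comfortably in the page cache on a 62 GB machine, while the raw float vectors add about 24 GiB on disk.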
When I run a kNN search for the first time, the request takes around 2 minutes. Subsequent requests for the same query take around 10 seconds.
I tested a similar setup (same cluster and data) with Qdrant, and there the initial queries have much lower latency.
Why is the initial kNN search so much slower than subsequent searches?
Are there additional optimizations I can apply to reduce the latency?