Hi there!
I'm very excited to explore Elasticsearch as an option for both ANN and hybrid search. The guides for tuning kNN search performance have been very helpful; however, I'm running into some unexpected behaviour around preloading, and we're also not seeing numbers that resemble the published benchmarks.
Here's the setup at the moment and the numbers we're seeing:
3 master nodes
20 data nodes with:
- 29 vCPUs; Intel Xeon Ice Lake, roughly 2.4 GHz base with up to 3.5 GHz boost (we're using Google Cloud N2 instances)
- 110 GiB of memory
- 30 GB of heap (-Xms30g -Xmx30g)
- 500 GiB of disk space
The index we've set up contains ~350M docs, each containing a 384-dim dense vector mapped like this:
"text_embedding": {
"type": "dense_vector",
"dims": 384,
"index": true,
"similarity": "dot_product"
}
According to the analyze index disk usage API we're seeing:
"knn_vectors"=>"559.1gb"
So presumably we'd need at least 8 nodes with ~70 GB of RAM each available for the page cache in order to hold all of this in memory, so resource allocation shouldn't be an issue.
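The back-of-the-envelope math being: 110 GiB of RAM minus the 30 GB heap leaves roughly 70-80 GB per data node for the OS page cache, and 559.1 GB of vectors / ~70 GB of cache per node ≈ 8 data nodes.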
The kNN benchmarks are run against a single shard with 2M vectors @ 768 dimensions, so given that our vectors are half that size, we assumed we could put 4M vectors per shard and still have a roughly equivalent setup.
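Concretely, 2M × 768 dims ≈ 4M × 384 dims in raw float data, and 350M vectors over 100 primary shards works out to ~3.5M vectors per shard.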
Our current index settings are:
"settings": {
"index": {
"number_of_shards": "100",
"number_of_replicas": "1",
"refresh_interval": "-1",
"store.preload": ["vec", "vex", "vem", "vemf", "veq", "vemq"]
}
}
This gives us decent results once we refresh and force-merge to a single segment:
P50 - 50ms~
P95 - 77ms~
P99 - 100ms~
(These are measured client-side, so they include ES + network.)
Since we're using 8.10, we've also enabled this setting:
"search": {
"query_phase_parallel_collection_enabled": "true"
}
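For reference, we set it cluster-wide via the cluster settings API, roughly like this:

PUT _cluster/settings
{
  "persistent": {
    "search.query_phase_parallel_collection_enabled": true
  }
}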
Our total index size is ~1.8 TB, plus one replica.
Based on that (1.8 TB across 100 shards), our shards, and therefore the force-merged segments, come out at around 18 GB each.
The ANN query we're running also sets "_source": false and only filters on fields backed by inverted indices.
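Roughly, the query is shaped like this (the index name, filter field, values, and k/num_candidates below are placeholders rather than our real ones):

POST our-index/_search
{
  "_source": false,
  "knn": {
    "field": "text_embedding",
    "query_vector": [0.12, -0.04, ...],
    "k": 10,
    "num_candidates": 100,
    "filter": {
      "term": {
        "some_keyword_field": "some-value"
      }
    }
  }
}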
- Given this information, is there anything else we could tune here?
- Force-merging vs. force-merging down to a single segment gives us similar performance (the calls I'm comparing are shown after this list).
- To reach the numbers above we need to warm up the cluster quite a bit, and we see a lot of I/O in the meantime; preloading doesn't seem to help here. Is there anything we could do about that?
- Once the cluster is warmed up, I/O is almost zero during load tests; I'm assuming that's a good indicator that we have everything we need in memory?
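To clarify the force-merge point above: by "force-merging vs. a single segment" I mean comparing roughly these two calls (index name is a placeholder):

POST our-index/_forcemerge
POST our-index/_forcemerge?max_num_segments=1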