Hi Team,
The article "Introducing approximate nearest neighbor search in Elasticsearch 8.0" was very useful to our lab in building an Elasticsearch service, so I would like to ask how to speed up our queries. I created two indices, one searched with script_score using cosine similarity and one searched with the ANN algorithm, to evaluate which is better for our task, and inserted 10,000,000 documents into each. As the article suggests, ANN search is much faster than script_score, but the first query against either index is very slow. The mappings and our timings are shown below:
index for script_score
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "text"
      },
      "text": {
        "type": "text"
      },
      "text_vector": {
        "type": "dense_vector",
        "dims": 512
      },
      "src": {
        "type": "text"
      }
    }
  }
}
index for ANN
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "text"
      },
      "text": {
        "type": "text"
      },
      "text_vector": {
        "type": "dense_vector",
        "dims": 512,
        "index": true,
        "similarity": "l2_norm"
      },
      "src": {
        "type": "text"
      }
    }
  }
}
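The exact requests are not included in this post. As a rough sketch of what the two request bodies usually look like for these mappings, the Python below builds a `script_score` query using `cosineSimilarity` and a body for the `_knn_search` endpoint from Elasticsearch 8.0. The helper names, the `size`, `k`, and `num_candidates` defaults, and the zero query vector are assumptions for illustration only, not the values used in the evaluation.

```python
import json

FIELD = "text_vector"  # vector field name taken from the mappings above
DIMS = 512             # dimension taken from the mappings above


def script_score_body(query_vector, size=10):
    """Exact search: the script scores every document matched by match_all."""
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # +1.0 keeps the score non-negative, which Elasticsearch requires
                    "source": f"cosineSimilarity(params.query_vector, '{FIELD}') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        },
    }


def knn_search_body(query_vector, k=10, num_candidates=100):
    """Approximate search body, assumed to be sent to <index>/_knn_search."""
    return {
        "knn": {
            "field": FIELD,
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
        }
    }


if __name__ == "__main__":
    vec = [0.0] * DIMS  # placeholder query vector, illustration only
    print(json.dumps(knn_search_body(vec))[:80])
```

Note that the ANN mapping uses `l2_norm` while the script uses cosine similarity, so the two rankings are only directly comparable if the stored and query vectors are normalized to unit length.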
Search time (seconds)

Query | Index for script_score | Index for ANN |
---|---|---|
1 | 117.3676 | 191.6165 |
2 | 9.2250 | 0.1063 |
3 | 8.9369 | 0.1175 |
4 | 8.6687 | 0.1159 |
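To make the gap between the first query and the later ones explicit, the timings in the table can be summarized with a few lines of Python. This is only arithmetic on the numbers above; the names `timings` and `summarize` are illustrative.

```python
# Timings in seconds, copied from the table above (query 1 first).
timings = {
    "script_score": [117.3676, 9.2250, 8.9369, 8.6687],
    "ann": [191.6165, 0.1063, 0.1175, 0.1159],
}


def summarize(samples):
    """Split the first query from the later ones and compare them."""
    first, *later = samples
    later_mean = sum(later) / len(later)
    return {
        "first": first,
        "later_mean": later_mean,
        # how many times slower the first query is than the later average
        "first_vs_later": first / later_mean,
    }


summary = {name: summarize(samples) for name, samples in timings.items()}

# Ratio of script_score to ANN latency on queries 2 to 4.
later_speedup = summary["script_score"]["later_mean"] / summary["ann"]["later_mean"]

for name, stats in summary.items():
    print(f"{name}: first={stats['first']:.4f}s "
          f"later_mean={stats['later_mean']:.4f}s "
          f"first/later={stats['first_vs_later']:.1f}x")
print(f"ANN speedup over script_score on later queries: {later_speedup:.1f}x")
```

On queries 2 to 4 the ANN index is faster by well over an order of magnitude, while on query 1 both indices take minutes, with the ANN index the slower of the two.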
I would be grateful if you could share how to improve dense vector search speed with 512 or more dimensions, particularly for the first query, which took far longer than the later ones.
Darren Yang