Speed of dense vector search with 512 or more dimensions

Hi Team,

The article Introducing approximate nearest neighbor search in Elasticsearch 8.0 has been very useful to our lab in building an Elasticsearch service, so I would like to ask how to speed up our queries. To evaluate which approach suits our task better, I created two indices, one queried with script_score using cosine similarity and one using the ANN algorithm, and inserted 10,000,000 documents into each. As the article suggests, ANN search is faster than script_score, but querying is still a little slow. Our measurements are shown below:

index for script_score

# index for script_score
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "text"
      },
      "text": {
        "type": "text"
      },
      "text_vector": {
        "type": "dense_vector",
        "dims": 512
      },
      "src": {
        "type": "text"
      }
    }
  }
}

index for ANN

# index for ANN
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "text"
      },
      "text": {
        "type": "text"
      },
      "text_vector": {
        "type": "dense_vector",
        "dims": 512,
        "index": true,
        "similarity": "l2_norm"
      },
      "src": {
        "type": "text"
      }
    }
  }
}
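
For context, the two search styles being compared might be issued with request bodies like the ones below. This is only a sketch: the actual queries used in this evaluation are not shown, so the helper names, the `+ 1.0` score offset, and the `k` / `num_candidates` defaults are assumptions. The field name `text_vector` and the 512 dimensions come from the mappings above, and the ANN body targets the `_knn_search` endpoint introduced in 8.0.

```python
FIELD = "text_vector"  # vector field name taken from the mappings above
DIMS = 512             # dimensions declared in the mappings above


def script_score_body(query_vector, size=10):
    """Exact (brute-force) search: every document is scored by the script."""
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # + 1.0 keeps the score non-negative (assumed convention)
                    "source": f"cosineSimilarity(params.query_vector, '{FIELD}') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        },
    }


def knn_search_body(query_vector, k=10, num_candidates=100):
    """Approximate search body, sent to GET /<index>/_knn_search."""
    return {
        "knn": {
            "field": FIELD,
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
        }
    }
```

Note that the ANN mapping uses `l2_norm` while the script uses cosine similarity, so the two rankings are directly comparable only if the vectors are normalized to unit length.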

Searching Time (seconds)

query   index for script score   index for ANN
1st     117.3676                 191.6165
2nd     9.2250                   0.1063
3rd     8.9369                   0.1175
4th     8.6687                   0.1159

I would be grateful if you could share how to improve dense vector search speed with 512 or more dimensions, particularly for the first query, which took much longer.

Darren Yang

Thank you for reporting your use case.

I think the 1st query takes so long because it waits for the index to be refreshed. If you run the _refresh command on your index after all indexing is done, and only then run searches, your 1st query should also be very fast.

Another way to speed up kNN searches is to force merge the index to a single segment. But you should do that only on an index that will not receive any more updates.
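
The two maintenance calls mentioned above could be scripted roughly as follows. This is a minimal sketch using only the Python standard library: the cluster URL `http://localhost:9200` and the index name `ann-index` are placeholders, and it assumes an unsecured cluster. The endpoints themselves are the standard `_refresh` and `_forcemerge` REST APIs.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster without auth
INDEX = "ann-index"               # hypothetical index name


def build_request(path):
    """Build a POST request for an index-level maintenance endpoint."""
    return urllib.request.Request(f"{ES_URL}/{INDEX}/{path}", method="POST")


def refresh_request():
    # POST /<index>/_refresh makes recently indexed documents searchable
    return build_request("_refresh")


def force_merge_request():
    # POST /<index>/_forcemerge?max_num_segments=1 merges down to one segment
    return build_request("_forcemerge?max_num_segments=1")


def send(request, timeout_s=3600):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.load(resp)
```

A typical order would be `send(refresh_request())` once indexing has finished, then `send(force_merge_request())` only if the index will no longer be updated, and only after that the timed searches.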

Thank you for your prompt reply and suggestion; it was helpful for our first experiment.

After running the _refresh command on the indices, the first-query time improved noticeably. My results are shown below:

Searching Time (seconds)

Note: 10,000,000 documents in each index.

query   index for script score   index for ANN
1st     9.327                    0.169
2nd     9.387                    0.172
3rd     9.233                    0.246

Afterward, I ran another experiment with 100,000,000 documents in each index, and the results are as follows:

Searching Time (seconds)

Note: 100,000,000 documents in each index.

query   index for script score   index for ANN
1st     1124.940                 1076.324
2nd     1092.787                 0.598
3rd     1092.75                  0.456

Could you give me some suggestions for the above situation?

Thank you again for everything you've shared.

Hello again @telunyang. My guess is that in your new experiment, the "refresh" call did not work (or perhaps you forgot to call "refresh" again before searching?). We can see this because the first query is very slow, but the next queries are quite fast.

Maybe you could double-check that the "refresh" call actually completed. You may need to set a higher request timeout, since sometimes Elasticsearch clients will time out the connection before an operation is complete.
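
One way to double-check this is to call `_refresh` with a generous client-side timeout and inspect the `_shards` section of the response. The sketch below uses only the Python standard library. The URL, the index name, and the one-hour timeout are assumptions, and `refresh_completed` is a hypothetical helper, not part of any Elasticsearch client.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster without auth
INDEX = "ann-index"               # hypothetical index name


def refresh_completed(response):
    """Return True if the refresh response reports no failed shards."""
    shards = response.get("_shards", {})
    return shards.get("failed", 1) == 0 and shards.get("successful", 0) > 0


def refresh_and_verify(timeout_s=3600):
    """Call _refresh with a long client timeout and check the result."""
    req = urllib.request.Request(f"{ES_URL}/{INDEX}/_refresh", method="POST")
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            return refresh_completed(json.load(resp))
    except OSError:
        # Covers connection errors and client-side timeouts; the refresh
        # may still be running on the server when the client gives up.
        return False
```

If `refresh_and_verify()` returns `False`, the first search afterwards may still pay the refresh cost, which would match the slow first query in the 100,000,000-document run.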
