Speed of dense vector search with 512 or more dimensions

Hi Team,

Reading the article Introducing approximate nearest neighbor search in Elasticsearch 8.0 was very useful to our lab for building an Elasticsearch service, so I would like to consult you on how to speed up our queries. I created two index mappings, one using script_score with cosine similarity and one using the ANN (kNN) approach, to evaluate which is better for our task, then inserted 10,000,000 documents into each. As the article suggests, ANN search is faster than script_score, but querying is still a little slow. I share our measurements below:

index for script_score

# index for script_score
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "text"
      },
      "text": {
        "type": "text"
      },
      "text_vector": {
        "type": "dense_vector",
        "dims": 512
      },
      "src": {
        "type": "text"
      }
    }
  }
}

index for ANN

# index for ANN
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "text"
      },
      "text": {
        "type": "text"
      },
      "text_vector": {
        "type": "dense_vector",
        "dims": 512,
        "index": true,
        "similarity": "l2_norm"
      },
      "src": {
        "type": "text"
      }
    }
  }
}

Searching Time (seconds)

query    index for script_score    index for ANN
1st      117.3676                  191.6165
2nd        9.2250                    0.1063
3rd        8.9369                    0.1175
4th        8.6687                    0.1159

I would be grateful if you could share how to improve dense vector search speed with 512 or more dimensions, particularly for the first query, which takes much longer than the rest.

Darren Yang


Thank you for reporting your use case.

I think the 1st query takes a lot of time because it waits for the index to be refreshed. So if, after all indexing is done, you run the _refresh command on your index and only after that run searches, your 1st query will also be very fast.

An extra way to speed up kNN searches is to force-merge the index to a single segment. But you should only do that on an index that will not receive any more updates.
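
With the Python client, that could look roughly like the sketch below (the connection details and index name here are just placeholders, not your actual setup):

import os
from elasticsearch import Elasticsearch

# Placeholder connection; adjust to your cluster.
client = Elasticsearch("https://localhost:9200",
                       basic_auth=("elastic", os.getenv("ELASTIC_PASSWORD")))

# Refresh once after all indexing is done, so the first search does not pay for it.
client.indices.refresh(index="ann_10000000")

# Optional: force-merge to a single segment, but only on an index that
# will not receive any more updates.
client.indices.forcemerge(index="ann_10000000", max_num_segments=1)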


Thank you for your prompt reply and suggestion; it was very helpful for our first experiment.

After running the _refresh command on the indices, the results improved noticeably. I share them below:

Searching Time (seconds)

Note: 10,000,000 documents in each index.

query    index for script_score    index for ANN
1st      9.327                     0.169
2nd      9.387                     0.172
3rd      9.233                     0.246

Afterward, I tried to do another experiment with 100,000,000 documents in each index, and the result is as follows:

Searching Time (seconds)

Note: 100,000,000 documents in each index.

query    index for script_score    index for ANN
1st      1124.940                  1076.324
2nd      1092.787                     0.598
3rd      1092.75                      0.456

Could you give me some suggestions for the above situation?

Thank you again for everything you've shared.

Hello again @telunyang. My guess is that with your new experiment, the "refresh" call did not work (or perhaps you forgot to call "refresh" again before searching?). We can see this because the first query is very slow, but the next queries are quite fast.

Maybe you could double-check that the "refresh" call actually completed. You may need to set a higher request timeout, since sometimes Elasticsearch clients will time out the connection before an operation is complete.
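
For example, with the 8.x Python client you can raise the timeout for a single request, roughly like this (assuming client is your connected Elasticsearch instance and the index name is a placeholder):

# Give the refresh call up to an hour before the client gives up on the connection.
client.options(request_timeout=3600).indices.refresh(index="ann_10000000")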


Hi @Julie_Tibshirani, it's my pleasure to hear from you. I ran the _refresh command after indexing and set a higher request timeout. I'd like to share my code below:

import logging
import os
import sys

from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer
from dotenv import load_dotenv
load_dotenv()

# In this case, we use Sentence BERT model to make vectors.
model = SentenceTransformer('distiluse-base-multilingual-cased-v2')

# Connection
client = Elasticsearch(
    "https://localhost:9200",
    ca_certs=os.getenv("ELASTIC_CACERT"),
    basic_auth=( "elastic", os.getenv("ELASTIC_PASSWORD") ),
    request_timeout=9999, 
    max_retries=99999, 
    retry_on_timeout=True
)

# The name of index
index_name = 'ann_10000000' # or 'script_10000000'

# Query string or keyword
query = "Hello Elasticsearch!"

try:
    # Make a vector
    query_vector = model.encode(query)

    # Query
    if index_name.startswith('ann'): # Suppose the indexing name starts with ann 
        # Syntax of querying
        query = {
            "field": "text_vector",
            "query_vector": query_vector,
            "k": 10,
            "num_candidates": 100
        }

        # Response
        response = client.options( api_key=(os.getenv("ID"), os.getenv("API_KEY")) ).knn_search(
            index = index_name,
            knn = query,
            _source = {"includes": ["id", "text", "src"]},
            request_timeout = 6000
        )

    elif index_name.startswith('script'): # Suppose the indexing name starts with script 
        # Syntax of querying
        query = {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "params": {"query_vector": query_vector},
                    "source": "doc['text_vector'].size() == 0 ? 0 : cosineSimilarity(params.query_vector, 'text_vector') + 1.0"
                }
            }
        }

        # Response
        response = client.options( api_key=(os.getenv("ID"), os.getenv("API_KEY")) ).search(
            index = index_name,
            size = 10,
            query = query,
            _source = {"includes": ["id", "text", "src"]},
            request_timeout = 6000
        )

    # Show us the results
    print(response)

except Exception:
    logging.error("Unexpected error: %s", sys.exc_info())

Does any part of the code need to be corrected to speed up the search?

I hope I can contribute through continued discussion.

Thanks to the distinguished team members.

@telunyang I'm sorry for the slow reply! I accidentally missed your update.

You sent the code for searching, but it'd be most helpful to send the code you use for indexing the vectors and calling "refresh". This "refresh" call is what I suspect is not working. That means that your first search needs to do the refresh itself, which makes it very slow.

To be confident that the refresh completed, you could look at the response object. It contains a field called successful that shows the number of shards that were successfully refreshed.
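
For example (a minimal sketch, assuming client is your connected Elasticsearch client and the index name is a placeholder):

resp = client.indices.refresh(index="ann_10000000")
shards = resp["_shards"]
print(f"refreshed {shards['successful']} of {shards['total']} shards, {shards['failed']} failed")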

Hi @Julie_Tibshirani, I've modified my code as shown below:

import logging
import os
import sys
from pprint import pprint

from elasticsearch import Elasticsearch, client as cl
from sentence_transformers import SentenceTransformer
from dotenv import load_dotenv
load_dotenv()

# In this case, we use Sentence BERT model to make vectors.
model = SentenceTransformer('distiluse-base-multilingual-cased-v2')

# Connection
client = Elasticsearch(
    "https://localhost:9200",
    ca_certs=os.getenv("ELASTIC_CACERT"),
    basic_auth=( "elastic", os.getenv("ELASTIC_PASSWORD") ),
    request_timeout=9999, 
    max_retries=99999, 
    retry_on_timeout=True
)

# Refresh
pprint(cl.IndicesClient(client).refresh())

# The name of index
index_name = 'ann_10000000' # or 'script_10000000'

# Query string or keyword
query = "Hello Elasticsearch!"

try:
    # Make a vector
    query_vector = model.encode(query)

    # Query
    if index_name.startswith('ann'): # Suppose the indexing name starts with ann 
        # Syntax of querying
        query = {
            "field": "text_vector",
            "query_vector": query_vector,
            "k": 10,
            "num_candidates": 100
        }

        # Response
        response = client.options( api_key=(os.getenv("ID"), os.getenv("API_KEY")) ).knn_search(
            index = index_name,
            knn = query,
            _source = {"includes": ["id", "text", "src"]},
            request_timeout = 6000
        )

    elif index_name.startswith('script'): # Suppose the indexing name starts with script 
        # Syntax of querying
        query = {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "params": {"query_vector": query_vector},
                    "source": "doc['text_vector'].size() == 0 ? 0 : cosineSimilarity(params.query_vector, 'text_vector') + 1.0"
                }
            }
        }

        # Response
        response = client.options( api_key=(os.getenv("ID"), os.getenv("API_KEY")) ).search(
            index = index_name,
            size = 10,
            query = query,
            _source = {"includes": ["id", "text", "src"]},
            request_timeout = 6000
        )

    # Show us the results
    print(response)

except Exception:
    logging.error("Unexpected error: %s", sys.exc_info())

Differences:

  1. Additional import for IndicesClient (client as cl):

     from elasticsearch import Elasticsearch
     to
     from elasticsearch import Elasticsearch, client as cl

  2. Added the refresh call and printed its output:

     # Refresh
     pprint(cl.IndicesClient(client).refresh())

     Output:
     ObjectApiResponse({'_shards': {'total': 20, 'successful': 10, 'failed': 0}})

Nevertheless, the performance doesn't improve. Do I need to build the indices on a clustered architecture to speed things up? It seems the settings of number_of_shards: 1 and number_of_replicas: 1 may not satisfy my requirements.

Thanks, it indeed looks like the refresh completed successfully. Are you still seeing the behavior you shared above, where the first ANN search is extremely slow but subsequent searches are faster? And are you okay with the performance of the 2nd and 3rd searches? Or is the first ANN search a bit faster now when you rerun the benchmarks?

If it's still extremely slow (~1000s), then this is very surprising. To help us debug, you could capture a few samples from the hot_threads API while the slow search is running (Nodes hot threads API | Elasticsearch Guide [8.3] | Elastic). This will tell us what Elasticsearch is doing during that slow search.
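
A rough way to sample it with the Python client while the slow search runs in another process (the repeat count and sleep interval below are arbitrary, and client is assumed to be your connected Elasticsearch client):

import time

# Print what the Elasticsearch nodes are busy with, a few times during the slow search.
for _ in range(3):
    print(client.nodes.hot_threads())
    time.sleep(5)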

As for your question about sharding -- yes, generally increasing the number of shards can help with search latency. 100 million is also a pretty large number of vectors, and can require substantial memory. We recommend having enough RAM to fit all of the vector data in memory. So it could help to use more than one machine, and distribute a few shards across them.
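
As a rough sketch (not a tuned recommendation; the index name and shard count here are arbitrary, and the mapping just mirrors your earlier ANN example), creating the index with more primary shards could look like:

client.indices.create(
    index="ann_100000000",
    settings={"number_of_shards": 4, "number_of_replicas": 1},
    mappings={
        "properties": {
            "id": {"type": "text"},
            "text": {"type": "text"},
            "text_vector": {
                "type": "dense_vector",
                "dims": 512,
                "index": True,
                "similarity": "l2_norm"
            },
            "src": {"type": "text"}
        }
    }
)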

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.