Script score vector search performance

Hello!

I use Elasticsearch 7.14 and I have a mapping like this:

{
  "mappings": {
    "properties": {
      "vector": {
        "type": "dense_vector",
        "dims": 512
      },
      "category": {
        "type": "keyword"
      },
      "name": {
        "type": "keyword"
      },
      "source": {
        "type": "keyword"
      }
    }
  }
}

This index is primarily used to search by vector using cosine similarity, like this:

{
  script_score: {
    query: { match_all: {} },
    script: {
      source: '(1.0 + cosineSimilarity(params.query_vector, \'vector\'))',
      params: {
        query_vector: vectorArray
      }
    },
    min_score: 0
  }
}

I have around 600k documents in this index, and I don't think sharding should be necessary for that number of documents. However, even with 3 shards, the total search time is about 5 seconds, according to Kibana Dev Tools.

Is there something else I could do to improve search speed?

You are running a query with a match_all clause, so every document has to be scored by the script. Each query runs in a single thread per shard, so to increase parallelism and use more CPU cores you need to increase the number of primary shards.

If the index is small, try reindexing it into a few new indices with varying numbers of primary shards and see how query performance compares across them.
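
A rough sketch of that experiment, written as plain JavaScript helpers that only build request bodies and make no client calls. The index names (`vectors`, `vectors_3s`, and so on) and the shard counts are hypothetical; the bodies are meant for `PUT /<index>` and `POST /_reindex` respectively.

```javascript
// Hypothetical names and values: adjust to your own cluster.
const SOURCE_INDEX = "vectors";
const SHARD_COUNTS = [1, 3, 6];

// Body for PUT /<index>: same mapping as in the question, different shard count.
function createIndexBody(numberOfShards) {
  return {
    settings: { index: { number_of_shards: numberOfShards } },
    mappings: {
      properties: {
        vector: { type: "dense_vector", dims: 512 },
        category: { type: "keyword" },
        name: { type: "keyword" },
        source: { type: "keyword" }
      }
    }
  };
}

// Body for POST /_reindex: copy all documents from sourceIndex into destIndex.
function reindexBody(sourceIndex, destIndex) {
  return { source: { index: sourceIndex }, dest: { index: destIndex } };
}

// One entry per candidate shard count.
const plan = SHARD_COUNTS.map((n) => {
  const dest = `${SOURCE_INDEX}_${n}s`;
  return {
    index: dest,
    create: createIndexBody(n),
    reindex: reindexBody(SOURCE_INDEX, dest)
  };
});

console.log(JSON.stringify(plan, null, 2));
```

After reindexing, run the same script_score query against each index and compare the `took` values. More shards only help while there are idle CPU cores to run them.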

Yes, you can also try replacing the match_all query with a more restrictive filter, so that fewer documents have to be scored by the script.
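
For example, if searches can be limited to one category, something like the sketch below would score only the matching documents. The helper name `filteredScriptScore` and the choice of `category` as the filter field are assumptions for illustration; any filter that fits your data would do.

```javascript
// Builds a script_score query that only scores documents in one category.
// The function name and the filter field are illustrative assumptions.
function filteredScriptScore(queryVector, category) {
  return {
    script_score: {
      // A bool filter narrows the candidate set before the script runs.
      query: { bool: { filter: [{ term: { category: category } }] } },
      script: {
        source: "1.0 + cosineSimilarity(params.query_vector, 'vector')",
        params: { query_vector: queryVector }
      },
      min_score: 0
    }
  };
}

// Usage with a dummy 512-dimensional vector and a made-up category value.
const exampleQuery = filteredScriptScore(new Array(512).fill(0.1), "shoes");
console.log(JSON.stringify(exampleQuery.script_score.query));
```

The script and `min_score` are unchanged from the original query; only the inner `query` differs.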

Also, from 8.0 you can try approximate kNN search, which is much faster than exact kNN search.
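
A minimal sketch of what that might look like on 8.0, as far as I understand the docs: the vector field has to be mapped with `index: true` and a `similarity`, and the search goes to the `_knn_search` endpoint (later 8.x versions also accept a `knn` section in `_search`). The helper names and the `k` and `num_candidates` values below are illustrative assumptions, not tuned recommendations.

```javascript
// Mapping for an indexed dense_vector field (8.0+), for PUT /<index>.
// Other fields are the same as in the original mapping.
function knnMappingBody() {
  return {
    mappings: {
      properties: {
        vector: { type: "dense_vector", dims: 512, index: true, similarity: "cosine" },
        category: { type: "keyword" },
        name: { type: "keyword" },
        source: { type: "keyword" }
      }
    }
  };
}

// Body for POST /<index>/_knn_search.
// k: number of neighbours returned; numCandidates: candidates examined per shard.
function knnSearchBody(queryVector, k, numCandidates) {
  return {
    knn: {
      field: "vector",
      query_vector: queryVector,
      k: k,
      num_candidates: numCandidates
    },
    _source: ["name", "category"]
  };
}

// Illustrative values only.
const knnBody = knnSearchBody(new Array(512).fill(0.1), 10, 100);
console.log(knnBody.knn.k, knnBody.knn.num_candidates);
```

Existing documents would need to be reindexed into an index with this mapping. Results are approximate, so it is worth checking recall against the exact script_score query on a sample of searches.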
