More search time

Hi,
We have a case where we have indexed 1 billion vectors (each 512 dims) using dense_vector (Elasticsearch version 7.6).
We are using cosine similarity to find the best hits.
But every query takes ~6.7 sec to return hits, which is quite annoying to the user. We want to reduce this time.
We installed ES 7.6 on m5.4xlarge.elasticsearch (16 vCPU and 64 GiB).

I have the following questions:

  1. What are possible options to improve it?
  2. If we use ANN or k-means and search only a particular centroid we might improve the timing, but we might lose some relevant hits. What are the ways to improve this approach?

Thanks for your time.

It sounds like you have used the explain API to identify that fetching the results is the slowest part, is this correct? If so, how much data does each node hold, and what size and type of storage are you using?

Number of nodes = 1
Instance type = m5a.4xlarge (AWS EC2)
Storage: HVM, 64-bit, SSD-backed

    "title_vector": {
      "type": "dense_vector",
      "dims": 512
    }

Around 1 billion vectors were indexed, and we are using the below piece of code to query:
    {
      "script_score": {
        "query": { "match_all": {} },
        "script": {
          "source": "cosineSimilarity(params.query_vector, 'title_vector') + 1.0",
          "params": { "query_vector": query_vector }
        }
      }
    }

Every query takes ~7 sec to respond.

We would like to know how this timing can be improved.

Thanks for your time

Can you show the output from running this query through the explain API? Are you using any other conditions to narrow down the result or will all documents need to be scored?
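For reference, an explain request for a single document could be issued along these lines (the index name and document id here are placeholders, not taken from the thread):

```
GET /my_index/_explain/some-doc-id
{
  "query": {
    "script_score": {
      "query": { "match_all": {} },
      "script": {
        "source": "cosineSimilarity(params.query_vector, 'title_vector') + 1.0",
        "params": { "query_vector": query_vector }
      }
    }
  }
}
```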

How large is the index? How many shards does it have?

How large is the index?
We have indexed 8 million vectors as of now. Every vector is 512 dims. We plan to index at-least 30 million in future.

How many shards does it have?
Shards = 1

Are you using any other conditions to narrow down the result or will all documents need to be scored?
No. I am using the below script to query:
    script_query = {
      "script_score": {
        "query": { "match_all": {} },
        "script": {
          "source": "cosineSimilarity(params.query_vector, 'title_vector') + 1.0",
          "params": { "query_vector": query_vector }
        }
      }
    }

Thanks

Also:
    {
      "settings": {
        "index": {
          "number_of_shards": 1,
          "number_of_replicas": 0
        },
        "analysis": {
          "analyzer": {
            "my_analyzer": {
              "tokenizer": "standard",
              "filter": ["lowercase", "snowball", "stop_words_filter"]
            }
          },
          "filter": {
            "stop_words_filter": {
              "type": "stop",
              "ignore_case": true,
              "stopwords": ["a", "about", "above", "after", "again", "against", "ain", "all", "am", "an", "and", "any", "are", "aren", "aren't", "as", "at", "be", "because", "been", "before", "being", "below", "between", "both", "but", "by", "can", "couldn", "couldn't", "d", "did", "didn", "didn't", "do", "does", "doesn", "doesn't", "doing", "don", "don't", "down", "during", "each", "few", "for", "from", "further", "had", "hadn", "hadn't", "has", "hasn", "hasn't", "have", "haven", "haven't", "having", "he", "her", "here", "hers", "herself", "him", "himself", "his", "how", "i", "if", "in", "into", "is", "isn", "isn't", "it", "it's", "its", "itself", "just", "ll", "m", "ma", "me", "mightn", "mightn't", "more", "most", "mustn", "mustn't", "my", "myself", "needn", "needn't", "no", "nor", "not", "now", "o", "of", "off", "on", "once", "only", "or", "other", "our", "ours", "ourselves", "out", "over", "own", "re", "s", "same", "shan", "shan't", "she", "she's", "should", "should've", "shouldn", "shouldn't", "so", "some", "such", "t", "than", "that", "that'll", "the", "their", "theirs", "them", "themselves", "then", "there", "these", "they", "this", "those", "through", "to", "too", "under", "until", "up", "ve", "very", "was", "wasn", "wasn't", "we", "were", "weren", "weren't", "what", "when", "where", "which", "while", "who", "whom", "why", "will", "with", "won", "won't", "wouldn", "wouldn't", "y", "you", "you'd", "you'll", "you're", "you've", "your", "yours", "yourself", "yourselves", "could", "he'd", "he'll", "he's", "here's", "how's", "i'd", "i'll", "i'm", "i've", "let's", "ought", "she'd", "she'll", "that's", "there's", "they'd", "they'll", "they're", "they've", "we'd", "we'll", "we're", "we've", "what's", "when's", "where's", "who's", "why's", "would"]
            }
          }
        }
      },
      "mappings": {
        "dynamic": "true",
        "_source": {
          "enabled": "true"
        },
        "properties": {
          "id": {
            "type": "keyword"
          },
          "sentence": {
            "type": "text"
          },
          "paperId": {
            "type": "text"
          },
          "title_vector": {
            "type": "dense_vector",
            "dims": 512
          }
        }
      }
    }

This is the index config file.

Given that you are doing a lot of processing in a script, you may get better performance with a larger number of primary shards, as more work can be done in parallel. Use the split index API to create a new index with e.g. 16 primary shards and see if this performs better.
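A split along these lines might look like the following sketch (the index names are placeholders; the source index must be made read-only first, and the target shard count must be a multiple of the source's):

```
PUT /my_index/_settings
{
  "index.blocks.write": true
}

POST /my_index/_split/my_index_16_shards
{
  "settings": {
    "index.number_of_shards": 16
  }
}
```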

Thank you for the recommendation. We will try that.

What about using an ANN or k-NN approach? Will these approaches degrade accuracy (I mean relevance)?

Does Elasticsearch provide any of the below approaches to improve the timing behavior?

(https://issues.apache.org/jira/browse/LUCENE-9136)

  1. Tree-based algorithms, such as KD-tree;
  2. Hashing methods, such as LSH (Locality-Sensitive Hashing);
  3. Product quantization based algorithms, such as IVFFlat;
  4. Graph-based algorithms, such as HNSW, SSG, NSG;

Thanks

I do not know, so I will leave that for someone with better knowledge of the internals.

@hegebharat ANN is still a work in progress and is not yet available in Elasticsearch. One way to improve the performance of your query:

{
  "script_score": {
    "query": {
      "match_all": {}
    },
    "script": {
      "source": "cosineSimilarity(params.query_vector, 'title_vector') + 1.0",
      "params": {"query_vector": query_vector}
    }
  }
}

is, instead of using the "match_all": {} query, to use a more specific filter that targets a much smaller number of documents. cosineSimilarity is an expensive operation; a filter lets you limit the number of docs it runs on.

We also recommend setting "_source": false on the search request.
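Putting both suggestions together, a request body might look like this sketch (the filter field and value, and the stand-in query vector, are made up for illustration; in a real index you would filter on a keyword-mapped field that narrows the candidate set):

```python
# Sketch: replace match_all with a narrowing filter and disable _source.
# "paperId" and "some-paper-id" are placeholder filter field/value.
query_vector = [0.1] * 512  # stand-in for a real 512-dim embedding

search_body = {
    "_source": False,  # skip fetching the stored _source for each hit
    "query": {
        "script_score": {
            # Only documents matching this inner query are scored, so
            # cosineSimilarity runs on far fewer docs than match_all.
            "query": {"term": {"paperId": "some-paper-id"}},
            "script": {
                "source": "cosineSimilarity(params.query_vector, 'title_vector') + 1.0",
                "params": {"query_vector": query_vector},
            },
        }
    },
}
```

The body can then be passed as-is to a search call, e.g. via the Python client's `es.search(index=..., body=search_body)`.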
