ANN Search Timeouts

We are trying to use the ANN (Approximate Nearest Neighbor) feature of Elasticsearch 8.0 (k-nearest neighbor (kNN) search | Elasticsearch Guide [8.1] | Elastic).

At the moment we have indexed 16 million documents:
Index a: ~6Mio
Index b: ~10Mio

We created the indices using the following mapping:

	{
		"mappings": {
			"properties": {
				"vector": {
					"type": "dense_vector",
					"dims": 768,
					"index": true,
					"similarity": "l2_norm"					
				}
			}
		}
	}

And we use the following query to retrieve the ANN results:

    POST a/_knn_search
    {
      "knn": {
        "field": "vector",
        "query_vector": [
          0.5619577,
          -1.7599238,
          ...
        ],
        "k": 100,
        "num_candidates": 1000
      },
      "_source": ["id", "documentParts.title"]
    }

The setup is in a cloud environment where we currently have

  • 8 VCPUs
  • 128GBs of RAM
  • 2TB of SSD storage

On the virtual machine I have set up Kibana and ES using docker-compose (GitHub - deviantony/docker-elk: The Elastic stack (ELK) powered by Docker and Compose),

including the following environment settings:

    environment:
      - "ES_JAVA_OPTS=-Xmx64g -Xms64g"

This setup worked quite well and we were happy with the response times (1-6 seconds for k=50 ANN searches).

So now we tried to do the same with cosine instead of l2_norm.

	{
		"mappings": {
			"properties": {
				"vector": {
					"type": "dense_vector",
					"dims": 768,
					"index": true,
					"similarity": "cosine"					
				}
			}
		}
	}

We reindexed all the data on the same machine, and now we have 32 Mio docs in total:
Index a: ~6Mio
Index b: ~10Mio
Index a_cos: ~6Mio
Index b_cos: ~10Mio

Now we constantly get timeouts (Error 504, Gateway Timeout) for the requests that worked perfectly before.

Why is it not working anymore?
What changed?
How can I debug this?
Is there a potential solution?

Thanks a lot

Searches are very fast if all the data structures needed for _knn_search are already built and available. So if you index all your data, stop making index updates, force merge to a single segment, wait for the force merge to finish, and then run your searches, you will get the best search performance.
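
For example, a force merge down to a single segment could be triggered like this (index a is just an illustration; on indices of this size the merge can take a long time, and the segment count can be checked afterwards with the _cat/segments API):

    POST a/_forcemerge?max_num_segments=1

    GET _cat/segments/a?v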

What is slow is indexing, as building the HNSW graphs required for _knn_search is an expensive operation. So if you have concurrent indexing and search operations, Elasticsearch will periodically (by default every second) trigger a refresh operation that creates a new segment and builds a new HNSW graph for that segment to make newly indexed data available for search. Some search operations will wait for these refreshes to finish and can time out. Also, the more segments are created, the slower searches become, as it is faster to search one big HNSW graph than many small ones.
The best approach is to separate searches from indexing. Also, if you are not concerned with making newly indexed data immediately available for search, you can increase refresh_interval.
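
For example, something along these lines would reduce refresh pressure during bulk indexing (30s is just an arbitrary value; "-1" disables automatic refreshes entirely):

    PUT a/_settings
    {
      "index": {
        "refresh_interval": "30s"
      }
    }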

Hello Mayya,
thanks for your reply.

So, as you proposed, I have set the following value in the index settings:

  "index.refresh_interval": "-1",

and executed the force merge request on the indices.

There is no improvement.

Do I need to change my setup to speed it up?
For example, should we have more but smaller instances?

Thanks a lot for your help.

Do you do index updates at the same time as searches?
Are the timeouts you are getting for search or index requests?

No additional data is being indexed. The kNN/ANN searches are performed after the 16 Mio documents were added.

The timeouts come from the backend/middleware; its timeout is currently set to 2 minutes.
I did a curl request directly on the machine where Elasticsearch runs.
It takes 6.25 minutes to return the result.
The request was executed on the b_cos index and had the following params:

"k": 100,
"num_candidates": 1000

6.25 minutes is very slow and it should not be that slow.
We have done an experiment with 10 million docs (although with much smaller vectors: 96 dimensions versus your 768), and knn-search-100-1000 (k: 100, candidates: 1000) takes 11 ms. And this was done on a very modest machine (8 GB of heap).

Can you try the following:

  • leave it to Elasticsearch to automatically set the JVM heap size, or at least cap it at 30 GB; your 64 GB is too high and doesn't leave much space for the filesystem cache. Elasticsearch doesn't need that much Java heap memory; a lot of its data files are memory-mapped.
  • disable the source in your query ("_source": false) and run the query again, as in the example below. Make sure to run queries multiple times to get an average run time.
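
For example, a query along these lines (query vector elided) is what I mean by disabling the source:

    POST b_cos/_knn_search
    {
      "knn": {
        "field": "vector",
        "query_vector": [ ... ],
        "k": 100,
        "num_candidates": 1000
      },
      "_source": false
    }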

Hi Mayya, thanks again for the reply.

So I set the heap size to 24 GB.

I then sent my previous requests to Elasticsearch again.
It felt like there was no change.

But after that, I wrote a benchmark script.

  • 100 requests
  • Randomly generated vectors
  • 2 randomly selected indices a_cos, b_cos
  • k = 100 neighbours
  • num_candidates = 1000

Results:
with "_source": false.

AVG: 1.449s
MIN: 0.751s
MAX: 2.403s

with "_source": ["title"]

AVG: 1.609s
MIN: 0.803s
MAX: 2.559s

So it seems like the reduction was helpful.

I will keep an eye on it and keep you posted.

But thanks a lot :slight_smile:


Adding one other idea, since you mentioned that searches may have become slower once you switched the similarity from l2_norm to cosine. The cosine similarity is convenient for testing and development, but can be slower to compute than the other types. For best performance, we recommend normalizing all the vectors in advance to have length 1 and using dot_product instead. These docs have more information under the similarity section: Dense vector field type | Elasticsearch Guide [8.1] | Elastic.
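
For reference, a mapping with dot_product would look roughly like the one below; the vectors themselves then need to be normalized to unit length (divided by their L2 norm) before indexing and before querying:

    {
      "mappings": {
        "properties": {
          "vector": {
            "type": "dense_vector",
            "dims": 768,
            "index": true,
            "similarity": "dot_product"
          }
        }
      }
    }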


Some feedback after I added a new index c_cos with an additional ~3 Mio documents, including vectors.

Same mapping
Same settings

DB in total:
Index a_cos: ~6Mio
Index b_cos: ~10Mio
Index c_cos: ~3Mio

Same test setup.
Strange behavior:

k=100
num_candidates=1000

a_cos:  1s
b_cos:  30s
c_cos: 2 minutes

I aborted the test.

But after setting "_source": false and re-running the test (100 requests for each index):

a_cos:
AVG: 1.41s
MIN: 0.86s
MAX: 2.07s

b_cos:
AVG: 2.32s
MIN: 1.63s
MAX: 3.40s

c_cos:
AVG: 1.14s
MIN: 0.69s
MAX: 1.72s

So it somehow seems to be the case that one has to run a few requests with "_source": false before one can run them with specific source fields.


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.