IO wait when using dense_vector field

Elasticsearch 7.15.2

While testing the performance of various types of queries using the dense_vector field type, we noticed that a significant amount of time is spent loading the dense_vector values from disk.

We observed the following behaviors:

  • When using dense_vector fields in a rescorer, increasing the window_size parameter produces larger spikes in disk reads (and therefore I/O wait) each time a new query is executed.
  • When executing the same query twice (and therefore loading the same vectors in the rescorer), we see a spike in disk reads on the first query and practically no disk usage on the second. The time spent in the rescorer is also significantly lower on the second query (we measured rescorer time using the profile API).
  • When using smaller indexes that fit in memory, this issue is visible only for the first few queries. Once most of the vectors have been "seen" by the rescorer, disk usage becomes negligible and queries are faster.
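For reference, the kind of request described above could look roughly like the sketch below. It only builds the search body and does not send it. The field name `my_vector`, the `match_all` first-pass query, and the helper name `build_rescore_request` are placeholders rather than our actual setup; the scoring script uses the `cosineSimilarity` function available for dense_vector fields in 7.x.

```python
import json


def build_rescore_request(query_vector, window_size, field="my_vector"):
    """Build a search body that rescores the top `window_size` hits per shard.

    `field` is a hypothetical dense_vector field name; the first-pass
    match_all query stands in for whatever query is really used.
    """
    script_source = f"cosineSimilarity(params.query_vector, '{field}') + 1.0"
    return {
        # Ask Elasticsearch to include timing details in the response.
        "profile": True,
        "query": {"match_all": {}},
        "rescore": {
            # Each shard reads the vectors of up to `window_size` documents.
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "script_score": {
                        "query": {"match_all": {}},
                        "script": {
                            "source": script_source,
                            "params": {"query_vector": query_vector},
                        },
                    }
                }
            },
        },
    }


if __name__ == "__main__":
    body = build_rescore_request([0.1, 0.2, 0.3], window_size=500)
    print(json.dumps(body, indent=2))
```

Raising `window_size` in this body is what increases the number of vectors each shard has to read for a query it has not seen before.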

In # Performance and storage of the dense_vector type it is explained that Elasticsearch compresses dense vectors when storing them, so we tried reducing the number of decimals used to represent the vectors. This reduced the disk space used, but in practice made no difference to the disk read spikes that are slowing down the queries.

Is there a way besides reducing the window_size parameter to avoid the high disk wait when using the dense_vector field?

Has anyone dealt with such an issue, or does anyone have a suggestion on how to deal with it?

Vector values are stored by default in memory-mapped files. It may therefore take some time to load them on first access, but subsequent queries will read vector values from memory, as long as there is enough space in the file system cache. We recommend leaving enough space for the file system cache.
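To get a rough idea of how much file system cache "enough" might be, a back-of-the-envelope estimate like the sketch below can help. It assumes each vector dimension takes 4 bytes on disk (a 32-bit float) and ignores any per-vector overhead and all other index data, so treat the result as a lower bound. The function names are made up for illustration.

```python
BYTES_PER_DIMENSION = 4  # assumption: one 32-bit float per dimension


def vector_data_bytes(num_docs, dims):
    """Approximate raw size of all vector values in an index."""
    return num_docs * dims * BYTES_PER_DIMENSION


def cold_read_bytes_per_query(window_size, num_shards, dims):
    """Approximate upper bound on vector bytes read by one uncached rescore.

    The rescorer runs on every shard, so up to `window_size` vectors
    are read per shard.
    """
    return window_size * num_shards * dims * BYTES_PER_DIMENSION


if __name__ == "__main__":
    gib = 1024 ** 3
    total = vector_data_bytes(num_docs=10_000_000, dims=512)
    print(f"vector data: about {total / gib:.1f} GiB")
    per_query = cold_read_bytes_per_query(window_size=1000, num_shards=5, dims=512)
    print(f"cold rescore: up to {per_query / 1024 ** 2:.1f} MiB of vector reads")
```

If the first number is larger than the memory left over for the file system cache, vectors will keep being evicted and re-read from disk.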

If you are really worried about the performance hit on the first access, here is information on how to make ES preload values. In 7.x, vector values are stored in .dvd files, like other doc values fields.
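As a minimal sketch of what that could look like, the index body below sets `index.store.preload` to the `dvd` extension. The field name `my_vector` and the helper name are placeholders. Note that `index.store.preload` is a static setting, so it has to be set at index creation (or on a closed index), and that it preloads the doc values files of every field, not only the vectors.

```python
import json


def build_index_body(dims, field="my_vector"):
    """Build a create-index body that preloads doc values files.

    `field` is a hypothetical dense_vector field name.
    """
    return {
        "settings": {
            # Preload .dvd files into the file system cache when the index opens.
            "index.store.preload": ["dvd"],
        },
        "mappings": {
            "properties": {
                field: {"type": "dense_vector", "dims": dims},
            }
        },
    }


if __name__ == "__main__":
    # This body would be sent with the create index request (PUT /<index-name>).
    print(json.dumps(build_index_body(dims=128), indent=2))
```

Preloading only helps if the files actually fit in the file system cache; on an index much larger than available memory it can make things slower.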