While testing the performance of various query types that use the dense_vector field type, we noticed that a significant amount of time is spent loading the dense vectors from disk.
We observed the following behaviors:
- When using dense vectors in a rescorer, increasing the window_size parameter produces higher spikes in disk reads (and therefore I/O wait) each time a new query is executed.
- When executing the same query twice (and therefore loading the same vectors in the rescorer), we see a spike in disk reads on the first query and almost no disk usage on the second. The time spent in the rescorer is also significantly lower on the second query (we measured it with the profile API).
- With smaller indexes that fit in memory, the issue is visible only for the first few queries. Once most of the vectors have been "seen" by the rescorer, disk usage becomes negligible and queries are faster.
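For reference, this is a minimal sketch of the kind of rescore request we are running, built as a Python dict. The index, field names, and query vector are made up for illustration; the structure (a cheap first-phase query, a script_score rescorer over the dense_vector field, and the profile flag we used for timing) matches what we tested.

```python
# Hypothetical example: field "embedding" is a dense_vector, "title" is text.
query_vector = [0.1, 0.2, 0.3]

body = {
    "profile": True,  # lets us read the rescore phase timings from the response
    "query": {"match": {"title": "example"}},
    "rescore": {
        # Larger window_size -> more vectors loaded from disk per query,
        # which is where we see the read spikes.
        "window_size": 500,
        "query": {
            "rescore_query": {
                "script_score": {
                    "query": {"match_all": {}},
                    "script": {
                        "source": "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
                        "params": {"query_vector": query_vector},
                    },
                }
            }
        },
    },
}
```

The body would then be sent as the JSON payload of a _search request against the index.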
Since # Performance and storage of the dense_vector type explains that Elasticsearch compresses dense vectors when storing them, we tried reducing the number of decimals used to represent the vectors. This reduced the disk space used, but in practice made no difference to the disk read spikes that are slowing down the queries.
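The precision-reduction experiment was essentially the following preprocessing step, applied to every vector before indexing (the decimals value shown is just one of the settings we tried):

```python
def truncate_vector(vec, decimals=3):
    """Round each component of a vector before indexing it.

    This shrank the index on disk in our tests, but did not reduce
    the disk read spikes during rescoring.
    """
    return [round(v, decimals) for v in vec]

truncated = truncate_vector([0.123456, -0.987654])
```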
Is there a way, besides reducing the window_size parameter, to avoid the high I/O wait when using the dense_vector field?