Hi all,
I have been struggling for a while to keep an Elasticsearch cluster performant for vector search while indexing new data. I am researching strategies for keeping the index up to date while still being able to run expensive vector search queries against it.
One solution that we are exploring is using node attributes as suggested here: Manage heavy indexing in kNN indexes. For now this seems promising. However, there is one problem it does not solve yet.
I have noticed that the kNN index is pushed out of memory during heavy indexing (I am preloading it into memory as suggested in Tune approximate kNN search | Elasticsearch Guide [8.12] | Elastic). Once the heavy indexing is done, I can restart the nodes and the index is loaded into memory again. All is good then. However, having to restart the nodes is not ideal, for obvious reasons.
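For context, the preloading I mean is the `index.store.preload` index setting from that guide. Below is a minimal sketch of the request body, written in Python only to make it easy to print and tweak. The index name `my-knn-index` and the helper `preload_settings` are made up for illustration, and as far as I understand the setting is static, so it has to be set at index creation (or on a closed index).

```python
import json

# Hypothetical index name, used only for illustration.
INDEX_NAME = "my-knn-index"


def preload_settings(extensions=("vec", "vex", "vem")):
    """Build an index-creation body that preloads the given file extensions."""
    return {"settings": {"index.store.preload": list(extensions)}}


if __name__ == "__main__":
    # Print the request in the form it could be pasted into Kibana Dev Tools.
    print(f"PUT /{INDEX_NAME}")
    print(json.dumps(preload_settings(), indent=2))
```

As I understand it, this only loads the files when the index is opened, which would explain why they come back after a node restart but not after heavy indexing has evicted them.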
I was wondering whether it would make sense to mount the place where the vector files (vem, vex, vec) are being stored as a tmpfs. Would this mean that the vectors are being kept in memory? And what about restarts?
If this is not a suitable solution, is there another way to either (i) keep the vector files in memory, or (ii) force them to be loaded into memory again without restarting the nodes?
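To make (ii) concrete, one workaround I can imagine is simply reading the vector files again after the heavy indexing, so that the OS page cache picks them up. This is only a rough sketch and not an Elasticsearch feature: the function name `warm_files` and the chunk size are my own choices, and I am assuming that a plain sequential read is enough to get the files cached. I have not verified that on a real cluster.

```python
from pathlib import Path

# File extensions of the vector files mentioned above.
VECTOR_EXTENSIONS = {".vec", ".vex", ".vem"}
# Read in 8 MiB chunks so the script itself uses little memory (arbitrary choice).
CHUNK_SIZE = 8 * 1024 * 1024


def warm_files(data_dir, extensions=VECTOR_EXTENSIONS):
    """Read every matching file under data_dir once.

    Returns (files_read, bytes_read). Reading goes through the OS page
    cache, which is what should bring the data back into memory.
    """
    files_read = 0
    bytes_read = 0
    for path in Path(data_dir).rglob("*"):
        if not path.is_file() or path.suffix not in extensions:
            continue
        try:
            with open(path, "rb", buffering=0) as fh:
                while chunk := fh.read(CHUNK_SIZE):
                    bytes_read += len(chunk)
        except OSError:
            # Segment files can disappear mid-walk when segments are merged.
            continue
        files_read += 1
    return files_read, bytes_read
```

It would be called with the node's data path, for example `warm_files("/var/lib/elasticsearch")`, where that path is an assumption about the install. Nothing here stops the kernel from evicting the pages again under memory pressure, so at best it replaces the restart, not the underlying problem.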
Thanks in advance!