Today we are using ES mainly as a key/value store, where most of our reads are simple get-by-key lookups.
We have recently started to use kNN search, where we have:

- Around 10M docs.
- 384-dimensional vectors.
- Cosine similarity as the metric.
- Large documents (web pages), but we retrieve just the URL and vector.
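Roughly, the mapping and query look like the following sketch (index and field names are anonymized, and the 384-value query vector is truncated for readability):

```
PUT /pages
{
  "mappings": {
    "properties": {
      "url": { "type": "keyword" },
      "vector": {
        "type": "dense_vector",
        "dims": 384,
        "index": true,
        "similarity": "cosine"
      }
    }
  }
}

POST /pages/_search
{
  "knn": {
    "field": "vector",
    "query_vector": [0.12, -0.43, 0.05, ...],
    "k": 10,
    "num_candidates": 100
  },
  "_source": ["url", "vector"]
}
```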
Performance is really bad, but only on the first query. A single query can take around 20 seconds to return results; a second query issued within a few seconds takes about 10 seconds; and if we keep making kNN queries rapidly, the latency drops to sub-second.
If we stop making such queries for a few minutes, the next query takes around 20 seconds again.
I would love to understand this behaviour to see if there is something which can be done to "warm up" this query type.
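In the meantime, the blunt workaround I can think of is scheduling a dummy kNN query every minute or two so the data stays in the filesystem cache, something like the sketch below (same index and field as above; any fixed non-zero vector works, since cosine similarity rejects all-zero vectors):

```
POST /pages/_search
{
  "knn": {
    "field": "vector",
    "query_vector": [0.1, 0.1, 0.1, ...],
    "k": 1,
    "num_candidates": 10
  },
  "_source": false
}
```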
Being slow and then fast indicates to me that the vector index was out of memory and then loaded into memory.
Becoming slow again shows that it is being evicted from memory. Usually this indicates that you don't have enough RAM to hold the vector index and the other structures you are using in the index.
So, is this index being used for other things? Are documents continually being added?
One finding I have is that sharding the index seems to be detrimental to speed, which seems rather odd, as I thought sharding would allow parallel searches...
If the different shards still have 40+ segments each, it wouldn't help much. If your shards had far fewer segments, then I would expect some improvement.
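You can check how many segments each shard has, and shrink the count on an index you are no longer writing to, with something like this (index name is a placeholder; only force-merge indices that are no longer receiving writes):

```
GET _cat/segments/pages?v

POST /pages/_forcemerge?max_num_segments=1
```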
My machine has 8 GB of RAM and the kNN vectors are 11.8 GB. Does this mean we are at optimal performance for the current index size vs cluster size?
Also, our RAM cannot even fit the index; is that an issue too?
I had a look at the link for preloading data into the file system cache, and I can do this, but there is a warning there about size. Do you think it is still wise given the above? And finally on this point: I thought preloading would only make a difference for the first few requests, and after that ES would load the data into cache automatically. Is that not correct?
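For reference, applying the preload setting would look something like this. It is a static setting, so the index has to be closed first; the exact file extensions for the vector data and HNSW graph depend on the Lucene codec version, so treat these as an assumption (index name is anonymized):

```
POST /pages/_close

PUT /pages/_settings
{
  "index.store.preload": ["vec", "vex", "vem"]
}

POST /pages/_open
```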
For kNN to work optimally, the entire graph and vectors need to be in memory.
So that means you need at least around 12 GB of RAM (not including the RAM used by the JVM).
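You can verify how much of the index is actually vector data with the analyze disk usage API, a sketch below (index name is a placeholder, and the call can be expensive on a large index):

```
POST /pages/_disk_usage?run_expensive_tasks=true
```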
In 8.5 we added support for 'byte'-encoded vectors, so you can quantize your vectors to int8 to make them much smaller and maybe run just fine within your current hardware constraints.
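A byte-vector mapping would look roughly like this (a sketch assuming the `element_type` mapping parameter; values must be whole numbers in [-128, 127], and the float-to-int8 quantization happens on your side before indexing):

```
PUT /pages-int8
{
  "mappings": {
    "properties": {
      "url": { "type": "keyword" },
      "vector": {
        "type": "dense_vector",
        "element_type": "byte",
        "dims": 384,
        "index": true,
        "similarity": "cosine"
      }
    }
  }
}
```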
Thanks @BenTrent, that makes sense, and thank you for letting me know about the quantized vectors!
One more question: would it make sense for me to create another index that is a sample of the full index, containing just the vectors and some minor metadata? If I made such an index, which would be much smaller, would I be able to force ES to hold it in memory even if it is not frequently accessed?
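If I went that route, I imagine I would build it with a field-filtered reindex, something like this (names hypothetical; note that `max_docs` just takes the first N documents in index order, not a uniform random sample):

```
POST _reindex
{
  "max_docs": 1000000,
  "source": {
    "index": "pages",
    "_source": ["url", "vector"]
  },
  "dest": { "index": "pages-sample" }
}
```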
@dendog1 this would only work if those indices were on different nodes. A node only has so much off-heap memory, and all shards on that node must share it.