I've been experimenting with applying both forms of compression to our dense vectors and comparing performance. While bbq_hnsw has been performing relatively well at a 1-3s average per query, int8 has been extremely slow at roughly 8-10s+ per kNN query.
I dug into the disk usage of both vector fields (each containing the same 10M vectorized images at 512 dimensions), thinking the fields might be bigger than I expected and I hadn't given the pod hosting them enough memory to keep everything in RAM. For 10M assets at 512 dimensions, the BBQ-compressed vector field looks about the right size (with the built-in rescore of 3), but the int8 field is just off the chart. It looks like the combined total of the uncompressed vectors and the compressed vectors.
Is this normal behavior? I couldn't find any way to further break down this number in the documentation, but since it only shows up for one field and not the other, I'm leaning towards not normal. I'm wondering whether the reason it's so slow is that it's trying to load 26.4GB into memory when only 10GB is allocated to the application with a 5GB heap. I do realize the heap size may need to inch up a bit, as the two compressed vector fields combined are likely in the 6-7GB range by the estimates. Regardless, my understanding was that the non-compressed vectors are supposed to be stored on disk and inflate the _source size, not be stored within the int8-compressed vector field.
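(For reference, these numbers come from the disk usage API; the index name below is just a placeholder for ours.)

```
POST /images/_disk_usage?run_expensive_tasks=true
```

The per-field totals I quoted are from the fields section of that response.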
For extra context, our kNN search uses a k of 30, a num_candidates of 200, and a rescore of 20. That's definitely bigger than the k=8-10 I've seen around, so it's also a possible cause of the slowdown, but I'm trying to rule out any core architecture issues before modifying the query.
1-3s per query average, int8 has been extremely slow with roughly 8-10s+ per knn query
That seems slow to me in general. We should dig into that some more. Can you share the mappings you have for those vector fields, and maybe a little more information about your k8s setup? I'm curious what kind of disk I/O and CPU you have here. Sounds like 10GB of RAM per pod?
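For reference, this is roughly the shape I'd expect those two mappings to take (just a sketch; the index name, similarity, and other details are guesses, not necessarily your exact config):

```
PUT /images
{
  "mappings": {
    "properties": {
      "image_vector_bbq": {
        "type": "dense_vector",
        "dims": 512,
        "index": true,
        "similarity": "cosine",
        "index_options": { "type": "bbq_hnsw" }
      },
      "image_vector_int8": {
        "type": "dense_vector",
        "dims": 512,
        "index": true,
        "similarity": "cosine",
        "index_options": { "type": "int8_hnsw" }
      }
    }
  }
}
```

Anything that deviates from that shape would be interesting to see.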
"image_vector_bbq": { "total": "1.3gb",
This makes sense to me, but let's break it down so you have the intuition for the math. BBQ compresses each vector to 1 bit per dimension plus 14 bytes (3 floats and a short) of corrective factors.
So we have:
10000000*(512/8 + 14) = 0.78GB
And then there's the HNSW structure itself (mostly a bunch of pointers), which should roughly be:
(12*4)*10000000 = 0.48GB
Total then:
~1.26GB
Looks pretty close (close enough for horseshoes and hand grenades, anyway).
"image_vector_int8": {
"total": "26.4gb"
Then let's see what that looks like for int8. The vectors themselves are stored at 1 byte per dimension, so for int8 compression we have:
10000000*512 = 5.12GB
and then for the HNSW graph:
(12*4)*10000000 = 0.48GB
total then:
~5.6GB
So what you have definitely seems off to me too, by about 20GB, which just happens to be about the size of the raw vectors in this case. That might explain the slowness simply because there's a lot being loaded into RAM, maybe because it's in _source. When HNSW doesn't have enough RAM, the algorithm falls off a performance cliff. (Funny enough, we are just about to launch an algorithm that deals with that performance cliff, called bbq_disk, but I digress.) So my guess is something is off with your config in the int8 mapping, but honestly I'm not entirely sure what that might be right off.
Hopefully looking at the config will help, and/or just seeing the expected math might be enough for you to spot something obvious. Something tells me that if we solve the sizing, the slowness will make sense too. Either way, let me know and I'm happy to iterate with you on it.
We run a few things per pod related to ES, but the main ES container is allocated 10GB of RAM per pod. We run it on an n2-standard-8 equivalent and give it a limit of 4000m CPU (milliCPU is the most bizarre measurement to wrap my head around). Storage is a Google Persistent Disk SSD that we have capped at nearly 2.5x the current total index size. We also have the JVM arguments set to "-Xms5g -Xmx5g" via ES_JAVA_OPTS, which I still think may need to be upped, because even if it were only attempting to load the ~7GB of compressed vectors, that seems like too little.
Yeah, I definitely felt like the int8 vector mapping should not be including the raw vectors. My assumption is that whatever is included under the int8 mapping in the _disk_usage analysis will be loaded (or attempted to be loaded) into memory, which would definitely cause slow searches. I was mainly making sure I wasn't misunderstanding what I was seeing in the analysis, and that it wasn't a case where, even though it reports 26.4GB, it only attempts to load the ~5GB of compressed vectors and the other ~21GB is just the full vectors sitting on disk.
As for insertion, we vectorize an image and then both "image_vector_bbq" and "image_vector_int8" are assigned that uncompressed vector as their value before the document is inserted into the index.
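In other words, each document we index looks roughly like this (index name simplified and vectors truncated here; the real values are the same 512 floats in both fields):

```
POST /images/_doc
{
  "image_vector_bbq":  [0.12, -0.48, 0.91],
  "image_vector_int8": [0.12, -0.48, 0.91]
}
```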
Even if this doesn't provide perfect clarity into what exactly is causing the issue, can you confirm that this is a case where, because the int8 mapping in the index reports 26.4GB, it attempts to load 26.4GB into memory, fails, falls back to the SSD, and that throws performance down the drain? Again, thank you so much for the helpful explanation and walkthrough of what we should be expecting; even knowing that this is a problem is very helpful!
Hmm, the configs look OK. What version of ES are you running, as well? I'm not remembering exactly, but I wonder if there was a bug in the computation of how much disk is being used. Have you tried excluding both of those fields from _source as well:
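Something along these lines, with the field names assumed from your disk-usage output (you'd need a new index or a reindex to apply it):

```
PUT /images-test
{
  "mappings": {
    "_source": {
      "excludes": ["image_vector_bbq", "image_vector_int8"]
    }
  }
}
```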
I’d be curious if that changes the output of the _disk_usage api. I’ll go do a quick test myself too and see if that stuff all outputs expected values on the latest ES version.
I do bet you're memory constrained either way for int8. If you can, you might try a slightly bigger machine with more RAM, or, for testing, drop to like half of your data and see if that helps. I'd be curious whether that greatly improves performance there.
k of 30, a num_candidates of 200, and a rescore of 20
I missed some of this on my initial read too. k=30 seems fine to me. num_candidates=200 also seems fine, with the caveat that you may find you can and should tune it differently for each algorithm. The same is true of rescore, which I'm assuming in this case is the oversample param (if you drop your query config here we can iterate on that too). 20 for oversample seems really high to me. I would expect int8 to not need it at all, and it may be a source of slowness there. I would expect bbq to definitely need it, but probably not at 20. Curious whether y'all have experimented with a smaller value there yet at all, but I bet that's a large source of slowness.

It's data dependent, but usually I'm trying to hit a consistent recall / NDCG against some golden set rather than keeping those params the same for comparison. The reason for this is that int8 typically isn't actually lossy; a lot of models just have too large a vector space. bbq, on the other hand, purposefully compresses into lossy territory, but it is so fast at distance computations that we can often afford much larger num_candidates and oversample exploration for better results in both query time and recall.
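To make that concrete, here's roughly what I mean by tuning per algorithm, assuming the rescore you mention is the rescore_vector oversample option in recent versions and guessing at your field names; treat the numbers as starting points to tune against a golden set, not as recommendations:

```
# int8: often doesn't need oversampling at all
POST /images/_search
{
  "knn": {
    "field": "image_vector_int8",
    "query_vector": [0.12, -0.48, 0.91],
    "k": 30,
    "num_candidates": 100
  }
}

# bbq: usually wants some oversampling, but try much less than 20
POST /images/_search
{
  "knn": {
    "field": "image_vector_bbq",
    "query_vector": [0.12, -0.48, 0.91],
    "k": 30,
    "num_candidates": 200,
    "rescore_vector": { "oversample": 3 }
  }
}
```

(Query vectors truncated to three values here for readability.)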
Also, is it safe to remove them from _source? I was under the impression that they should remain in _source for searchability's sake, but maybe I misunderstand how they need to be loaded. I also thought that once removed, you'd have to reindex to ever get them back if needed; I might be wrong on that too, though.
On our dev pods we have ~800,000 assets, and int8 is quite snappy there, returning in about the same time as BBQ.
As for our query, yeah, we're just testing the waters with that one for now, and we've found that BBQ has performed pretty well with it on our data, on both the 800k pod and the 10M pod. It definitely seems like overkill for int8, and dropping k to 10, num_candidates to 50-100, and rescore to 5 decreases the average int8 query time to like 2-3s, which is much more manageable. However, this heavy query runs just fine for int8 on the smaller dev server, so maybe it is a RAM thing.
In the meantime I will bump the RAM and heap size for the instance by a couple GB and see if that helps, since, if that extra 20GB is just a reporting issue, it should then be able to hold the compressed vectors and the HNSW graph in memory.
You are not wrong. However, it's safe to remove them from _source for the sake of things like rescoring, and it's what I'd recommend. In fact, in subsequent versions we are going to remove them from _source by default; in 9.1.0 they'll be left out of _source and only kept around for the sake of re-indexing. Storing them in _source actually double-stores the "raw" representations, interestingly: we already keep a copy in Lucene at index time that is used for rescoring, but that isn't the _source copy. In subsequent versions we'll reconstitute the vectors for reindexing from that other raw copy we already have on disk. Relevant PR (in case you want to learn more): Enable `exclude_source_vectors` by default for new indices by jimczi · Pull Request #131907 · elastic/elasticsearch · GitHub. This should save a ton of space. The only reason to ever store vectors in _source is if you have a real need to get back exactly what the vector was when you loaded it, at the cost of storing that exact representation.
Ok, just to make sure before acting on this: the heap, if manually set at all, should still be 50% of available memory in this case. I was going to bump ES memory to 14GB and allot the JVM a heap of "-Xms7g -Xmx7g", as that should theoretically fit the vectors.
The _source mapping is a valid point then, but I may save that for a future reindex, as it can take over a week to reindex everything. Another reindex will inevitably come along, so good shout for when we do!