Clarification issue in ColPali late interaction article: Hamming distance vs XOR value

I would like to report a conceptual inconsistency in the following article:

Issue summary

In the section describing binary quantization and similarity scoring, the article states:

“Simple binary quantization will transform D into 10101101 and Q into 11111011. For hamming distance, we need direct bit math—it's extremely fast. In this case, the hamming distance is 01010110, which is 86. So, scoring then becomes the inverse of that hamming distance.”

This description conflates Hamming distance with the XOR bitmask interpreted as an integer, which are not the same thing.


Technical clarification

Given:

  • D = 10101101

  • Q = 11111011

The XOR result is:

D XOR Q = 01010110

However:

  • The true Hamming distance is the number of differing bits, which here is 4

  • The value 86 is the integer interpretation of the XOR bitmask, not the Hamming distance

Using 1 / 86 ≈ 0.012 therefore inverts the XOR mask value, not the Hamming distance. Inverting the actual distance would give 1 / 4 = 0.25.
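
For anyone who wants to check the numbers, here is a minimal Python sketch of the difference. The variable names are my own and not from the article, and `bin(x).count("1")` is just one portable way to count set bits.

```python
# The two binary-quantized vectors from the article's example, as 8-bit integers.
d = 0b10101101
q = 0b11111011

# XOR sets a 1 at every bit position where the two vectors differ.
xor_mask = d ^ q

# The Hamming distance is the number of set bits in that mask (popcount),
# not the mask read as an integer.
hamming = bin(xor_mask).count("1")

print(f"{xor_mask:08b}")  # 01010110
print(xor_mask)           # 86
print(hamming)            # 4
```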

Yep! This is a bug in the blog for sure.

The principle is the same, but the math example is indeed broken. We should instead take the popcount of the XOR result and invert that.
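
Roughly, a popcount-then-invert score could look like the sketch below. The function names are hypothetical, and returning infinity for identical vectors is only an assumption made here to avoid dividing by zero; it is not necessarily what the blog or any library does.

```python
import math


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where a and b differ (popcount of a XOR b)."""
    return bin(a ^ b).count("1")


def inverse_hamming_score(a: int, b: int) -> float:
    """Score two bit vectors as the inverse of their Hamming distance."""
    dist = hamming_distance(a, b)
    # Assumption: identical vectors (distance 0) get an infinite score.
    if dist == 0:
        return math.inf
    return 1.0 / dist


# The example from the blog: distance 4, so the score is 1 / 4.
print(inverse_hamming_score(0b10101101, 0b11111011))  # 0.25
```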

Thanks for finding this!

Welcome!

The blog post has been updated. Thanks for your feedback!