Does anyone know the best way to store fixed-length binary data and query it,
while scoring by Hamming distance?
A Hamming distance filter with a threshold would also be OK.
Thanks a lot, this is useful for all kinds of similarity searches based on
fingerprinting algorithms.
You can use fuzzy queries for Levenshtein distance, but note that they are
slow(er) in Lucene 3.3 and will be much faster in Lucene 4.0 (when it comes
out).
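
Note that Levenshtein distance is not the same measure as Hamming distance: for two equal-length values it can be smaller, because it also allows insertions and deletions. If a plain Hamming threshold over fixed-length fingerprints is what is needed, it can be computed outside the index with XOR and a bit count. Below is a minimal sketch, assuming the fingerprints are available as equal-length byte arrays; the class and method names (HammingSketch, hammingDistance, withinThreshold) are made up for illustration and are not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Lucene API: brute-force Hamming filtering.
final class HammingSketch {

    // Counts the bit positions in which two equal-length fingerprints differ.
    static int hammingDistance(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("fingerprints must have equal length");
        }
        int distance = 0;
        for (int i = 0; i < a.length; i++) {
            // XOR leaves a 1 wherever the bits differ; mask to one byte
            // because Java promotes bytes to signed ints.
            distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        }
        return distance;
    }

    // Keeps only the candidates whose distance to the query is at most maxDistance.
    static List<byte[]> withinThreshold(byte[] query, List<byte[]> candidates, int maxDistance) {
        List<byte[]> hits = new ArrayList<byte[]>();
        for (byte[] candidate : candidates) {
            if (hammingDistance(query, candidate) <= maxDistance) {
                hits.add(candidate);
            }
        }
        return hits;
    }
}
```

This is a linear scan over every candidate, so it only illustrates the scoring and threshold step; how to store the fingerprints in the index and avoid scanning all of them is a separate question.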
On Friday, August 26, 2011 1:50:51 PM UTC+3, anahap wrote:
Hi there,
Does anyone know the best way to store fixed-length binary data and query
it, while scoring by Hamming distance?
A Hamming distance filter with a threshold would also be OK.
Thanks a lot, this is useful for all kinds of similarity searches based on
fingerprinting algorithms.