I want to do hamming distance on binary strings in elasticsearch, and I want to control the distance.
In my case, the binary strings have a length of 256 and I want everything with a hamming distance of 32 or below.
Is this possible using fuzzy searching? Looking at the docs, the
fuzziness parameter for fuzzy queries only accepts values;
AUTO I can set low and high distance values, so perhaps:
AUTO:0,32 would work?
Can someone confirm this? Is anyone else doing this? If so, how?
I've done some experiments, but aren't getting quite the results I expected.
I have several hashes, with varying hamming distances from my query hash, but results for the query:
only returns results where the hamming distance <= 2.
Is that expected?
Yes, that is expected.
AUTO adjusts the distance based on the length of the string, but the maximum is still
Thanks for the response.
Is there a reason why the limitation is 2? Is there any way to do what I want to do with elasticsearch?
I believe the reason for capping this is performance. I do not have any suggested way to do what you want so will leave that for others.
You may be able to do it using a script query or a runtime field, but I am not sure. this would have an impact on performance though, especially at scale.
I tried various scoring scripts, but they're not in the least bit performant.
OpenSearch has a hamming distance plugin. So it looks like I'll be migrating to that.
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.