Fuzzy search question

Hello, I am doing reverse image search using Elasticsearch. I have hashes stored in an index, and I am now trying to find similar hashes (to compensate for compression and the like) using Query String fuzziness.
My search code is:

var searchResponse = await Program._elasticclient.SearchAsync<IndexedImage>(s => s
    .Index("images")
    .Query(q => q.QueryString(qs => qs
        .FuzzyMaxExpansions(150).Fuzziness(Fuzziness.EditDistance(150))
        .Fields(f => f.Field(ff => ff.imagehash))
        .Query(imagehash)))
    .Size(10000)
).ConfigureAwait(false);

The imagehash field is a string holding the hash (a number roughly 600 characters long).
Searching for similar strings works in some cases but not in others, and I am wondering what is wrong.
For example, this is the original hash; searching for it returns 2 results from the DB:

222230302101000014343434014133341141303411413033013140410000000304142022222214134313414240103030010130300101101003031110130314133303434110102222222243431000030024241411432324214232141143133431411144421000030034042222222214111333121143232223323211113333333331211311333344442000444340002222222201013434222032323130110111103303323011011110330334330101343111112222222231333111010244440400444434304131141343134444400044441030131133432222222233311131101423130001444401044444434400404444003041410020000034242222222201003414434300000000343430310131303201013131010102003434121143032222224203034444444410002123421201033444343421413434314101044444444230402222

This is the hash of a similar image (Levenshtein distance of 52); searching for it returns 0 results:

222230302101000024343434014133341141303411413033013140410000000204242022222214134313414140103030010130300101101003031110130314133303424120102222222243431000030034241411432324214222141143133431411144421000030034042222222214111333121243232223323212113333333331211312333344432010444340002222222201013434323021312120120211103303323011011110230333331101343111112222222231333111010244440400444434304131141443134444400044441030131133432222222233311131101423130001444401044444434400404444003041410030000034242222222201003414434300000000343430310131303201013131010101003434222142022222224203024444444410002123421211133343343421413434314101044444444230402222

When I took the original hash and replaced a run of digits with the same number of 9s, I got a Levenshtein distance of 59, and the search returned the same 2 results as for the original. The hash:

222230302101000014343434014133341141303411413033013140410000000304142022222214134313414240103030010130300101101003031110130314133303434110102222222243431000030024241411432324214232141143133431411144421000030034042222222214111333121143232223323211113333333331211311333344442000444340002222222201013434222032323130110199999999999999999999999999999999999999999999999999999999190244440400444434304131141343134444400044441030131133432222222233311131101423130001444401044444434400404444003041410020000034242222222201003414434300000000343430310131303201013131010102003434121143032222224203034444444410002123421201033444343421413434314101044444444230402221

All hashes, both the stored ones and the ones I search for, are always the same length.
The similarity can be checked here: https://countwordsfree.com/comparetexts
Can anyone point me in the right direction and tell me what I am doing wrong? Thanks in advance.
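
The distances quoted above can also be checked locally instead of through a website. The following is a minimal sketch; the function names `levenshtein` and `hamming` are illustrative and not part of any Elasticsearch API. Because all hashes have the same length, the simpler Hamming distance (count of differing positions) is also available and is always an upper bound on the Levenshtein distance.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def hamming(a: str, b: str) -> int:
    """Number of differing positions; only defined for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))
```

Calling `levenshtein(original_hash, similar_hash)` with two of the hashes from this post (the variable names are placeholders) should reproduce the distance reported by the comparison site.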

Anyone?

Which version of Elasticsearch are you using? What is the mapping of the field?

Elasticsearch version 7.5.2.
My index mapping:

{
   "images":{
      "mappings":{
         "properties":{
            "imagehash":{
               "type":"text",
               "fields":{
                  "keyword":{
                     "type":"keyword",
                     "ignore_above":256
                  }
               }
            },
            "data":{
               "type":"text",
               "fields":{
                  "keyword":{
                     "type":"keyword",
                     "ignore_above":256
                  }
               }
            },
            "source":{
               "type":"text",
               "fields":{
                  "keyword":{
                     "type":"keyword",
                     "ignore_above":256
                  }
               }
            }
         }
      }
   }
}
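
One possible explanation worth testing, offered as a hypothesis and not confirmed anywhere in this thread: a `text` field with no explicit analyzer uses the standard analyzer, whose tokenizer has a default `max_token_length` of 255 and splits longer tokens at that interval. A hash of roughly 600 digits would then be indexed as three separate terms, and a `query_string` search without the `~` operator would match any document sharing at least one whole term, with no fuzzy matching involved. That would be consistent with the results reported above, since the 9s variant appears to leave the leading part of the hash untouched, while the similar-image hash already differs within its first 20 digits. The sketch below only simulates that splitting in plain Python on synthetic strings; it does not call Elasticsearch, and the helper names are made up for illustration.

```python
MAX_TOKEN_LENGTH = 255  # assumption: default of the standard tokenizer


def split_like_standard_tokenizer(text, max_len=MAX_TOKEN_LENGTH):
    """Cut one long digit run into consecutive chunks of at most max_len."""
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]


def shared_chunks(indexed, query, max_len=MAX_TOKEN_LENGTH):
    """Chunks of the query that also occur as chunks of the indexed value."""
    indexed_terms = set(split_like_standard_tokenizer(indexed, max_len))
    return [c for c in split_like_standard_tokenizer(query, max_len)
            if c in indexed_terms]


# Synthetic 600-character "hashes", not the ones from this thread.
indexed = "1" * 255 + "2" * 255 + "3" * 90

# Heavy damage confined to the middle and the tail: first chunk still matches.
damaged_late = "1" * 255 + "9" * 255 + "3" * 89 + "4"

# A single changed digit in every chunk: no chunk matches any more.
damaged_everywhere = "0" + "1" * 254 + "0" + "2" * 254 + "0" + "3" * 89

print(len(shared_chunks(indexed, damaged_late)))        # 1
print(len(shared_chunks(indexed, damaged_everywhere)))  # 0
```

If this is what is happening, the `_analyze` API on the `images` index should show three tokens for a stored hash. Note also that the `keyword` sub-field in this mapping uses `ignore_above: 256`, so values of this length would not be indexed there at all.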

Please be patient in waiting for responses to your question and refrain from pinging multiple times asking for a response or opening multiple topics for the same question. This is a community forum, it may take time for someone to reply to your question. For more information please refer to the Community Code of Conduct specifically the section "Be patient". Also, please refrain from pinging folks directly, this is a forum and anyone that participates might be able to assist you.

If you are in need of a service with an SLA that covers response times for questions then you may want to consider talking to us about a subscription.

It's fine to answer on your own thread after 2 or 3 days (not including weekends) if you don't have an answer.

Understood, sorry.

Up. Still have not solved this.

Could you provide a full recreation script, as described in "About the Elasticsearch category"? It will help to better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.
