How can you match long strings?


(Yeikel) #1

Hello,

One of the main challenges I am facing at the moment is how to match long strings while applying fuzziness to them.

For example, let's say that we have the following document:

PUT my_index/type/2
{
  "name":"longnameyesverylong"
}

If I apply a fuzzy search on that name, like the following:

"match": {
  "name": {
    "query": "longnameyesverylong",
    "fuzziness": 2
  }
}

I can find it, but my goal is to cast a wider net and allow more than two mistakes for strings of this kind.

Let's say, for example, that I index something like:

PUT my_index/type/2
{
  "name":"l1ngnam2yesver3long"
}

The previous match query won't be able to find this document, as the edit distance between the two strings is 3 and Elasticsearch does not support a fuzziness greater than 2.
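To make that distance concrete, here is a minimal Python sketch that computes the edit distance between the query and the indexed value. It uses plain Levenshtein distance, which is an assumption for illustration: Elasticsearch's fuzziness also counts a transposition as a single edit by default, but there are no transpositions in this example, so the result is the same. The function name `levenshtein` is hypothetical and not part of any Elasticsearch client.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


query = "longnameyesverylong"
indexed = "l1ngnam2yesver3long"
print(levenshtein(query, indexed))  # 3, one more than the maximum fuzziness of 2
```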

I tried to use ngrams, but the tokens did not meet the requirement either, and the index would grow too much.

The only option I can think of off the top of my head is to split the string manually at index time, in effect creating my own tokenizer, and index a document that looks like:

PUT my_index/type/2
{
  "name":"longnamey esverylong"
}

And then, at search time, split the string again and apply a Boolean query with fuzziness on each token. This can probably do what I need, but I feel that there is probably a better solution to this problem.

Is there any other approach that you think might be appropriate?
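The split-and-query idea described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the helper names `split_evenly` and `build_fuzzy_bool_query` are hypothetical, the string is cut into two near-equal parts to match the example document, and the `name` field is assumed to be analyzed so that the two parts are indexed as separate tokens. The same splitting function would have to be used at index time and at search time.

```python
import json


def split_evenly(text, parts=2):
    """Split text into `parts` contiguous chunks of near-equal length."""
    n = len(text)
    bounds = [i * n // parts for i in range(parts + 1)]
    return [text[bounds[i]:bounds[i + 1]] for i in range(parts)]


def build_fuzzy_bool_query(field, text, parts=2, fuzziness=2):
    """Build a bool query requiring a fuzzy match on every chunk of text."""
    chunks = [c for c in split_evenly(text, parts) if c]
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {field: {"query": chunk, "fuzziness": fuzziness}}}
                    for chunk in chunks
                ]
            }
        }
    }


# Index-time value: the chunks joined by a space.
print(" ".join(split_evenly("longnameyesverylong")))

# Search-time request body for the misspelled input.
print(json.dumps(build_fuzzy_bool_query("name", "l1ngnam2yesver3long"), indent=2))
```

With this split, each chunk of the misspelled string is at most two edits away from its indexed counterpart, so every clause stays within the fuzziness limit. One caveat: the split is positional, so an inserted or deleted character shifts the chunk boundary and can move errors from one chunk into the next.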

Thank you.

