Is fuzzy query in Elasticsearch related to fuzzy logic?


#1

As the title states, what exactly in Elasticsearch's fuzzy-query is related to fuzzy logic?

For example, given a string, a fuzzy query with a fuzziness of 2 will return all indexed strings within a Levenshtein distance of 2 (that is, at most two edits away). How does the system decide which answers to return, and in what order, if there are multiple matches?
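To make the edit-distance part concrete, here is a minimal Python sketch of plain Levenshtein distance and of filtering a term list by a maximum number of edits. The names `levenshtein` and `fuzzy_candidates` and the sample terms are made up for illustration; this is not how Elasticsearch implements matching internally, and by default Elasticsearch also counts a transposition of two adjacent characters as a single edit, which this sketch does not.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance using insertions, deletions and substitutions only."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep) the character
            ))
        prev = curr
    return prev[-1]


def fuzzy_candidates(query: str, terms: list[str], max_edits: int = 2) -> list[str]:
    """Hypothetical helper: keep the terms at most max_edits away from query."""
    return [t for t in terms if levenshtein(query, t) <= max_edits]


# Illustrative term list (an assumption, not real index data).
matches = fuzzy_candidates("quick", ["quick", "quack", "quirky", "brown"])
print(matches)
```

Note that the filter is a crisp yes/no test: a term is either within the edit budget or it is not, and nothing here involves degrees of membership.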

Is there a fuzzy system behind it, one that has triangular membership functions (for instance) and can be expressed as something like this:

1|   A    B
 |   /\  /\      A = fuzzy set 1
 |  /  \/  \     B = fuzzy set 2
 | /   /\   \
0|/   /  \   \
 ------------
  a   b  c   d
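For comparison, this is roughly what such a fuzzy-logic system looks like in code: a minimal Python sketch of two overlapping triangular membership functions, combined with the standard min/max operators. The breakpoints (0, 2, 4 and 2, 4, 6) and the function names are assumptions chosen only to mirror the picture above.

```python
def triangular(x: float, left: float, peak: float, right: float) -> float:
    """Degree of membership in [0, 1]; assumes left < peak < right."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)   # rising edge
    return (right - x) / (right - peak)     # falling edge


def member_a(x: float) -> float:
    # Fuzzy set A with made-up breakpoints.
    return triangular(x, 0.0, 2.0, 4.0)


def member_b(x: float) -> float:
    # Fuzzy set B, overlapping A between 2 and 4.
    return triangular(x, 2.0, 4.0, 6.0)


x = 3.0
both = min(member_a(x), member_b(x))    # fuzzy AND (intersection)
either = max(member_a(x), member_b(x))  # fuzzy OR (union)
print(member_a(x), member_b(x), both, either)
```

The key property is that a value belongs to each set to a degree between 0 and 1, which is the behaviour I am asking about.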

I would like a more theoretical answer that tackles the question: what exactly is so fuzzy about fuzzy queries?


(Mark Harwood) #2

Edit distance is one factor in the scoring, but TF-IDF is also part of the mix, and IDF was handled badly until recently. See https://issues.apache.org/jira/browse/LUCENE-329 for the recent IDF fixes.
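As a rough illustration of how those two factors can interact, here is a toy Python sketch. It is not Lucene's actual scoring formula: `edit_similarity`, `idf` and `toy_score` are hypothetical helpers, the IDF variant is just one common textbook form, and the document counts are invented.

```python
import math


def edit_similarity(edits: int, query_len: int, term_len: int) -> float:
    """1.0 for an exact match, shrinking as edits grow (assumes non-empty strings)."""
    return max(0.0, 1.0 - edits / min(query_len, term_len))


def idf(doc_freq: int, num_docs: int) -> float:
    """Smoothed inverse document frequency: rarer terms get larger values."""
    return math.log(1 + num_docs / (1 + doc_freq))


def toy_score(edits: int, query_len: int, term_len: int,
              doc_freq: int, num_docs: int) -> float:
    """Toy ranking value: closeness to the query times rarity of the term."""
    return edit_similarity(edits, query_len, term_len) * idf(doc_freq, num_docs)


# Invented numbers: an exact match that is very common in the index,
# versus a one-edit variant that appears in a single document.
exact_common = toy_score(0, 5, 5, doc_freq=900, num_docs=1000)
typo_rare = toy_score(1, 5, 5, doc_freq=1, num_docs=1000)
print(exact_common, typo_rare)
```

With these made-up numbers the rare one-edit variant scores higher than the common exact match, which shows how a naive use of IDF can distort the ranking of fuzzy matches.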


#3

Thanks very much for the quick answer, but I was looking for more of a beginner's answer, one that also shows the mathematical model behind fuzzy queries and how it relates to fuzzy logic.

