Levenshtein distance


(Adrian C) #1

Hi,

I am new to ES and have been doing some simple testing of fuzzy matching. I
have a question about edit distance: does Elasticsearch use Levenshtein
distance or Damerau–Levenshtein distance?

For example, I have the following text stored in an index (analyzer: simple):
AARONS

When I search using 'arosn', the text is not found. The queries that I have
been testing with are as follows:

{
  "size": 50,
  "query": {
    "fuzzy": {
      "surname": {
        "value": "arosn",
        "fuzziness": 2,
        "prefix_length": 1,
        "max_expansions": 100
      }
    }
  }
}

and

{
  "size": 50,
  "query": {
    "match": {
      "surname": {
        "query": "arosn",
        "fuzziness": 2
      }
    }
  }
}

{
  "size": 50,
  "query": {
    "match": {
      "surname": {
        "query": "arosn~",
        "fuzziness": 2
      }
    }
  }
}

{
  "size": 50,
  "query": {
    "query_string": {
      "default_field": "surname",
      "fuzziness": 2,
      "query": "arosn~2"
    }
  }
}

If the Damerau–Levenshtein distance algorithm were in use, then I would
expect this to match with a distance of two:

arosn + (a) → aarosn + swap (n & s) → aarons
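
To make the arithmetic concrete, here is a small sketch that computes both distances for this pair of terms. It is only an illustration and not how Elasticsearch or Lucene implement fuzzy matching internally; the function names `levenshtein` and `osa_distance` are made up for this example, and the second one is the "optimal string alignment" variant of Damerau–Levenshtein, in which swapping two adjacent characters counts as a single edit.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic Levenshtein distance: insert, delete, substitute (cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: Levenshtein plus adjacent swaps."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            # Transposition of two adjacent characters counts as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


if __name__ == "__main__":
    print(levenshtein("arosn", "aarons"))   # 3
    print(osa_distance("arosn", "aarons"))  # 2
```

Under plain Levenshtein the pair is three edits apart, which is outside a fuzziness of 2; once the swap counts as one edit, the distance drops to two.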

I am a little confused, as the documentation does refer to Damerau–Levenshtein:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#_fuzziness

So any ideas on how I can get Damerau–Levenshtein to work?

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c9b9fb8b-d1f4-46d8-9426-a1dc1a729c9a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Adrian C) #2

Resolved this by setting "transpositions": true on the request. I didn't see
this option documented, but found it by looking through the source.

{
  "size": 50,
  "query": {
    "fuzzy": {
      "surname": {
        "value": "arosn",
        "transpositions": true,
        "fuzziness": 2,
        "prefix_length": 1,
        "max_expansions": 100
      }
    }
  }
}

On Wednesday, 28 May 2014 13:10:22 UTC+1, Adrian C wrote:


