No results in fuzzy query_string query with 0.01 fuzzy_min_sim value

I'm trying to get fuzzy search working using the query_string query but I'm having a lot of issues. It seems like no matter what I set my fuzzy_min_sim value to, it makes very little difference in the results.

Here is a small example of my situation: gist:911789353a91de6810f4 · GitHub

Why is it that the above won't return the correct result, even with a 0.01 fuzzy_min_sim value? As far as I understand, it isn't a term/token problem because I'm setting the mapping to be lowercase and as a single token. I've also tested this with just a simple string as the email field like abcdefghijklmnopqrstuvwxyz and had similar results.

How can I set this up so that even if I misspell half the entire field I still get the right result? I thought that's what the fuzzy_min_sim value was for: at 0.5 it would return matches where the Levenshtein distance is under 0.5*len(string), i.e. 15 incorrect characters in this case.

Thank you!

Actually, the float notion of min_similarity is not fully supported
anymore. We only support a string distance (Levenshtein distance) of 1 or 2,
so no matter what you put in the float it will be Math.min(2,
floatToLD(value)) of some sort. Your example has way more than 2 edits.
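
To make the clamping concrete, here is a minimal Python sketch of the kind of conversion described above. The helper name float_to_edits and the exact formula (treating a fraction as a similarity and allowing (1 - sim) * len edits) are assumptions modeled on how Lucene 4 appears to do it, not a copy of the actual implementation; only the cap of 2 comes from the reply itself.

```python
MAX_EDITS = 2  # the cap mentioned in the reply above


def float_to_edits(min_similarity: float, term_len: int) -> int:
    """Hypothetical helper: turn a fuzzy_min_sim-style value into an edit budget."""
    if min_similarity >= 1.0:
        # Values of 1 or more are read directly as an edit count.
        return min(int(min_similarity), MAX_EDITS)
    if min_similarity == 0.0:
        # Assumption: 0 is treated as "exact match", not "unlimited edits".
        return 0
    # Fractions are read as a similarity: (1 - sim) * len edits, then capped.
    return min(int((1.0 - min_similarity) * term_len), MAX_EDITS)


if __name__ == "__main__":
    # A 30-character term, as in the question (0.5 * len == 15).
    for sim in (0.01, 0.5, 0.95):
        print(sim, "->", float_to_edits(sim, 30))
```

Under these assumptions, 0.01 and 0.5 both end up as 2 edits for a 30-character term, which would explain why changing fuzzy_min_sim makes so little difference.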

simon

(In reply to Jeff Dyck's message of Tuesday, August 13, 2013 3:03:21 AM UTC+2.)

--
View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/No-results-in-fuzzy-query-string-query-with-0-01-fuzzy-min-sim-value-tp4039516.html
Sent from the Elasticsearch Users mailing list archive at Nabble.com.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hmm, interesting, thanks Simon. If this is the case, is there another recommended way of doing fuzzy matching with greater possible distance?

I find this a bit odd, though. I seem to remember doing fuzzy matching with this same type of query not too long ago and still had good results, and I've not updated my version of ES in a while.

That was the old fuzzy query, which has super bad performance. The new fuzzy
query is blazing fast but is limited to 2 edits. You could potentially
write a query plugin that uses the slow old fuzzy query, but currently
we don't support it. If you really need support for it, you can open an
issue and we can discuss it there.
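
For a rough idea of why the old approach is slow, here is a sketch in Python of brute-force fuzzy matching: compute the Levenshtein distance against every candidate term and keep those above a similarity threshold. This is not Lucene's or Elasticsearch's code. The function names and the similarity formula (1 - distance / shorter length) are assumptions for illustration only.

```python
from typing import Iterable, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed similarity measure: 1 - distance / length of the shorter string."""
    shortest = min(len(a), len(b))
    if shortest == 0:
        return 1.0 if a == b else 0.0
    return 1.0 - levenshtein(a, b) / shortest


def slow_fuzzy_match(query: str, terms: Iterable[str],
                     min_similarity: float) -> List[Tuple[float, str]]:
    """Scan every term (hence slow) and keep those at or above the threshold."""
    scored = ((similarity(query, term), term) for term in terms)
    return sorted(((s, t) for s, t in scored if s >= min_similarity), reverse=True)


if __name__ == "__main__":
    terms = ["abcdef", "abcxyz", "zzzzzz"]
    print(slow_fuzzy_match("abcdef", terms, 0.5))
```

Every query has to touch every term, so the cost grows with the size of the term dictionary. That is the trade-off against the fast query with its 2-edit limit.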

simon

(In reply to Jeff Dyck's message of Tuesday, August 13, 2013 5:31:20 PM UTC+2.)

Ahh, okay. I'll work with it as it is for now and see if it still fits our needs. For the sake of performance, I'd hope that 1-2 characters would be enough anyway.

For future reference, what version of ES introduced the new fuzzy behavior?

Thanks for your help, Simon

Jeff

This came with Lucene 4.0, so it's 0.90.0 that cut over to the fast
FuzzyQuery. FYI, it's ~20k% faster (yes, 20k%!).

simon

(In reply to Jeff Dyck's message of Tuesday, August 13, 2013 6:11:24 PM UTC+2.)