No results in fuzzy query_string query with 0.01 fuzzy_min_sim value

jdyck · August 13, 2013, 1:03am

I'm trying to get fuzzy search working using the query_string query but I'm having a lot of issues. It seems like no matter what I set my fuzzy_min_sim value to, it makes very little difference in the results.

Here is a small example of my situation:

gist.github.com

https://gist.github.com/SeventhHelix/911789353a91de6810f4

gistfile1.sh

curl -XDELETE 'localhost:9200/test'

echo "Creating new ElasticSearch index..."
curl -XPUT 'localhost:9200/test' -d '
{
   "mappings" : {
      "user" : {
         "properties" : {
            "emailAddress" : {
               "analyzer": "string_lowercase",

This file has been truncated; the full script is in the gist linked above.

Why is it that the above won't return the correct result, even with a 0.01 fuzzy_min_sim value? As far as I understand, it isn't a term/token problem because I'm setting the mapping to be lowercase and as a single token. I've also tested this with just a simple string as the email field like abcdefghijklmnopqrstuvwxyz and had similar results.

How can I set this up so that even if I misspell half of the entire field I still get the right result? I thought that's what the fuzzy_min_sim value controlled: at 0.5 it would return matches where the Levenshtein distance is under 0.5*len(string), i.e. 15 incorrect characters in this case.

Thank-you!
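
For reference, the expectation described above can be written out as a small Python sketch. It computes a plain Levenshtein distance and compares it against the "under 0.5*len(string)" budget from the question. The function name, the variable names and the misspelled sample string are made up for illustration, and this only models the behaviour the question expects, not what Elasticsearch actually does.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


# The test string from the post, and a hypothetical query with 12 wrong characters.
indexed = "abcdefghijklmnopqrstuvwxyz"
query = "abcdefghijklmn" + "0" * 12

distance = levenshtein(indexed, query)

# Budget as understood in the question: distance must be under 0.5 * len(string).
expected_budget = 0.5 * len(indexed)
within_expected_budget = distance < expected_budget

print(distance, expected_budget, within_expected_budget)
```

Under that reading, a query with nearly half of its characters wrong would still be expected to match at fuzzy_min_sim 0.5.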

simonw_2 · August 13, 2013, 10:42am

Actually, the float notion of min_similarity is not fully supported
anymore. We only support an edit distance (Levenshtein distance) of 1 or 2, so
no matter what float you put in, it will end up as Math.min(2,
floatToLD(value)) of some sort. Your example has way more than 2 edits.

simon
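
The clamping described in this reply can be sketched roughly as follows, in Python. The function name `float_to_edits` and the conversion formula are assumptions made for illustration; they are not copied from Lucene or Elasticsearch. The only detail taken from the reply is the hard cap of 2 edits.

```python
MAX_SUPPORTED_EDITS = 2  # the cap of 2 edits mentioned in the reply


def float_to_edits(min_similarity: float, term_length: int) -> int:
    """Turn a legacy similarity value into a capped edit count (assumed formula)."""
    if min_similarity >= 1.0:
        # Assumption: values of 1 or more are read directly as an edit count.
        return int(min(min_similarity, MAX_SUPPORTED_EDITS))
    # Assumption: a fraction f allows (1 - f) * length edits before the cap applies.
    return min(int((1.0 - min_similarity) * term_length), MAX_SUPPORTED_EDITS)


# For a 26-character term, very different similarity values give the same budget.
for similarity in (0.01, 0.5, 0.9):
    print(similarity, float_to_edits(similarity, 26))
```

Under these assumptions, 0.01 and 0.5 both collapse to 2 allowed edits, which would explain why changing fuzzy_min_sim makes so little difference to the results.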

--
View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/No-results-in-fuzzy-query-string-query-with-0-01-fuzzy-min-sim-value-tp4039516.html
Sent from the Elasticsearch Users mailing list archive at Nabble.com.

jdyck · August 13, 2013, 3:31pm

Hmm, interesting, thanks Simon. If this is the case, is there another recommended way of doing fuzzy matching with greater possible distance?

I find this a bit odd, though. I seem to remember doing fuzzy matching with this same type of query not too long ago and still had good results, and I've not updated my version of ES in a while.

simonw_2 · August 13, 2013, 4:02pm

That was the old fuzzy query, which has super bad performance. The new fuzzy
query is blazing fast but is limited to 2 edits. You could potentially
write a query plugin that uses the slow old fuzzy query, but currently
we don't support it. If you really need support for it, you can open an
issue and we can discuss it there.

simon

jdyck · August 13, 2013, 4:11pm

Ahh, okay. I'll work with it as it is for now and see if it still fits our needs. For the sake of performance, I'd hope that 1-2 characters would be enough anyway.

For future reference, what version of ES introduced the new fuzzy behavior?

Thanks for your help, Simon

Jeff

simonw_2 · August 14, 2013, 9:35am

This came with Lucene 4.0, so it's 0.90.0 that cut over to the fast
FuzzyQuery. FYI, it's ~20k% faster (yes, 20k%!).

simon
