Fuzzy Searching Inconsistencies

JD83 · January 9, 2018, 2:32pm

Hi all.

(version: Elastic 5.5)

I've been looking at the "explanations" of fuzzy searches, and there seem to be some inconsistencies in what gets matched under different circumstances.

For example, we have a large index (260 GB). If I run a query with fuzziness = 1 for a simple word like "river", I see that it matches both "river" and words with an extra letter on the end, e.g. "riverk". But I don't see any examples of it matching "rive", even though we use edgeNGram analysis, so "rive" should match wherever "river" matches.
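Both "rive" (one deletion) and "riverk" (one insertion) are a single edit away from "river", so with fuzziness = 1 either is a valid match. A minimal sketch to check this, using a plain Levenshtein distance and a hypothetical `edge_ngrams` helper that mimics an edge n-gram filter (the `min_gram`/`max_gram` values are assumptions, since the actual analyzer settings aren't shown):

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) ca
        prev = curr
    return prev[-1]


def edge_ngrams(token, min_gram=1, max_gram=20):
    """Prefixes of a token, as an edge n-gram filter anchored at the start would emit."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


for term in ["river", "rive", "riverk"]:
    print(term, levenshtein("river", term))  # river 0, rive 1, riverk 1

print(edge_ngrams("river", min_gram=2))  # ['ri', 'riv', 'rive', 'river']
```

Because "rive" is one of the prefixes indexed for "river", it should exist as a term in every document that contains "river".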

If I recreate the index's settings and mappings, insert just 2 documents (taken from the 260 GB index), and run the same query, I do see it matching "rive", "river" and "riverk".
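For anyone trying to reproduce this, a search body along these lines returns a per-hit scoring explanation for a fuzzy query. This is a sketch that assumes a `match` query (the post doesn't say which query type was used), and the field name `name` is a hypothetical placeholder; the body would be sent to the index's `_search` endpoint.

```python
import json


def fuzzy_explain_body(field, text, fuzziness=1):
    """Build a search request body for a fuzzy match query with explain enabled."""
    return {
        "explain": True,  # ask Elasticsearch to include a scoring explanation per hit
        "query": {
            "match": {
                field: {  # hypothetical field name supplied by the caller
                    "query": text,
                    "fuzziness": fuzziness,  # maximum edit distance per term
                }
            }
        },
    }


body = fuzzy_explain_body("name", "river")
print(json.dumps(body, indent=2))
```

Running the same body against the large index and the 2-document index should make it easy to compare which terms show up in the explanations.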

I found this blog post by Michael McCandless that describes how fuzzy queries became much more efficient in Lucene 4:

Could this be a side effect of that efficiency? I.e., not every type of fuzzy match is tested for each document (he says it was an improvement over a brute-force method), so in large indexes Elastic tries to be very lean with fuzzy searches, whereas in smaller indexes it can afford to try more types of match?
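For comparison, the brute-force approach the post contrasts against amounts roughly to visiting every term in the term dictionary and computing its edit distance to the query term. A minimal sketch with a made-up term list (the function name `brute_force_fuzzy_terms` is hypothetical, and Lucene 4's actual Levenshtein-automaton approach is not shown):

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def brute_force_fuzzy_terms(term_dictionary, query, max_edits=1):
    """Check every term in the dictionary; cost grows with the number of unique terms."""
    return sorted(t for t in term_dictionary if levenshtein(query, t) <= max_edits)


terms = {"ri", "riv", "rive", "river", "riverk", "rover", "lake"}
print(brute_force_fuzzy_terms(terms, "river"))  # ['rive', 'river', 'riverk', 'rover']
```

On a 260 GB index the term dictionary is huge, which is why scanning every term this way was so slow.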

Any insight on this would be much appreciated.

system · February 6, 2018, 2:32pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.