Modify the behavior of the FuzzyQuery

Lakomkin_Egor · May 18, 2015, 7:12am

Hi All,

I would like to ask for help, and in particular for a direction in which to start digging. I want to modify the fuzzy query behavior as follows. Say I have a set of candidate tokens for error correction; my goal is to give more "weight" to candidates whose changes are in vowels. An example:

Let's say we search for "baban"

The candidates at the same edit distance might be:

"koban"
"bobon" <- this should have a higher score.

I probably need to add some information to the token payload, not only about the number of mismatches but also about the number of vowels and consonants changed.

In more general form:
I do not want to rely only on TF/IDF statistics in such a query, but also on some linguistic information, such as vowel/consonant substitution.
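To make the idea concrete, here is a minimal sketch of a weighted edit distance in plain Python, outside Elasticsearch. The vowel set (English only), the cost of 0.5 for a vowel-for-vowel substitution, and the function names are all assumptions chosen for illustration; this is not how FuzzyQuery scores matches.

```python
# Assumption: an English-only vowel set; other languages need their own.
VOWELS = set("aeiou")


def substitution_cost(a, b, vowel_cost=0.5):
    """Cost of replacing character a with b (vowel_cost is an assumed value)."""
    if a == b:
        return 0.0
    if a in VOWELS and b in VOWELS:
        return vowel_cost  # vowel swapped for another vowel: cheaper
    return 1.0  # any other substitution: full cost


def weighted_distance(source, target, vowel_cost=0.5):
    """Levenshtein distance where vowel-for-vowel substitutions cost less."""
    # prev[j] holds the distance between the processed prefix of source
    # and the first j characters of target.
    prev = [float(j) for j in range(len(target) + 1)]
    for i, s in enumerate(source, start=1):
        curr = [float(i)]
        for j, t in enumerate(target, start=1):
            curr.append(min(
                prev[j] + 1.0,      # delete s
                curr[j - 1] + 1.0,  # insert t
                prev[j - 1] + substitution_cost(s, t, vowel_cost),
            ))
        prev = curr
    return prev[-1]


def rank_candidates(query, candidates):
    """Order candidates from closest to farthest under the weighted distance."""
    return sorted(candidates, key=lambda c: weighted_distance(query, c))


if __name__ == "__main__":
    for word in rank_candidates("baban", ["koban", "bobon"]):
        print(word, weighted_distance("baban", word))
```

With these assumed costs, "bobon" (two vowel substitutions) comes out closer to "baban" than "koban" (one consonant and one vowel substitution), even though both are at plain edit distance 2.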

I am quite new to Elasticsearch and wanted to ask which token filter I would need to modify (if a token filter for fuzzy queries exists at all).

Thank you in advance for help.

nik9000 · May 18, 2015, 1:16pm

You'd probably need to write a new query for this. You can probably extend
or wrap the existing fuzzy query to do the job.

I've done things like this several times. The best way to get this done is
to:

1. Create an empty Elasticsearch plugin.
2. Add a Parser for your new fuzzy query. If you are just wrapping the
   fuzzy query, the simplest thing is probably to delegate to the fuzzy
   query parser in Elasticsearch to build the fuzzy query. For now, have
   your parser just return the fuzzy query from the delegate.
3. Write a builder and some tests for your query.
4. Fix your parser to actually wrap the fuzzy query - this is really the
   hard part.

Keep in mind that whatever you write will only work in some languages - I
suspect the definition of a vowel conflicts between languages. Also, fuzzy
queries can only match up to an edit distance of 2 in their current form,
for some fun reasons.
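As a rough illustration of the "wrap" idea, the sketch below is plain Python rather than plugin code. It keeps only candidates within a plain edit distance of 2, then rescores them so that vowel-for-vowel substitutions count for less. The scoring formula, the 0.5 discount, the English vowel set and every function name are assumptions for illustration, not the Elasticsearch or Lucene API.

```python
VOWELS = set("aeiou")  # assumption: English-only vowel set


def levenshtein(a, b):
    """Plain edit distance (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def vowel_substitutions(query, candidate, edits):
    """Count vowel-for-vowel substitutions, but only when the edits are
    purely positional substitutions; otherwise report 0 (a simplification)."""
    if len(query) != len(candidate):
        return 0
    mismatches = [(q, c) for q, c in zip(query, candidate) if q != c]
    if len(mismatches) != edits:
        return 0
    return sum(1 for q, c in mismatches if q in VOWELS and c in VOWELS)


def rescore(query, candidates, max_edits=2, vowel_discount=0.5):
    """Filter like a fuzzy match (at most max_edits), then rescore."""
    scored = []
    for cand in candidates:
        edits = levenshtein(query, cand)
        if edits > max_edits:
            continue  # would not be matched by the wrapped fuzzy step
        vowels = vowel_substitutions(query, cand, edits)
        effective = edits - vowel_discount * vowels
        scored.append((cand, 1.0 / (1.0 + effective)))  # hypothetical score
    return sorted(scored, key=lambda pair: -pair[1])


if __name__ == "__main__":
    print(rescore("baban", ["koban", "bobon", "zzzzz", "baban"]))
```

The filter step stands in for the wrapped fuzzy query and the rescoring step for the wrapper; in a real plugin the second step would have to live in the query's scoring code.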

Once you've got that far you'll probably know better than I do what to do
next.

Nik
