I'm working on a solution that will act as a dictionary validator by
performing the following:
processing: phrase matching on shingles, with fuzziness
output: rewritten phrase
data: dictionary-like, with entries that are short phrases of up to 5 words
(e.g. "know it all", "merry go round")
What's particular about this use case is that we don't care about TF / IDF
and have another mechanism in mind to select an entry (but that's not the
It all started well, with queries involving a phrase suggester, a direct
generator and collation, but that's where we hit a snag: fuzzy matches
(edit distance > 0) rank higher than exact matches.
I've been discussing this in another thread
but I wanted to present my use case a bit more clearly and see if there is
any advice on how to achieve this.
I tried to use FLT (fuzzy_like_this), as kindly recommended by Mark Harwood,
but couldn't figure out how to use it as a phrase suggester.
The key here, I think, is to control the scoring of the suggester: ignore
TF / IDF and instead rank by an n-gram formula involving edit distance,
leaving further custom processing to select the right suggester entry. I
looked at the smoothing models, but everything seems to be based, to a
greater or lesser extent, on TF / IDF.
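
To make the ranking I have in mind more concrete, here is a minimal sketch in plain Python, outside Elasticsearch. It is only an illustration under my own assumptions: the function names (edit_distance, shingles, rank_candidates) and the max_distance cutoff are hypothetical, and it is not how the phrase suggester scores candidates. It builds word shingles of up to 5 words, compares each one to every dictionary entry, and orders the hits purely by edit distance, so an exact match always sorts ahead of a fuzzy one.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def shingles(text: str, max_size: int = 5):
    """Yield every run of 1..max_size consecutive words in the text."""
    words = text.lower().split()
    for size in range(1, max_size + 1):
        for start in range(len(words) - size + 1):
            yield " ".join(words[start:start + size])


def rank_candidates(text, dictionary, max_distance=2):
    """Return (distance, entry, shingle) tuples, lowest distance first.

    No term or document frequencies are involved: the only ranking
    signal is the character-level edit distance.
    """
    hits = set()
    for shingle in shingles(text):
        for entry in dictionary:
            distance = edit_distance(shingle, entry)
            if distance <= max_distance:
                hits.add((distance, entry, shingle))
    return sorted(hits)


if __name__ == "__main__":
    entries = ["know it all", "merry go round"]
    for hit in rank_candidates("he is a know it al", entries):
        print(hit)
```

Comparing every shingle with every entry is obviously too slow for a real dictionary; the point is only the ordering, which a custom selection step could then consume.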
Any advice would be appreciated!