I'm working on a solution that will act as a dictionary validator by
performing the following:
- input: phrase
- processing: shingles phrase match with fuzziness
- output: rewritten phrase
- data: dictionary-like, with entries that are short phrases of up to 5 words
  (e.g. "know it all", "merry go round")
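To make the "shingles" step concrete, here is a minimal client-side sketch of what I mean by shingling the input phrase. The function name `shingles` and its parameters are only illustrative; in Elasticsearch itself this would normally be done at analysis time by a shingle token filter rather than in application code.

```python
def shingles(phrase, min_size=1, max_size=5):
    """Return all word n-grams (shingles) of the phrase, shortest first.

    Hypothetical helper for illustration; max_size=5 mirrors the
    "up to 5 words" limit of the dictionary entries.
    """
    words = phrase.lower().split()
    result = []
    # Never ask for shingles longer than the phrase itself.
    for size in range(min_size, min(max_size, len(words)) + 1):
        for start in range(len(words) - size + 1):
            result.append(" ".join(words[start:start + size]))
    return result


print(shingles("a merry go round", min_size=2, max_size=3))
```

Each shingle would then be matched (with fuzziness) against the dictionary entries.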
What's particular about this use case is that we don't care about TF / IDF
and have another mechanism in mind to select an entry (but that's not the
issue).
Everything started well, with queries involving a phrase suggester, a
direct generator and collation, but then we hit a snag: fuzzy matches
(edit distance > 0) rank higher than exact matches.
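For context, the kind of request I have been experimenting with looks roughly like the sketch below, written as a Python dict so it can be serialized to JSON. The suggestion name `phrase_check`, the field name `entry.shingles` and all numeric values are placeholders, not my real mapping, and the exact shape of the `collate` block differs between Elasticsearch versions, so please treat that part as an assumption.

```python
import json


def build_phrase_suggest(text, field="entry.shingles"):
    """Build a phrase-suggester request body (illustrative values only)."""
    return {
        "suggest": {
            "text": text,
            "phrase_check": {          # hypothetical suggestion name
                "phrase": {
                    "field": field,    # hypothetical shingled field
                    "size": 5,
                    "gram_size": 3,
                    "max_errors": 2,
                    "direct_generator": [
                        {
                            "field": field,
                            "suggest_mode": "always",
                            "max_edits": 2,
                            "min_word_length": 1,
                        }
                    ],
                    # Collation checks each suggestion against the index.
                    # Assumption: this layout follows recent documentation
                    # and may need adjusting for older versions.
                    "collate": {
                        "query": {
                            "source": {
                                "match_phrase": {
                                    "{{field_name}}": "{{suggestion}}"
                                }
                            }
                        },
                        "params": {"field_name": field},
                        "prune": True,
                    },
                }
            },
        }
    }


body = build_phrase_suggest("mery go round")
print(json.dumps(body, indent=2))
```

With a request of this kind, the suggestions come back ordered by the suggester's own score, which is where fuzzy candidates end up above exact ones.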
I've been discussing this in another thread
(https://groups.google.com/forum/#!searchin/elasticsearch/bose/elasticsearch/dLdT90j1x74/zqJQiSlgHv8J)
but I wanted to present my use case a bit more clearly and see if there is
any advice on how to achieve this.
I tried to use FLT, as kindly recommended by Mark Harwood, but couldn't
figure out how to use it as a phrase suggester.
The key here, I think, is to control the scoring of the suggester: instead
of accounting for TF / IDF, it would just provide a ranking by an n-gram
formula involving edit distance, for further custom processing to select the
right suggester entry. I looked at smoothing models, but everything seems
to be based, to a greater or lesser extent, on TF / IDF.
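To illustrate the ranking I am after, here is a minimal sketch that orders dictionary entries purely by edit distance, so an exact match (distance 0) always comes first. It is not how the Elasticsearch suggester scores anything; it uses a plain character-level Levenshtein distance as a stand-in for the n-gram formula, and the names `edit_distance`, `rank_entries` and `max_distance` are made up for this example.

```python
def edit_distance(a, b):
    """Character-level Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def rank_entries(candidate, entries, max_distance=2):
    """Return (distance, entry) pairs within max_distance, closest first.

    No TF / IDF is involved: only edit distance decides the order,
    with ties broken alphabetically by entry.
    """
    scored = [(edit_distance(candidate, entry), entry) for entry in entries]
    return sorted(pair for pair in scored if pair[0] <= max_distance)


dictionary = ["mery go round", "know it all", "merry go round"]
print(rank_entries("merry go round", dictionary))
```

The custom selection mechanism mentioned earlier would then pick from this ranked list.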
Any advice would be appreciated!