Using ES as a dictionary server - need advice

I'm working on a solution that will act as a dictionary validator by
performing the following:

  • input: phrase

  • processing: shingle-based phrase matching with fuzziness

  • output: rewritten phrase

  • data: dictionary-like, with entries that are short phrases of up to 5
    words (e.g. "know it all", "merry go round")
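To make the "shingles" step concrete, here is a minimal Python sketch of how word shingles could be generated from an input phrase on the client side. The function name `shingles` and the `max_size` parameter are my own for illustration (the limit of 5 mirrors the dictionary entry length above); this is not how Elasticsearch's shingle token filter is implemented, only an approximation of the idea.

```python
def shingles(phrase, max_size=5):
    """Return all contiguous word n-grams (shingles) of 1..max_size words.

    Hypothetical helper for illustration; lowercasing and whitespace
    splitting are simplifying assumptions, not an ES analyzer.
    """
    words = phrase.lower().split()
    result = []
    for size in range(1, max_size + 1):
        # Slide a window of `size` words across the phrase.
        for start in range(len(words) - size + 1):
            result.append(" ".join(words[start:start + size]))
    return result


if __name__ == "__main__":
    for shingle in shingles("merry go round"):
        print(shingle)
```

Each shingle could then be looked up against the dictionary entries, exactly or with fuzziness.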

What's particular about this use case is that we don't care about TF / IDF
and have another mechanism in mind to select an entry (but that's not the
issue).

Everything started well, with queries involving a phrase suggester, a
direct generator and collation, but then we hit a snag: fuzzy matches (edit
distance > 0) rank higher than exact matches.

I've been discussing this in another thread
(https://groups.google.com/forum/#!searchin/elasticsearch/bose/elasticsearch/dLdT90j1x74/zqJQiSlgHv8J)
but I wanted to present my use case a bit more clearly and see if there is
any advice on how to achieve this.

I tried to use FLT, as kindly recommended by Mark Harwood, but couldn't
figure out how to use it as a phrase suggester.

The key here, I think, is to control the scoring of the suggester: ignore
TF / IDF and instead rank by an n-gram formula involving edit distance,
leaving further custom processing to select the right suggester entry. I
looked at smoothing models, but they all seem to be based, to a greater or
lesser extent, on TF / IDF.
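To illustrate the kind of ranking I mean, here is a rough Python sketch that orders candidate entries purely by edit distance, so an exact match (distance 0) always comes first and term frequencies play no role. It runs outside Elasticsearch, as a post-processing step over candidates. The names `edit_distance` and `rank_candidates` and the `max_distance` cutoff are assumptions made for this example, and it uses plain Levenshtein distance over the whole phrase rather than a real n-gram formula.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def rank_candidates(phrase, entries, max_distance=2):
    """Return (distance, entry) pairs within max_distance, closest first.

    Hypothetical helper: no TF / IDF involved, only edit distance.
    """
    scored = []
    for entry in entries:
        distance = edit_distance(phrase.lower(), entry.lower())
        if distance <= max_distance:
            scored.append((distance, entry))
    # Tuples sort by distance first, then alphabetically by entry.
    return sorted(scored)


if __name__ == "__main__":
    dictionary = ["merry go round", "know it all", "mery go round"]
    for distance, entry in rank_candidates("mery go round", dictionary):
        print(distance, entry)
```

With this ordering a fuzzy match can never outrank an exact one, which is the behaviour I'm after from the suggester.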

Any advice would be appreciated!
