It's been nearly 10 years, but I took a quick look at the code and the IDF
balancing stuff is still in there.
Testing queries against a large index of cars, Lucene's standard fuzzy query
on ford~ still has top matches that aren't Fords. FLT works fine.
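To make the IDF point concrete, here is a toy sketch, not Lucene's actual code. It assumes the classic Lucene IDF formula, 1 + ln(N / (df + 1)), a made-up edit-distance boost, and invented document frequencies for a few variants of ford. The `shared_idf` switch models one reading of the balancing idea: every variant is scored with the source term's IDF instead of its own.

```python
import math

NUM_DOCS = 100_000  # hypothetical index size
SOURCE = "ford"
SOURCE_DF = 20_000  # hypothetical document frequency of "ford"

# (variant, edit distance from "ford", invented document frequency)
VARIANTS = [
    ("ford", 0, SOURCE_DF),
    ("fore", 1, 3),
    ("fjord", 1, 1),
    ("word", 1, 40),
]


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Classic Lucene TF-IDF style IDF: 1 + ln(N / (df + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def edit_boost(distance: int, term_len: int) -> float:
    """Illustrative boost only: closer variants get a value nearer 1.0."""
    return 1.0 - distance / term_len


def rank(shared_idf: bool) -> list[str]:
    """Return variants ordered by descending toy score."""
    source_idf = classic_idf(SOURCE_DF, NUM_DOCS)
    scored = []
    for term, distance, doc_freq in VARIANTS:
        # Either every variant borrows the source term's IDF,
        # or each variant is rewarded for its own scarcity.
        idf = source_idf if shared_idf else classic_idf(doc_freq, NUM_DOCS)
        scored.append((idf * edit_boost(distance, len(SOURCE)), term))
    return [term for _, term in sorted(scored, reverse=True)]


if __name__ == "__main__":
    print("per-variant IDF:", rank(shared_idf=False))
    print("shared IDF:     ", rank(shared_idf=True))
```

With these invented numbers, per-variant IDF puts the rare "fjord" first and the exact match "ford" last, while the shared IDF puts "ford" first because only edit distance separates the variants.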
On Tuesday, January 20, 2015 at 9:11:56 AM UTC, Itamar Syn-Hershko wrote:
Famous last words
--
Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Lucene.NET committer and PMC member
On Tue, Jan 20, 2015 at 11:11 AM, Mark Harwood <mark.h...@elasticsearch.com> wrote:
"it doesn't seem like this would address the IDF"
Trust me, I wrote it.
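For anyone who wants to try the combination discussed in this thread, a request body for the FLT query with ignore_tf switched on might look roughly like the sketch below. The field names `make` and `model` and the helper `flt_query` are hypothetical, and this only applies to Elasticsearch versions that still ship `fuzzy_like_this`; check the reference page for your version before relying on any parameter.

```python
import json


def flt_query(text, fields, ignore_tf=True, max_query_terms=25):
    """Build a fuzzy_like_this request body as a plain dict.

    `fields` are hypothetical field names; adjust to your own mapping.
    """
    return {
        "query": {
            "fuzzy_like_this": {
                "fields": list(fields),
                "like_text": text,
                # Drop term frequency from scoring, as discussed in the thread.
                "ignore_tf": ignore_tf,
                "max_query_terms": max_query_terms,
            }
        }
    }


body = flt_query("ford", ["make", "model"])

if __name__ == "__main__":
    # Print the JSON that would be sent as the search request body.
    print(json.dumps(body, indent=2))
```

The dict is built separately from any client call so the same body can be pasted into curl or passed to whichever client library is in use.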
On Tuesday, January 20, 2015 at 12:16:44 AM UTC, kasper...@yahoo.com wrote:
Thanks Mark. Sounds like this issue affects a lot of people.
I looked at your suggestion about FLT, and the ignore_tf parameter
should help, however unless I'm missing something, it doesn't seem like
this would address the IDF, and results could be biased. But I will
experiment.
Ultimately I think what my particular use case requires is a scorer that
only uses edit distance (when querying with fuzziness) and field boosts,
but no TF / IDF.
On Monday, January 19, 2015 at 3:15:47 PM UTC-8, Mark Harwood wrote:
This issue rounds up a bunch of related issues that have been raised
previously: Wrap stacked tokens in `match` query in a BlendedTerms query for better scoring · Issue #9103 · elastic/elasticsearch · GitHub
For now try FuzzyLikeThis (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-flt-query.html#query-dsl-flt-query)
It blends More Like This and fuzzy functionality but includes the
adjustments to IDF that I think make more sense than the other
implementations with their bias towards rewarding scarcity.
On Monday, January 19, 2015 at 6:48:49 PM UTC, kasper...@yahoo.com wrote:
I have the same problem, where some results with higher edit distance
are ranked higher than other results that are closer in terms of edit
distance.
I suspect it does have to do with document frequency, as you think
Adrien. In my case I want to ignore document frequency completely. Any
suggestion to achieve this?
I'm a taker of any solution as this looks like a show stopper for us,
so even a workaround would help.
I can try to create this other rewrite method you mentioned if you
could point me in the right direction.
Thanks
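As a possible workaround while no such rewrite method exists, results can be re-ranked on the client using edit distance and field boosts only. The sketch below is an assumption-laden illustration, not an Elasticsearch feature: the function names, the `brand` and `description` fields, the boost values, and the query "boss" are all hypothetical, and the similarity formula is just one simple choice.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def score(query: str, doc: dict, field_boosts: dict, max_distance: int = 2) -> float:
    """Best boosted similarity over all tokens; no TF and no IDF involved."""
    q = query.lower()
    best = 0.0
    for field, boost in field_boosts.items():
        for token in str(doc.get(field, "")).lower().split():
            dist = levenshtein(q, token)
            if dist > max_distance:
                continue  # too far away to count as a fuzzy match
            similarity = 1.0 - dist / max(len(q), len(token))
            best = max(best, boost * similarity)
    return best


def rerank(query: str, docs: list, field_boosts: dict) -> list:
    """Order hits by the edit-distance score, highest first."""
    return sorted(docs, key=lambda d: score(query, d, field_boosts), reverse=True)


if __name__ == "__main__":
    hits = [
        {"brand": "Bose", "description": "headphones"},
        {"brand": "Boss", "description": "guitar pedals"},
    ]
    for doc in rerank("boss", hits, {"brand": 2.0, "description": 1.0}):
        print(doc["brand"])
```

Because document frequency never enters the score, the closer term always wins regardless of how rare the other one is. The cost is that only the hits already fetched can be re-ordered, so enough results have to be requested up front.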
On Thursday, January 15, 2015 at 7:44:57 AM UTC-8, Adrien Grand wrote:
This is because the score takes two factors into account: the
document frequency and the edit distance. Quite likely in your case, even
though Boss is closer than Bose, Bose has a much lower document frequency
which helped it eventually get a better score. I guess we should have
another rewrite method that would not take freqs into account (or somehow
merge them) to avoid that issue.
On Thu, Jan 15, 2015 at 4:06 PM, Eylon Steiner eylon....@gmail.com wrote:
Any ideas?
--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to elasticsearc...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/52e09e54-90b6-4014-8454-34e3db5756e5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
Adrien Grand