You can provide your own similarity to be used at the field level, but
recent versions of Elasticsearch also let you access the tf-idf values
directly in order to do custom scoring [1]. Also look at Britta's recent
talk on the subject [2].
That said, either your custom similarity or your custom scoring would need
to know exactly which terms are repeated many times. Have you looked into
omitting term frequencies? That would bypass term frequencies entirely,
which might be overkill in your case. Look into the index options [3].
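
To make the index options idea concrete, here is a minimal sketch of a mapping body built as a Python dict. The helper name and the field name "body" are made up for illustration, and the field type name depends on your Elasticsearch version ("string" in releases from the time of this thread, "text" in later ones), so treat it as an assumption to check against your own setup.

```python
import json


def mapping_without_term_freqs(field_name, field_type="string"):
    """Build a mapping body that indexes only document numbers for a field.

    Hypothetical helper for illustration. "index_options": "docs" tells
    Elasticsearch not to store term frequencies (or positions) for the
    field, so a term repeated many times counts the same as one occurrence.
    """
    return {
        "properties": {
            field_name: {
                "type": field_type,       # assumption: "text" on newer versions
                "index_options": "docs",  # doc numbers only, no frequencies
            }
        }
    }


if __name__ == "__main__":
    # "body" is a hypothetical field name.
    print(json.dumps(mapping_without_term_freqs("body"), indent=2))
```

Note that dropping frequencies this way also drops positions, so phrase queries on that field would no longer work.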
Finally, perhaps the common terms query can help [4].
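
As a rough sketch of what a common terms query body looks like, again as a Python dict: the helper name, field name, query text, and cutoff value below are placeholders rather than recommendations, and the query type exists in releases from around the time of this thread but may not be available in later ones, so check your version.

```python
import json


def common_terms_query(field_name, text, cutoff_frequency=0.001):
    """Build a request body for a common terms query.

    Hypothetical helper for illustration. Terms whose document frequency
    is above cutoff_frequency are treated as common and matter less for
    deciding which documents match.
    """
    return {
        "query": {
            "common": {
                field_name: {
                    "query": text,
                    "cutoff_frequency": cutoff_frequency,  # placeholder value
                }
            }
        }
    }


if __name__ == "__main__":
    # Field name and query text are made up for the example.
    print(json.dumps(common_terms_query("body", "the quick brown fox"), indent=2))
```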
[1]
[2] https://speakerdeck.com/elasticsearch/scoring-for-human-beings
[3]
[4]
Cheers,
Ivan
On Thu, Mar 20, 2014 at 8:08 AM, geantbrun agin.patrick@gmail.com wrote:
Hi,
If I understand well, the formula used for the term frequency part in the
default similarity module is the square root of the actual frequency. Is it
possible to modify that formula to include something like a
min(my_max_value,sqrt(frequency))? I would like to avoid huge tf's for
documents that have the same term repeated many times. It seems that BM25
similarity has a parameter to control saturation but I would prefer to
stick with the simple tf/idf similarity module.
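
For illustration, the capped formula described above can be sketched in plain Python. This is only the arithmetic, not an Elasticsearch or Lucene similarity implementation; the function names and the cap of 3.0 are assumptions chosen for the example.

```python
import math


def default_tf(freq):
    """Square-root term-frequency factor, as described for the default similarity."""
    return math.sqrt(freq)


def capped_tf(freq, my_max_value):
    """Proposed variant: min(my_max_value, sqrt(frequency))."""
    return min(my_max_value, math.sqrt(freq))


if __name__ == "__main__":
    # Compare the two factors for a few raw frequencies, with a cap of 3.0.
    for freq in (1, 4, 25, 400):
        print(freq, default_tf(freq), capped_tf(freq, my_max_value=3.0))
```

With a cap of 3.0, any term that occurs 9 times or more in a document contributes the same tf factor, which is the saturation effect being asked for.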
Thank you for your help
Patrick
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/9a12b611-d08d-41f9-8fd4-b74ad75a6a5c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.