Is there any way to limit the effect of term frequency on scoring while keeping proximity matching?
For example:
#1 Sue ate the alligator Sue leave. Then Sue work.
#2 Sue look the alligator.
#3 Sue alligator
#4 Sue never goes anywhere without her alligator-skin purse
If we search for "sue alligator", we expect the order #3, #2, #1, #4.
Also, #1 and #2 should have the same score.
If this is possible, what's the best way to do it?
This is a bit abstract for me. By proximity matching, do you mean phrase
matching? I don't believe there is a general proximity query that also
accepts out-of-order terms.
By limiting term frequency, what is your goal? Is it about weighting? Maybe
implementing a custom similarity via a plugin is the way to go about it.
Thanks for the reply, Nik!
Let me put it more clearly:
For the search term “sue alligator”, we prefer results where “sue” and “alligator” are close together, so #3 is better than #4. But we attribute no extra value to the word “sue” appearing multiple times, hence #1 is not better than #2.
So we would like proximity of the search words to result in higher scores, but we do not want term frequency to affect the score at all. Is this possible, or is there some way to work around it?
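To make the desired behaviour concrete, here is a minimal sketch in plain Python (not Elasticsearch code). It scores a document only by the smallest in-order gap between the two terms and ignores how often each term occurs. The function names and the `1 / gap` formula are assumptions made up for illustration.

```python
import re

def tokenize(text):
    # Lowercase and split on non-alphanumerics, so "alligator-skin"
    # yields the two tokens "alligator" and "skin".
    return re.findall(r"[a-z0-9]+", text.lower())

def ordered_gap(tokens, first, second):
    """Smallest distance from an occurrence of `first` to a later `second`.

    Returns None if `second` never appears after `first`.
    """
    best = None
    last_first = None
    for pos, tok in enumerate(tokens):
        if tok == first:
            last_first = pos
        elif tok == second and last_first is not None:
            gap = pos - last_first
            if best is None or gap < best:
                best = gap
    return best

def proximity_score(text, first, second):
    # Hypothetical scoring rule: closer terms score higher, and repeated
    # occurrences of a term add nothing (no term-frequency component).
    gap = ordered_gap(tokenize(text), first, second)
    return 0.0 if gap is None else 1.0 / gap

docs = {
    1: "Sue ate the alligator Sue leave. Then Sue work.",
    2: "Sue look the alligator.",
    3: "Sue alligator",
    4: "Sue never goes anywhere without her alligator-skin purse",
}
scores = {n: proximity_score(t, "sue", "alligator") for n, t in docs.items()}
```

Under this toy rule #3 scores highest, #1 and #2 tie, and #4 comes last, which is the ranking we are after.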
Also, could I use phrase matching within a has_child query?
What if I want to disable term frequency for the default similarity (TF/IDF)? What's the easiest way to do it other than a plugin?
(I found this, which says:
"Setting index_options to docs will disable term frequencies and term positions. A field with this mapping will not count how many times a term appears, and will not be usable for phrase or proximity queries." https://www.elastic.co/guide/en/elasticsearch/guide/master/scoring-theory.html
But this loses proximity queries, which is not what we want.)
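The trade-off can be laid out as a small table. The sketch below lists what each `index_options` value keeps in the index and checks whether any value keeps positions while dropping frequencies. The mapping fragment assumes a recent Elasticsearch version with a `text` field, and the field name `body` is hypothetical.

```python
# What each index_options value records for a field.
INDEX_OPTIONS = {
    "docs":      {"freqs": False, "positions": False, "offsets": False},
    "freqs":     {"freqs": True,  "positions": False, "offsets": False},
    "positions": {"freqs": True,  "positions": True,  "offsets": False},
    "offsets":   {"freqs": True,  "positions": True,  "offsets": True},
}

def field_mapping(field, index_options):
    # Assumed mapping shape for a recent version; "body" below is a
    # made-up field name used only for illustration.
    if index_options not in INDEX_OPTIONS:
        raise ValueError(f"unknown index_options: {index_options}")
    return {"properties": {field: {"type": "text",
                                   "index_options": index_options}}}

# Options that would keep phrase/proximity queries but drop term frequency.
positions_without_freqs = [
    name for name, kept in INDEX_OPTIONS.items()
    if kept["positions"] and not kept["freqs"]
]

no_tf_mapping = field_mapping("body", "docs")
```

`positions_without_freqs` comes out empty: every option that stores positions also stores frequencies, so `index_options` alone cannot give us what we want.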
Looks like that's a good way then. You can use a phrase query for "sue
alligator" and it'll work for in-order phrases. You can set the phrase slop
pretty high and you'll get some proximity scoring. It doesn't handle
out-of-order results, iirc, but it's something.
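A rough sketch of what such a request body could look like, built as a Python dict. The field name `body` and the slop value of 10 are placeholders, not recommendations; adjust both to your mapping and data.

```python
import json

def phrase_query(field, text, slop):
    # match_phrase with slop: terms may be up to `slop` position moves
    # away from the exact phrase and still match.
    return {"query": {"match_phrase": {field: {"query": text, "slop": slop}}}}

# "body" is a hypothetical field name; 10 is an arbitrary slop value.
request_body = phrase_query("body", "sue alligator", 10)
request_json = json.dumps(request_body, indent=2)
```

`request_json` can then be sent to the index's `_search` endpoint with whatever HTTP client you already use.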