Is there any way to limit Term Frequency but keep function of proximity matching?


(JUN LI) #1

Is there any way to limit term frequency but keep proximity matching working?

For example:

#1 Sue ate the alligator Sue leave. Then Sue work.
#2 Sue look the alligator.
#3 Sue alligator
#4 Sue never goes anywhere without her alligator-skin purse

If we search for "sue alligator",

we expect the order: #3 #2 #1 #4

Also, #1 and #2 should have the same score.

If this is possible, then what's the best way to do this?

Thanks in advance

Jun


(Nik Everett) #2

This is a bit abstract for me - by proximity matching do you mean phrase
matching? I don't believe there is a general, out of order ok, proximity
query.

By limit term frequency, what is your goal? Is it about weighting? Maybe
implementing a similarity via a plugin is the way to go about it.


(JUN LI) #3

Thanks for the reply, Nik!
Let me put it more clearly:

  1. For the search term “sue alligator” we prefer results where “sue” and “alligator” are close together, so #3 is better than #4. But we attribute no extra value to the word “sue” appearing multiple times, hence #1 is not better than #2.

So we would like proximity of the search words to result in higher scores, but we do not want term frequency to affect the score at all. Is this possible, or is there some way to work around it?
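To make the requirement concrete, here is a toy scorer in plain Python. It is only an illustration of the ranking we are after, not how Elasticsearch or Lucene scores anything: the function names, the tokenizer, and the `1 / gap` formula are all made up for this sketch. It looks only at the smallest in-order gap between the two words and ignores how often either word occurs.

```python
import re

def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit,
    # so "alligator-skin" becomes ["alligator", "skin"].
    return re.findall(r"[a-z0-9]+", text.lower())

def proximity_score(doc, first, second):
    # Hypothetical scoring rule: 1 / (smallest in-order gap between
    # `first` and `second`). Term frequency plays no role.
    best_gap = None
    last_first = None
    for pos, tok in enumerate(tokenize(doc)):
        if tok == first:
            last_first = pos
        elif tok == second and last_first is not None:
            gap = pos - last_first
            if best_gap is None or gap < best_gap:
                best_gap = gap
    return 0.0 if best_gap is None else 1.0 / best_gap

docs = {
    1: "Sue ate the alligator Sue leave. Then Sue work.",
    2: "Sue look the alligator.",
    3: "Sue alligator",
    4: "Sue never goes anywhere without her alligator-skin purse",
}

scores = {doc_id: proximity_score(text, "sue", "alligator")
          for doc_id, text in docs.items()}
ranking = sorted(scores, key=lambda doc_id: -scores[doc_id])
```

With this rule #3 comes first, #4 comes last, and #1 and #2 tie, because the extra occurrences of "sue" in #1 are never counted.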

Also, could I use phrase matching within a has_child query?

  2. What if I want to disable term frequency for the default similarity (TF/IDF)? What's the easiest way to do it other than a plugin?
    (I found this in the guide:
    Setting index_options to docs will disable term frequencies and term positions. A field with this mapping will not count how many times a term appears, and will not be usable for phrase or proximity queries
    https://www.elastic.co/guide/en/elasticsearch/guide/master/scoring-theory.html
    But this also loses proximity queries, which is not what we want.)
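For reference, a small sketch of the trade-off described in that quote, written as Python dicts. The field name `body` and the helper names are hypothetical, and the mapping shape assumes a recent Elasticsearch version with `text` fields and typeless mappings. The point it tries to show is that each `index_options` level includes the ones before it, so there is no level that keeps positions while dropping frequencies.

```python
# Levels of index_options, from least to most information stored.
# Each level includes everything stored by the levels before it.
INDEX_OPTIONS = ("docs", "freqs", "positions", "offsets")

def body_mapping(index_options):
    # Build an index body for a hypothetical text field named "body".
    if index_options not in INDEX_OPTIONS:
        raise ValueError("unknown index_options: %r" % index_options)
    return {
        "mappings": {
            "properties": {
                "body": {"type": "text", "index_options": index_options}
            }
        }
    }

def stores_term_frequencies(index_options):
    # Everything above "docs" records how often a term appears.
    return INDEX_OPTIONS.index(index_options) >= INDEX_OPTIONS.index("freqs")

def supports_phrase_queries(index_options):
    # Phrase and proximity queries need term positions.
    return INDEX_OPTIONS.index(index_options) >= INDEX_OPTIONS.index("positions")

# Levels that would keep phrase queries but drop term frequencies.
positions_without_freqs = [
    opt for opt in INDEX_OPTIONS
    if supports_phrase_queries(opt) and not stores_term_frequencies(opt)
]
```

`positions_without_freqs` comes out empty, which is why the mapping alone cannot give us what we want here.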

(Nik Everett) #4

Looks like that's a good way then. You can use a phrase query for "sue
alligator" and it'll work for in-order phrases. You can set the phrase slop
pretty high and you'll get some proximity scoring. It doesn't handle out
of order results iirc but it's something.
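A rough sketch of what such a query body could look like, built as a Python dict. The field name `body`, the function name `build_query`, and the slop value of 50 are assumptions for illustration. The `constant_score` clause is an addition on top of the phrase-with-slop idea: it makes both words required while giving every matching document the same base score, so term frequency does not enter through that clause. The `match_phrase` clause with `slop` then adds the proximity part.

```python
def build_query(field, text, slop):
    # Hypothetical helper: returns a search body, sends nothing.
    return {
        "query": {
            "bool": {
                # Require all terms, but score every match identically,
                # so repeated terms do not raise the base score.
                "must": [{
                    "constant_score": {
                        "filter": {
                            "match": {
                                field: {"query": text, "operator": "and"}
                            }
                        }
                    }
                }],
                # Optional sloppy phrase clause: closer terms score higher.
                "should": [{
                    "match_phrase": {
                        field: {"query": text, "slop": slop}
                    }
                }],
            }
        }
    }

query = build_query("body", "sue alligator", 50)
```

One caveat: the phrase clause is still scored by the field's similarity, so repeated phrase matches and field length can still nudge the score. I would not expect this alone to guarantee identical scores for #1 and #2.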


(Doug Turnbull) #5

I ran into this problem when implementing title search. You might find this
blog post useful

