Scoring per term match


(Jpotisch) #1

For our application we search multiple variations of a field (phonetic, prefix, ngrams, etc.) in a should clause. We then use function_score functions to boost scoring based on business-specific fields so that more recent, more popular, etc. records rise to the top. We also allow fuzzy matches and a limited number of term misses (e.g. minimum_should_match = 75%).
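To make the setup concrete, here is a rough sketch of the kind of request body I mean, built as a Python dict. The field names (name.phonetic, name.prefix, name.ngram, popularity, published_at) and the particular boost functions are placeholders for illustration, not our real mapping.

```python
import json


def build_query(text):
    """Build a function_score query over several analyzed variants of one field."""
    # Hypothetical sub-field names; the real mapping is not shown in this post.
    variants = ["name", "name.phonetic", "name.prefix", "name.ngram"]
    should = [
        {
            "match": {
                field: {
                    "query": text,
                    "fuzziness": "AUTO",             # allow fuzzy matches
                    "minimum_should_match": "75%",   # tolerate some term misses
                }
            }
        }
        for field in variants
    ]
    return {
        "query": {
            "function_score": {
                "query": {"bool": {"should": should}},
                # Assumed business boosts: popularity and recency.
                "functions": [
                    {"field_value_factor": {"field": "popularity", "modifier": "log1p", "missing": 0}},
                    {"gauss": {"published_at": {"origin": "now", "scale": "365d"}}},
                ],
                "score_mode": "sum",
                "boost_mode": "multiply",
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_query("shakes haml"), indent=2))
```

With boost_mode set to multiply, whatever the should clauses score gets scaled by the business functions, which is why the starting text score matters so much to us.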

Our app tries to determine the best match across types by comparing scores. For example, let's say we search across publishers, authors, books, and magazines. A user might type in the name of any of the above, or search across fields, e.g. "william shakespeare" should find the author but "shakespeare hamlet" should find the book. They might also type "shakes haml" and we should bring back Hamlet, or "willem shakesper" and we should return the author.

Because of the other factors we use to determine ranking, I want a much simpler starting score than the full TF-IDF approach, which prefers documents where all query terms match all field terms and weighs rare terms higher.

Is there a way to make scoring use a simple linear method such that each term match gets a set amount, regardless of TF-IDF, terms that don't match, etc.? In other words, I want a search for "william shakespeare hamlet" against the book { title: "Hamlet", author: "William Shakespeare" } to get 3 points, "shakespeare hamlet" to get 2 points, "hamlet" to get 1 point, etc. Those same queries should return exactly the same scores against the book { title: "Hamlet Is A Very Good Book", author: "William Shakespeare Was A Very Good Author" }.
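To pin down the behavior I'm after, here is a small plain-Python sketch of the scoring rule. It is not Elasticsearch code, just the arithmetic I'd like the base score to follow. The function name linear_score is made up, and it only handles exact whole-word matches (no fuzzy, prefix, or phonetic matching).

```python
def linear_score(query, doc):
    """Return one point per distinct query term found in any field of doc."""
    # Collect every lowercased word from every field value.
    doc_terms = set()
    for value in doc.values():
        doc_terms.update(value.lower().split())
    # Each matching query term counts once; field length and extra
    # non-matching field terms have no effect on the result.
    query_terms = set(query.lower().split())
    return len(query_terms & doc_terms)


short_book = {"title": "Hamlet", "author": "William Shakespeare"}
long_book = {
    "title": "Hamlet Is A Very Good Book",
    "author": "William Shakespeare Was A Very Good Author",
}

for q in ("william shakespeare hamlet", "shakespeare hamlet", "hamlet"):
    # Prints 3 3, then 2 2, then 1 1: both books score the same.
    print(linear_score(q, short_book), linear_score(q, long_book))
```

The point is that the two books are indistinguishable at this stage, so the function_score boosts alone decide which one ranks higher.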

Any guidance, not just a full solution, would be greatly appreciated!

Thanks,

-joel

