Score based on Term Frequency alone

hey-arnold · April 23, 2017, 12:51pm

I want to disable IDF, and maybe TF, so that the score is based only on how many terms from the given query are present. I've looked into solutions that involve writing custom scripts or restructuring the query, but these all require splitting the query itself into individual terms. The problem with this approach is that if you have a wrapper service which takes a query string containing multiple tokens, you need to split that string into tokens before feeding them into the ES search request. The only reliable way to do this is to first make a request to the analyze endpoint, but that extra round trip slows things down.
(Examples: How to complete disable TF-IDF?, https://www.elastic.co/guide/en/elasticsearch/guide/current/ignoring-tfidf.html)
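For anyone reading along, here is a rough sketch of the "split the query into terms" workaround described above. It wraps each term in a constant_score clause inside a bool/should, so each matching term should contribute a fixed score instead of a TF-IDF one. The function name presence_query and the field name "title" are made up for illustration, and the whitespace/lowercase split is only a stand-in for the field's real analyzer, which is exactly the weak point mentioned above.

```python
import json


def presence_query(field, query_text):
    """Build a search body that scores by how many query terms match.

    Assumption: lowercasing and splitting on whitespace approximates the
    field's analyzer. A real analyzer may tokenize differently.
    """
    # Drop duplicate terms while keeping their original order.
    unique_terms = list(dict.fromkeys(query_text.lower().split()))

    # One constant_score clause per term: a match adds a fixed score,
    # regardless of term frequency or inverse document frequency.
    should = [
        {"constant_score": {"filter": {"match": {field: term}}}}
        for term in unique_terms
    ]
    return {"query": {"bool": {"should": should}}}


# "title" is a hypothetical field name.
body = presence_query("title", "Quick brown quick fox")
print(json.dumps(body, indent=2))
```

The resulting dictionary would be sent as the body of a search request. Duplicate terms are dropped so that repeating a word in the query does not count twice.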

I think ES is awesome, but it would be cool if there were more similarity modules covering simple use cases, like when you want a simple count of term presence in a document. Why not have a whole bunch of similarity modules: summing one-hot vectors, cosine, etc.?

Any advice on how I can achieve my goal? I am using Elasticsearch 5.2.2.

Thanks!

EDIT: I got this working by writing a plugin. There are examples online, but they are outdated. I will formalise my solution, post it to Git, and update this answer in due time.

hey-arnold · April 25, 2017, 11:04pm

OK, if anyone wants to disable both IDF and TF, so that the score is based only on the presence of a term and the boost value on the field, in Elasticsearch v5+, then see the following plugin:

If you don't want to disable TF but don't know how to write a plugin, the code in the repo above should help; adding TF back in should be simple.

Also note that someone has implemented this in the latest Elasticsearch code, see:

You will just need to set "similarity": "boolean" on the field properties in your mapping. This is available in Elasticsearch 5.4.0+.
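As a rough illustration of where that setting goes, the sketch below builds an index-creation body with "similarity": "boolean" on each text field. The helper name boolean_similarity_mapping, the type name "doc", and the field names are all hypothetical. The layout assumes the 5.x mapping structure with a type level under "mappings", so adjust it for your version.

```python
import json


def boolean_similarity_mapping(doc_type, fields):
    """Build an index-creation body using the boolean similarity.

    Assumption: 5.x-style mappings, where properties sit under a
    named document type. All names passed in are placeholders.
    """
    properties = {
        field: {"type": "text", "similarity": "boolean"}
        for field in fields
    }
    return {"mappings": {doc_type: {"properties": properties}}}


# "doc", "title" and "body" are hypothetical names.
mapping = boolean_similarity_mapping("doc", ["title", "body"])
print(json.dumps(mapping, indent=2))
```

The printed JSON would be used as the request body when creating the index. Since similarity is set per field, existing data would most likely need to be reindexed into a new index with this mapping.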

system · May 23, 2017, 11:05pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
