I want to disable IDF, and possibly TF, so that the score is simply the number of query terms present in the document. I've looked into solutions that use custom scripts or restructure the query, but they all involve splitting the query into individual terms. The problem is that a wrapper service which receives a multi-token query as a single string has to split it into tokens before building the ES search request. The only reliable way to do that is to call the analyze endpoint first, but that extra round trip slows things down.
(Examples: How to complete disable TF-IDF?, https://www.elastic.co/guide/en/elasticsearch/guide/current/ignoring-tfidf.html)
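For reference, the term-splitting workaround from the guide linked above wraps each term in a `constant_score` clause inside a `bool` `should`, so every matching term contributes the same fixed score instead of a TF-IDF weight. Below is a minimal Python sketch of building that request body, assuming Elasticsearch 5.x syntax (where `constant_score` takes a `filter`). The function name `constant_score_query`, the field name `description`, and the lowercase/whitespace tokenization are illustrative assumptions. The tokenization is exactly the step that can disagree with the field's real analyzer, which is the problem described above.

```python
def constant_score_query(field, query_text):
    """Hypothetical helper: one constant_score clause per distinct query term.

    Assumption: lowercasing and splitting on whitespace approximates the
    field's analyzer. A real analyzer may stem, drop stop words, or split
    tokens differently, which is why the _analyze endpoint is the reliable route.
    """
    # Naive tokenization; dict.fromkeys removes duplicates but keeps order.
    terms = list(dict.fromkeys(query_text.lower().split()))
    # Each clause matches via the field's own analyzer (match), but its score
    # is replaced by a constant, so TF and IDF do not affect it.
    should = [
        {"constant_score": {"filter": {"match": {field: term}}}}
        for term in terms
    ]
    return {"query": {"bool": {"should": should}}}
```

With default boosts, each matching clause adds the same constant, so a document's score roughly tracks how many of the query's terms it contains.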
I think ES is awesome, but it would be useful to have more similarity modules covering simple use cases, such as a plain count of how many query terms are present in a document. Why not ship a broader set of similarity modules: summing one-hot vectors, cosine similarity, etc.?
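To make the desired scoring concrete, here is a plain-Python sketch with no Elasticsearch involved. It shows a term-presence count, which equals the dot product of binary (one-hot summed) term vectors, and a cosine over the same binary vectors. The helper names and the lowercase/whitespace tokenizer are hypothetical stand-ins for a real analyzer.

```python
import math


def tokens(text):
    # Assumption: lowercase + whitespace split approximates the field analyzer.
    return set(text.lower().split())


def presence_count(query, doc):
    # Number of distinct query terms present in doc; TF and IDF play no role.
    return len(tokens(query) & tokens(doc))


def binary_cosine(query, doc):
    # Cosine between binary term-presence vectors: |Q & D| / sqrt(|Q| * |D|).
    q, d = tokens(query), tokens(doc)
    if not q or not d:
        return 0.0
    return len(q & d) / math.sqrt(len(q) * len(d))
```

For example, `presence_count("quick brown fox", "The quick fox jumps over the quick dog")` counts "quick" and "fox" once each, however often they repeat.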
Any advice on how I can achieve this? I am using Elasticsearch 5.2.2.
EDIT: I got this working by writing a plugin. There are examples online, but they are outdated; I will formalise my solution, post it to Git, and update this post in due time.