Custom similarity without TF/IDF scoring

Alex_D · August 5, 2020, 8:24pm

Hi all!

I don't need TF/IDF relevance scoring for my full-text search task.
I am trying to get a score in the range 0-100 based only on the doc.length variable.

My formula:

"similarity": {
  "custom_similarity": {
    "type": "scripted",
    "script": {
      "source": "double norm = 100.0 / doc.length; return norm * query.boost;"
    }
  }
}

I get the expected results when the query has no more tokens than the document.

Suppose my query is "big bang theory".
Results:

"big bang theory",             score: 100%
"big bang theory stub1",       score: 75%
"big bang theory stub1 stub2", score: 60%

But with the same query, shorter documents score:

"big",       score: 100%
"big bang",  score: 100%

That is not what I want: I expect 33% for "big" and 66% for "big bang".
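To see why the short documents reach 100%, here is a minimal Python sketch of what the formula computes for the query above. It assumes whitespace tokenization, a match query whose per-term scores are summed (the default `or` behaviour), a boost of 1, and an exact `doc.length` (Elasticsearch decodes it from the stored norms, so it can be approximate for long fields). `current_score` is a hypothetical helper, not an Elasticsearch API.

```python
def current_score(query: str, doc: str, boost: float = 1.0) -> float:
    """Simulate the posted formula, summed over the query terms found in doc.

    Assumptions: whitespace tokenization, one term clause per query token
    whose scores are added together, and an exact doc.length.
    """
    doc_terms = doc.split()
    doc_length = len(doc_terms)
    score = 0.0
    for term in query.split():
        if term in doc_terms:
            # Per-term contribution: 100 / doc.length * query.boost
            # (floating-point division).
            score += 100.0 / doc_length * boost
    return score


for doc in ["big bang theory", "big bang theory stub1",
            "big bang theory stub1 stub2", "big", "big bang"]:
    print(f"{doc!r:32} {current_score('big bang theory', doc):6.1f}")
```

It prints 100.0, 75.0, 60.0, 100.0 and 100.0. Each matching term adds 100 / doc.length, so the total is 100 × matches / doc.length, and a document whose tokens all match the query always reaches 100 however many query tokens it is missing.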

Summary:

---------------------------------------------------
| Token count                | Score    | Score   |
| Query | Document | Matches | expected | current |
---------------------------------------------------
|   3   |    3     |    3    |  100 %   |  100 %  |
|   3   |    6     |    3    |   50 %   |   50 %  |
|   3   |    9     |    3    |   33 %   |   33 %  |
|   3   |    1     |    1    |   33 %   |  100 %  | <-- current formula does not give the expected result
|   3   |    2     |    2    |   66 %   |  100 %  | <-- same problem
|   3   |    2     |    1    |   33 %   |  100 %  | <-- same problem
---------------------------------------------------
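Reading the table, the expected column appears to be 100 × matches / max(query tokens, document tokens), truncated to a whole percent, whereas the current formula divides by the document length only. A small Python check of that reading against the six rows (`expected_score` is a hypothetical helper):

```python
def expected_score(query_tokens: int, doc_tokens: int, matches: int) -> float:
    """Matches relative to whichever is longer: the query or the document."""
    return 100.0 * matches / max(query_tokens, doc_tokens)


# (query tokens, document tokens, matches) rows from the table above
rows = [(3, 3, 3), (3, 6, 3), (3, 9, 3), (3, 1, 1), (3, 2, 2), (3, 2, 1)]
for q, d, m in rows:
    # int() truncates, which reproduces the table's 33 % and 66 %
    print(q, d, m, int(expected_score(q, d, m)))
```

Every row reproduces the expected column: 100, 50, 33, 33, 66, 33.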

I would be glad for any advice on how to update my formula, or for a workaround.
Thanks!
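One untested idea for such a workaround: a similarity script scores one term at a time and does not see how many tokens the query has, but it does see `query.boost`. If the client sets the match query's `boost` to the number of query tokens, the script can divide by `Math.max(query.boost, doc.length)`, and the summed per-term scores become 100 × matches / max(query tokens, document tokens). This rests on assumptions: that `query.boost` inside the script equals the boost set on the match query, that the client counts tokens the same way the field's analyzer does, and that the boost is not also needed as an ordinary weight. The field name `title` and the helper functions are hypothetical. The Python below builds the settings and query bodies and simulates the resulting scores:

```python
import json

# Hypothetical similarity settings: divide by the larger of the query token
# count (passed in through query.boost) and doc.length. Untested in Elasticsearch.
SIMILARITY_SETTINGS = {
    "similarity": {
        "custom_similarity": {
            "type": "scripted",
            "script": {
                "source": "return 100.0 / Math.max(query.boost, doc.length);"
            },
        }
    }
}


def build_match_query(field: str, text: str) -> dict:
    # Assumption: whitespace splitting gives the same token count as the analyzer.
    return {"query": {"match": {field: {"query": text, "boost": len(text.split())}}}}


def simulate_score(query: str, doc: str) -> float:
    # Each matching query term adds 100 / max(query token count, doc.length).
    query_terms = query.split()
    doc_terms = doc.split()
    divisor = max(len(query_terms), len(doc_terms))
    return sum(100.0 / divisor for term in query_terms if term in doc_terms)


print(json.dumps(SIMILARITY_SETTINGS, indent=2))
print(json.dumps(build_match_query("title", "big bang theory")))
for doc in ["big bang theory", "big bang theory stub1 stub2 stub3", "big", "big bang"]:
    print(f"{doc!r:36} {simulate_score('big bang theory', doc):6.1f}")
```

The simulated scores come out as 100.0, 50.0, 33.3 and 66.7, matching the expected column. The trade-off is that the boost is repurposed, so it can no longer weight this clause against other clauses in a larger bool query.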

system · September 2, 2020, 8:24pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
Custom TF-IDF implementation	Elasticsearch	1	327	March 30, 2023
Tf-idf custom similarity and bm25 gives same scores and identical results along with a minor problem	Elasticsearch	3	460	October 23, 2022
Custom relevance scoring by term frequency averages	Elasticsearch	2	1213	July 6, 2017
Question regarding TF/IDF implementation	Elasticsearch	2	753	April 19, 2021
Custom scoring based on number of matches	Elasticsearch	1	525	August 6, 2021