Custom TF-IDF implementation

I'm trying to implement a custom TF-IDF-like algorithm with scripted similarity. My current approach:

Term frequency is determined in a custom way: the values are precalculated and stored in each record as a list. For example:

{
"my_text": "the apple falls",
"my_text_counts": [10, 5, 7]
}

In this document, the array of numbers holds the term frequency for each word in the text field, in the same order as the words.

I now want the similarity score to be the sum of the inverses of these values, counting only the words that also appear in the query.

E.g. the query "the apple" would yield
(1 * 1/10 + 1 * 1/5 + 0 * 1/7)
and "the falls" would yield
(1 * 1/10 + 0 * 1/5 + 1 * 1/7)
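
To make the intended score concrete, here is a minimal reference sketch in plain Python. It is not a scripted similarity, only the calculation I would like to reproduce. It assumes whitespace tokenization and that my_text_counts[i] belongs to the i-th word of my_text; the function name inverse_count_score is made up for illustration.

```python
def inverse_count_score(doc, query):
    """Sum 1/count over the document words that also occur in the query."""
    # Assumption: whitespace tokenization, counts aligned with words by position.
    terms = doc["my_text"].split()
    counts = doc["my_text_counts"]
    query_terms = set(query.split())
    # Words missing from the query contribute 0, so they are simply skipped.
    return sum(1.0 / count for term, count in zip(terms, counts) if term in query_terms)


doc = {"my_text": "the apple falls", "my_text_counts": [10, 5, 7]}

print(inverse_count_score(doc, "the apple"))  # 1/10 + 1/5
print(inverse_count_score(doc, "the falls"))  # 1/10 + 1/7
```

The open question is whether the same per-word lookup can be expressed inside the search engine, since the counts live in a separate field from the text.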

After a lot of searching through the documentation, I'm starting to think this is impossible with the current scripted similarity context.

Any tips or advice would be welcome.
