Using Painless scripting to re-implement BM25 scoring

Hello everyone

I have indexed a few hundred million documents into Elasticsearch using the default values of b and k1. Reindexing the data is prohibitively expensive for me; however, I would like to optimize the BM25 parameters b and k1 for better scoring.
To my understanding, there are some functions in Painless scripting that can compute or fetch the tf and idf scores of a token in a document.
Could you please show how to reproduce BM25 in Painless scripting so that I can tune the b and k1 parameters?

Thank you all in advance
Dimitris

In Elasticsearch, the module that implements textual scoring is called similarity. There is no need to write a Painless script, since b and k1 can be customized on the built-in BM25 similarity. However, if you are intent on using Painless, you can write a scripted similarity.
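
For orientation, here is a minimal Python sketch of the textbook per-term BM25 formula, showing where b and k1 enter the score. The function name and argument names are made up for illustration and are not Elasticsearch or Lucene APIs. The defaults k1=1.2 and b=0.75 match Elasticsearch's defaults. Lucene stores document lengths in a lossy, quantized form, so real scores from Elasticsearch will differ somewhat from what this sketch computes.

```python
import math


def bm25_term_score(freq, doc_len, avg_doc_len, doc_count, doc_freq,
                    k1=1.2, b=0.75, boost=1.0):
    """Textbook BM25 score of one query term in one document (illustrative only)."""
    # Inverse document frequency: rarer terms get a larger weight.
    idf = math.log(1.0 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: b controls how strongly long documents are penalized
    # (b=0 disables length normalization, b=1 applies it fully).
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # Term-frequency saturation: k1 controls how quickly repeated
    # occurrences of the term stop adding to the score.
    tf = freq * (k1 + 1.0) / (freq + k1 * length_norm)
    return boost * idf * tf


# Same stored statistics, different parameters: only the score changes.
stats = dict(freq=3, doc_len=400, avg_doc_len=100, doc_count=1000, doc_freq=50)
print(bm25_term_score(**stats))                 # default k1 and b
print(bm25_term_score(**stats, k1=1.6, b=0.3))  # weaker length penalty
```

The inputs (term frequency, document length, document count, document frequency) are statistics the index already holds, while k1 and b are plain arguments, which is why they can be tuned without touching the indexed data.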

Unfortunately this will not work.
As I mentioned, I do not want to reindex.
I have hundreds of millions of documents.
I just need to optimize the values of b and k1.
I just want to replicate BM25 with Painless scripting so that I can change the values of b and k1 as needed.

Thank you again

The similarity can be changed on an existing index; no reindexing is necessary. What makes you think reindexing would be needed?

I thought that "PUT /index ..." creates an index.
Don't I need to reindex the data once I change the mapping/settings?

While some settings cannot be changed (e.g. number_of_shards), many can be, including the configured similarity. Use the update settings API.
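
As a rough sketch of what that could look like, the Python snippet below only builds the sequence of REST calls and does not send them. The helper name and the index name "my-index" are hypothetical. It assumes that similarity settings can only be updated while the index is closed, and that overriding the similarity named "default" is what you want. Fields mapped to a differently named similarity would need that name instead. Note that the index is unavailable for searching and indexing while it is closed.

```python
import json


def similarity_update_plan(index, k1, b, name="default"):
    """Return the (method, path, body) steps to retune BM25 on an existing index.

    Hypothetical helper: it only describes the requests, it does not send them.
    """
    settings = {
        "index": {
            "similarity": {
                # "default" applies to every field without an explicit similarity.
                name: {"type": "BM25", "k1": k1, "b": b}
            }
        }
    }
    return [
        ("POST", f"/{index}/_close", None),                   # close the index first
        ("PUT", f"/{index}/_settings", json.dumps(settings)),  # update settings API
        ("POST", f"/{index}/_open", None),                    # reopen for searching
    ]


for method, path, body in similarity_update_plan("my-index", k1=1.6, b=0.5):
    print(method, path, body or "")
```

The values k1=1.6 and b=0.5 are placeholders. The steps can be repeated with other values while tuning, since none of them rewrites the indexed documents.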

Don't I need to reindex the data once I change the mapping/settings?

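Since the similarity settings are not baked into the index, changing these parameters does not require reindexing. b and k1 are applied at query time to statistics, such as term frequencies and field lengths, that are already stored.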
