Using painless scripting to re-implement BM25 scoring

dpappas · August 19, 2019, 7:06pm

Hello everyone

I have indexed a few hundred million documents into Elasticsearch using the default BM25 parameters b and k1. Reindexing the data is prohibitively expensive for me, but I would like to tune b and k1 for better scoring.
To my understanding, Painless scripting has some functions that can compute or fetch the tf and idf statistics of a token in a document.
Could someone show how to reproduce BM25 in Painless so that I can tune the b and k1 parameters?

Thank you all in advance
Dimitris

rjernst · August 19, 2019, 10:10pm

In Elasticsearch, the module that implements textual scoring is called similarity. There isn't a need to write a Painless script, since b and k1 can be customized. However, if you are intent on using Painless, you can write a similarity script.
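For reference, here is a minimal sketch of where b and k1 enter a per-term BM25 score, written in plain Python rather than Painless. It uses the textbook form of the formula; Lucene's actual implementation may differ in constant factors and in how it stores document length, so treat the absolute values as illustrative only. The function names are hypothetical, and the defaults k1=1.2 and b=0.75 are the Elasticsearch defaults.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """Inverse document frequency: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1.0 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(
    freq: float,
    doc_len: float,
    avg_doc_len: float,
    doc_count: int,
    doc_freq: int,
    k1: float = 1.2,
    b: float = 0.75,
) -> float:
    """Textbook BM25 contribution of one query term to one document.

    b controls how strongly document length is normalized (0 = not at all),
    k1 controls how quickly repeated occurrences of the term saturate.
    """
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    tf_part = freq * (k1 + 1.0) / (freq + k1 * length_norm)
    return bm25_idf(doc_count, doc_freq) * tf_part
```

With b=0 the document length drops out of the score entirely, and with larger b long documents are penalized more, which is the behavior being tuned when these two parameters are changed.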

dpappas · August 19, 2019, 10:28pm

Unfortunately this will not work.
As I mentioned, I do not want to reindex.
I have hundreds of millions of documents.
I just need to optimize the values of b and k1.
I just want to replicate BM25 with Painless scripting so that I can change the values of b and k1 as needed.

Thank you again

rjernst · August 19, 2019, 11:27pm

The similarity can be changed on an existing index; no reindexing is necessary. What makes you think reindexing would be needed?

dpappas · August 20, 2019, 9:44am

I thought that "PUT /index ..." creates an index.
Don't I need to re-index the data once I change the mapping/settings?

rjernst · August 21, 2019, 12:16am

While some settings cannot be changed (e.g. number_of_shards), many can be, including the configured similarity. Use the update settings API.

rjernst · August 21, 2019, 12:17am

dpappas wrote: "Don't I need to re-index the data once I change the mapping/settings?"

Since the similarity settings are not baked into the index, changing these parameters does not require reindexing.
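A rough sketch of the request sequence this implies, assuming the fields use the index's default similarity. The Elasticsearch similarity documentation describes closing the index before changing the default similarity and reopening it afterwards, so the sketch includes both steps. The index name `my_index`, the helper name, and the values 0.9 and 0.4 are placeholders for illustration, not recommendations. The code only builds the requests; sending them with a client of your choice is left out.

```python
import json


def similarity_update_requests(index: str, k1: float, b: float):
    """Build (method, path, body) tuples for changing the default BM25 parameters.

    The index is closed first, the settings are updated, then it is reopened.
    Nothing is sent over the network here.
    """
    settings = {
        "index": {
            "similarity": {
                "default": {"type": "BM25", "k1": k1, "b": b}
            }
        }
    }
    return [
        ("POST", f"/{index}/_close", None),
        ("PUT", f"/{index}/_settings", json.dumps(settings)),
        ("POST", f"/{index}/_open", None),
    ]


# Hypothetical index name and parameter values.
for method, path, body in similarity_update_requests("my_index", k1=0.9, b=0.4):
    print(method, path, body or "")
```

The index is unavailable for searches while it is closed, so a tuning loop over many (k1, b) pairs is probably best run against a test index or during a quiet period.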

system · September 18, 2019, 12:17am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
