I have indexed a few hundred million documents into Elasticsearch using the default BM25 parameters b and k1. Reindexing the data is prohibitive for me; however, I would like to tune b and k1 for better scoring.
To my understanding, Painless scripting provides some functions that can compute or fetch the tf and idf values of a token in a document.
Could you please reproduce BM25 in Painless scripting so that I can tune the b and k1 parameters?
In Elasticsearch, the module that implements textual scoring is called similarity. There is no need to write a Painless script, since b and k1 can be customized in the similarity settings. However, if you are intent on using Painless, you can write a scripted similarity.
Unfortunately, this will not work.
As I mentioned, I do not want to reindex.
I have hundreds of millions of documents.
I just need to optimize the values of b and k1.
I just want to replicate BM25 with Painless scripting so that I can change the values of b and k1 as needed.
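
For reference, the per-term computation being asked about can be sketched outside Elasticsearch. The Python below is only an illustration, not Painless and not Lucene's actual code: the function name and all argument names are hypothetical, the idf form is assumed to be the Lucene-style ln(1 + (N - n + 0.5) / (n + 0.5)), and the constant (k1 + 1) numerator used in some textbook variants is assumed to be omitted. It shows where b and k1 enter the score, which is the part a script would need to expose for tuning.

```python
import math


def bm25_term_score(freq, doc_len, avg_doc_len, doc_count, doc_freq,
                    k1=1.2, b=0.75, boost=1.0):
    """Score one query term against one document (illustrative sketch).

    freq        -- occurrences of the term in the document's field
    doc_len     -- number of tokens in the document's field
    avg_doc_len -- average field length across the index
    doc_count   -- number of documents that have the field
    doc_freq    -- number of documents containing the term
    k1, b       -- the two BM25 tuning parameters (defaults 1.2 and 0.75)
    """
    # Assumed Lucene-style idf: rarer terms get a larger weight.
    idf = math.log(1.0 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: b controls how strongly long documents are penalized.
    length_norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    # Term-frequency saturation: k1 controls how quickly repeated terms stop helping.
    return boost * idf * freq / (freq + length_norm)


if __name__ == "__main__":
    # Compare a few (k1, b) settings for the same hypothetical statistics.
    for k1, b in [(1.2, 0.75), (1.2, 0.0), (2.0, 0.75)]:
        score = bm25_term_score(freq=3, doc_len=200, avg_doc_len=100,
                                doc_count=1000, doc_freq=10, k1=k1, b=b)
        print(f"k1={k1} b={b} score={score:.4f}")
```

With b set to 0 the document length no longer affects the score, and a larger k1 lets additional occurrences of a term keep raising the score for longer before it saturates.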