Scripted Similarity - how to get average field length


I am trying to reconstruct the Okapi BM25 default similarity in Elasticsearch.

In my setting we do not need the inverse document frequency (IDF) for score calculation. I haven't found a flag that turns off IDF, so I turned to scripting a custom similarity.

My scripted similarity currently looks like this:
{
  "similarity": {
    "custom_similarity": {
      "type": "scripted",
      "script": {
        "source": "double tf_n = (doc.freq * (1.2 + 1)) / (doc.freq + 1.2 * (1 - 0.7 + 0.7 * (doc.length / average field length))); return tf_n * query.boost;"
      }
    }
  }
}

How can I insert the average field length into that formula?

Thanks for any help.


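You can compute the average field length inside the script from the field statistics that scripted similarities expose:

double avgLen = (double) field.sumTotalTermFreq / field.docCount;

field.sumTotalTermFreq is the total number of tokens in the field across all documents, and field.docCount is the number of documents that have the field. The cast to double avoids integer division.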
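Putting that into the similarity from the question might look like the sketch below. It is only an illustration: the field name my_field and the number_of_shards setting are assumptions of mine and not part of the original question, while the constants 1.2 and 0.7 are copied from the question unchanged.

```json
{
  "settings": {
    "number_of_shards": 1,
    "similarity": {
      "custom_similarity": {
        "type": "scripted",
        "script": {
          "source": "double avgLen = (double) field.sumTotalTermFreq / field.docCount; double tf_n = (doc.freq * (1.2 + 1)) / (doc.freq + 1.2 * (1 - 0.7 + 0.7 * (doc.length / avgLen))); return tf_n * query.boost;"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_field": {
        "type": "text",
        "similarity": "custom_similarity"
      }
    }
  }
}
```

This body would be sent when creating the index. As far as I know, the field statistics are collected per shard by default, so with more than one shard the computed average can differ slightly between shards.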
