We used an older Elasticsearch version (1.4) for a while. We have recently upgraded, but we want to keep using the TF/IDF scoring algorithm.
The Similarity module page of the Elasticsearch Reference [7.11] includes an example of how to re-implement TF/IDF as a scripted similarity. The script it uses is:
"source": "double tf = Math.sqrt(doc.freq); double idf = Math.log((field.docCount+1.0)/(term.docFreq+1.0)) + 1.0; double norm = 1/Math.sqrt(doc.length); return query.boost * tf * idf * norm;"
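To make the arithmetic in that script easier to follow, here is a minimal Python sketch that mirrors it term by term. This is only an illustration outside Elasticsearch: the function name `tfidf_score` and its parameter names are hypothetical stand-ins for the script variables `doc.freq`, `doc.length`, `field.docCount`, `term.docFreq` and `query.boost`.

```python
import math

def tfidf_score(freq, doc_length, doc_count, doc_freq, boost=1.0):
    """Hypothetical Python mirror of the Painless TF/IDF script.

    freq       -> doc.freq        (occurrences of the term in this document's field)
    doc_length -> doc.length      (number of terms in this document's field)
    doc_count  -> field.docCount
    doc_freq   -> term.docFreq    (number of documents containing the term)
    boost      -> query.boost
    """
    tf = math.sqrt(freq)                                         # sqrt of term frequency
    idf = math.log((doc_count + 1.0) / (doc_freq + 1.0)) + 1.0  # smoothed idf
    norm = 1 / math.sqrt(doc_length)                             # length normalization
    return boost * tf * idf * norm

# Term appears 4 times in a 16-term field; 9 of 99 documents contain it.
score = tfidf_score(freq=4, doc_length=16, doc_count=99, doc_freq=9)
print(score)
```

With these inputs tf is 2, the norm is 0.25 and the idf is ln(10) + 1, so the score works out to (ln(10) + 1) / 2.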
Why does the script use field.docCount instead of simply the total number of indexed documents?
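As we understand the scripted similarity documentation, `field.docCount` is the number of documents that have a value for the current field, which can be smaller than the total document count when some documents lack that field. The sketch below uses made-up numbers (a hypothetical index of 1000 documents, only 100 of which have the field) to show how much the idf term changes depending on which count is plugged into the same formula.

```python
import math

def idf(doc_count, doc_freq):
    # Same idf expression as in the Painless script above
    return math.log((doc_count + 1.0) / (doc_freq + 1.0)) + 1.0

# Hypothetical numbers, chosen only for illustration
total_docs = 1000      # every document in the index
field_doc_count = 100  # documents that actually have the searched field
term_doc_freq = 10     # documents containing the query term

idf_field = idf(field_doc_count, term_doc_freq)  # ln(101/11) + 1
idf_total = idf(total_docs, term_doc_freq)       # ln(1001/11) + 1
print(f"field.docCount idf: {idf_field:.3f}, total-docs idf: {idf_total:.3f}")
```

In this made-up case, using the total document count makes the term look rarer than it is among the documents that can actually match on that field.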