Hi,
I was just reading the Lucene blog post on changes in Lucene 7.0. Since including it is on the roadmap for ES releases, I was wondering what the impact of moving from TF/IDF to BM25 scoring will be. For applications that use the current TF/IDF-based scoring, how will document rankings change? Specifically, what about regression tests that depend on the order of documents in the results?
In general you should see ranking improvements. BM25 is just a better way of doing TF*IDF. In some specialized use cases, though, classic Lucene TF*IDF can be easier to reason about.
I wrote quite a bit about BM25 vs TF*IDF in Lucene-based search here
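To make "BM25 is a better way of doing TF*IDF" a bit more concrete, here is a minimal Python sketch that scores a single query term under simplified versions of both models. Assumptions: the classic formula roughly follows Lucene's ClassicSimilarity (tf = sqrt(freq), idf = 1 + ln(N / (df + 1)), length norm = 1 / sqrt(length)) but ignores queryNorm, coord, boosts and Lucene's lossy norm encoding; BM25 uses the usual defaults k1 = 1.2 and b = 0.75. The corpus, document ids and function names are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical toy corpus: doc id -> (occurrences of the query term, doc length in terms)
DOCS = {
    "A": (10, 100),  # long doc with many occurrences
    "B": (2, 10),    # short doc with few occurrences
    "C": (0, 50),    # does not contain the term
}


def corpus_stats(docs):
    """Return (number of docs, docs containing the term, average doc length)."""
    num_docs = len(docs)
    doc_freq = sum(1 for freq, _ in docs.values() if freq > 0)
    avg_len = sum(length for _, length in docs.values()) / num_docs
    return num_docs, doc_freq, avg_len


def classic_tfidf(freq, length, num_docs, doc_freq, avg_len):
    """Simplified classic TF*IDF term score (avg_len is unused: only the doc's own length matters)."""
    tf = math.sqrt(freq)
    idf = 1.0 + math.log(num_docs / (doc_freq + 1))
    length_norm = 1.0 / math.sqrt(length)
    return tf * idf * idf * length_norm


def bm25(freq, length, num_docs, doc_freq, avg_len, k1=1.2, b=0.75):
    """BM25 term score: tf saturates via k1, length is normalized against the corpus average via b."""
    idf = math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    norm = k1 * (1.0 - b + b * length / avg_len)
    return idf * freq * (k1 + 1.0) / (freq + norm)


def rank(docs, score_fn):
    """Return ids of matching docs, best score first."""
    stats = corpus_stats(docs)
    scores = {
        doc_id: score_fn(freq, length, *stats)
        for doc_id, (freq, length) in docs.items()
        if freq > 0
    }
    return sorted(scores, key=scores.get, reverse=True)


print("classic:", rank(DOCS, classic_tfidf))  # → classic: ['B', 'A']
print("bm25:   ", rank(DOCS, bm25))           # → bm25:    ['A', 'B']
```

In this toy case the classic formula prefers the short document B, while BM25, which normalizes length against the corpus average rather than by 1 / sqrt(length), puts A first, so the relative order of two documents can flip between the two models. The constant (k1 + 1) factor scales every BM25 score equally and does not affect the ordering.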
Thanks Doug. I think you answered my question (BM25 is just a different way of doing TF*IDF). That said, would it actually cause documents to be ranked in a different order? (I am guessing yes.) If so, should regression tests that depend on score-based ordering be changed?