Elasticsearch: changing BM25 parameters to get more relevant documents

Hi team

I've indexed a few PDF files. As you know, Elasticsearch uses BM25 as its default similarity.

Let's suppose I have indexed 2 PDF files, one about Windows issues and the other about Linux issues. When I search for "Windows Installation Issue", Elasticsearch returns the Linux issues PDF.
I think this is because the keyword "windows" appears only a few times in the Linux issues PDF and much more often in the Windows issues PDF, so Elasticsearch boosted the Linux issues PDF because it contains the keyword fewer times.

For my use case, that is not the behavior I want. Since "windows" appears more often in the Windows issues PDF, shouldn't ES return the Windows issues PDF?

For this to happen, do I need to change the BM25 parameter values so that scoring ignores document length or term frequency? Or should I use a different similarity algorithm? I think BM25 is a good fit for most use cases.
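
To make the question concrete, here is a rough sketch of what the two BM25 parameters control, followed by the index settings I believe would be needed to change them. The scoring function is the textbook BM25 term score, so Elasticsearch's actual numbers may differ by a constant factor. The similarity name `bm25_no_length_norm` and the field name `content` are placeholders I made up, and `k1=1.2`, `b=0.75` are the documented defaults.

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 score of one query term for one document.

    k1 controls how quickly repeated occurrences of a term stop adding score.
    b controls length normalization: 0 ignores document length, 1 applies it fully.
    """
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + length_norm)

# Hypothetical index body: a custom BM25 similarity with length
# normalization switched off (b = 0), applied to an assumed "content" field.
index_body = {
    "settings": {
        "index": {
            "similarity": {
                "bm25_no_length_norm": {"type": "BM25", "k1": 1.2, "b": 0.0}
            }
        }
    },
    "mappings": {
        "properties": {
            "content": {"type": "text", "similarity": "bm25_no_length_norm"}
        }
    },
}
```

In this sketch, setting `b` to 0 makes the score independent of document length, and a higher term frequency always gives a higher score for the same term, which is the behavior I'm after.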

Any suggestions would be helpful!

Thanks for your time as always
