Yes, it's because it matched terms like "light" and "working". But we can't display that document to the user, as it's not relevant to the user's query; the document is all about MS Office issues.
How do we handle such scenarios? We must display a document only if it has information relevant to the user's query.
If you know what kinds of books the user cares about, you could put a keyword on each book and then use a bool query where both your match query and a new match query on the keyword field are in the must part of the query.
If you don't know up front how to tag your documents or which tags the user cares about, then you are going to have to get more creative, and Elasticsearch doesn't have anything out of the box for that.
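A rough sketch of that bool query, written as a Python dict that could be sent as the search request body. The field names `content` and `tag` and the helper `build_tagged_query` are made up for illustration; substitute whatever your mapping actually uses.

```python
import json


def build_tagged_query(user_text, tag, text_field="content", tag_field="tag"):
    """Build a search body where both the full-text match and the tag match
    must succeed. Field names are hypothetical placeholders."""
    return {
        "query": {
            "bool": {
                "must": [
                    # The original full-text query against the document body.
                    {"match": {text_field: user_text}},
                    # The extra match on the keyword/tag field.
                    {"match": {tag_field: tag}},
                ]
            }
        }
    }


if __name__ == "__main__":
    body = build_tagged_query("how to reset password", "windows")
    print(json.dumps(body, indent=2))
```

Because both clauses sit in `must`, a document is returned only if it matches the text and carries the given tag.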
But suppose, for instance, we tagged a document as "windows": a file with a lot of subtopics, like how to reset a password, how to connect to the internet, etc.
If the user searches for "how to connect to internet", the query will not return any results, because the must clause will match 0 documents, as the search text doesn't match "windows".
Document tagging is one possible alternative, but I doubt it is the right fit for this case. I will work on it.
And yes, as you said, we also don't know what the user cares about.
As of now we are removing stop words and using stemmers and tokenizers (NLP-style analysis) while indexing.
Are there any other alternatives, such as using the rescore API, tweaking BM25 parameters, putting a score limit on the documents, or writing more advanced queries?
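For reference, here is roughly what those options look like as request bodies, again as Python dicts. This is only a sketch under assumptions: the field name `content`, the helper name, and every numeric value (the `min_score` threshold, the `minimum_should_match` percentage, the rescore weights and window) are arbitrary placeholders, not recommendations from this thread.

```python
import json


def build_stricter_search(user_text, field="content", min_score=1.0):
    """Combine three of the ideas above in one search body (illustrative values)."""
    return {
        # Score limit: hits scoring below this value are dropped.
        "min_score": min_score,
        "query": {
            "match": {
                field: {
                    "query": user_text,
                    # Require most query terms to match, not just one or two.
                    "minimum_should_match": "75%",
                }
            }
        },
        # Rescore API: re-rank the top hits with a phrase query.
        "rescore": {
            "window_size": 50,
            "query": {
                "rescore_query": {
                    "match_phrase": {field: {"query": user_text, "slop": 2}}
                },
                "query_weight": 1.0,
                "rescore_query_weight": 2.0,
            },
        },
    }


# BM25 tuning is an index setting rather than part of the query.
# k1=1.2 and b=0.75 are the defaults; other values need testing on your data.
BM25_SETTINGS = {
    "settings": {
        "index": {"similarity": {"default": {"type": "BM25", "k1": 1.2, "b": 0.75}}}
    }
}

if __name__ == "__main__":
    print(json.dumps(build_stricter_search("how to connect to internet"), indent=2))
```

One caveat on the score limit: BM25 scores are not normalized, so a fixed `min_score` that works for one query or index may cut too much or too little for another.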