I'm sending search queries to my ES index and getting multiple results back. Often the lower-scoring results are irrelevant, and I want to drop them and return only high-quality results (which mostly have higher scores).
My index contains 1000 text documents of 100-500 words each. For example: 'AVENGERS: ENDGAME is set after Thanos' catastrophic use of the Infinity Stones randomly wiped out half of Earth's population in Avengers: Infinity War. Those left behind are desperate to do something -- anything -- to bring back their lost loved ones. But after an initial attempt -- with extra help from Captain Marvel -- creates more problems than solutions, the grieving, purposeless Avengers think all hope is lost.'
If the user searches for 'Captain Marvel aka Brie Larson kills Thanos in the movie', the above document should be returned as a result since it contains similar terms.
Currently, I am using min_score to set the threshold, but I know it's not best practice and the scores vary depending on the number of documents in the index (which will keep growing). So this approach doesn't seem scalable.
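For reference, this is roughly how the fixed threshold is applied at the moment. It is a minimal sketch only: the field name `plot`, the plain `match` query, the cutoff value, and the helper name `build_search_body` are placeholders for illustration, not my real values.

```python
def build_search_body(query_text, min_score=5.0):
    """Build a search request body with a fixed score cutoff (illustrative values)."""
    return {
        # Top-level search parameter: hits scoring below this value are excluded.
        "min_score": min_score,
        # Placeholder query; "plot" is a hypothetical field name.
        "query": {"match": {"plot": query_text}},
    }


body = build_search_body("Captain Marvel aka Brie Larson kills Thanos in the movie")
```

The problem is that the `5.0` here is an absolute number, so it has to be re-tuned whenever the score distribution shifts.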
I also tried several ways of tuning the query to get higher-quality results back, such as a More Like This query:

    {"query": {"bool": {
        "must": [{"more_like_this": {
            "fields": field_list,      # fields to match against
            "like": query_data,        # text to find similar documents for
            "min_term_freq": 1,
            "max_query_terms": 50,
            "min_doc_freq": 1,
            "minimum_should_match": "50%"}}]}}}
But I'm still getting results with low scores such as 1.5, whereas a good-quality result usually scores around 20. Is there a good way to tune the query further, or to make min_score dynamic, so that only highly relevant documents are returned? Any help would be appreciated!
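To make the "dynamic min_score" idea concrete, here is a minimal sketch of one thing I have in mind: a cutoff relative to the top hit's score, applied client-side after the search returns. The helper name `filter_by_relative_score` and the `ratio` value are assumptions for illustration, and the sample response is hand-written in the shape of a search response (the 12.5 score is made up). I'm not sure this is a sound approach, which is part of the question.

```python
def filter_by_relative_score(response, ratio=0.5):
    """Keep hits whose score is at least `ratio` times the best hit's score."""
    hits = response["hits"]["hits"]
    if not hits:
        return []
    top_score = max(hit["_score"] for hit in hits)
    cutoff = ratio * top_score
    return [hit for hit in hits if hit["_score"] >= cutoff]


# Hand-written stand-in for a search response; only the fields used here.
sample_response = {"hits": {"hits": [
    {"_id": "1", "_score": 20.0},
    {"_id": "2", "_score": 12.5},
    {"_id": "3", "_score": 1.5},
]}}

kept = filter_by_relative_score(sample_response, ratio=0.5)
```

With these sample scores the cutoff is 10.0, so the 1.5 hit is dropped. The weakness is that if every hit for a query is poor, the best of them still passes.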