This might be an advanced topic. We have an internal rating system for our database, which is used for sorting the results. Works fine so far.
However, right now even completely irrelevant results show up, as we don't understand how to really use min_score.
So here are my questions:
(1) I know about the min_score search option and tried it. But what are good values for breaking the results down into tiers? Is there a max_score so I can fetch a second tier?
(2) Is there a way to find out what the min_score and max_score are, so I can decide how to use my filters before showing results?
I can only talk about (3): I think you need to take a step back and think about precision and recall. You seem to include a lot of unintended results, which means you need to increase precision. That is, you need to narrow your search results and make sure your queries search for more exact values.
If you share a bit more about your queries, that might help others to chime in.
Hmm. Queries (multi_match) include 2-4 words and usually search through 3 (text) fields. There are also 1-6 (bool) filters users can apply to narrow down the results.
It's a wiki-style knowledge base which has grown kinda big over the years. Nothing I can do there.
I usually try to display the first 100 results.
The internal scoring system we use is a little bit like Stack Overflow's.
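For anyone following along, the query shape described above could look roughly like the sketch below. It only builds the request body as a Python dict and sends nothing. The field names (title, summary, body, is_published), the helper name build_search_body and the threshold 5.0 are made up for illustration and are not taken from the actual setup; the filters are assumed to be simple term filters.

```python
import json


def build_search_body(text, fields, filters, min_score=None, size=100):
    """Build a search request body: multi_match plus optional term filters.

    `filters` maps a (hypothetical) field name to the exact value it must have.
    Clauses under "filter" restrict the result set without affecting the score.
    """
    body = {
        "size": size,  # the thread mentions showing the first 100 results
        "query": {
            "bool": {
                "must": [{"multi_match": {"query": text, "fields": fields}}],
                "filter": [{"term": {name: value}} for name, value in filters.items()],
            }
        },
    }
    if min_score is not None:
        # Top-level min_score drops hits whose _score is below the threshold.
        body["min_score"] = min_score
    return body


# Illustrative values only: field names and the threshold are assumptions.
body = build_search_body(
    "reset user password",
    ["title", "summary", "body"],
    {"is_published": True},
    min_score=5.0,
)
print(json.dumps(body, indent=2))
```

Keep in mind that a fixed threshold like 5.0 is only a placeholder: relevance scores depend on the query and the index, so a value that works for one search may not carry over to another.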
You could use a function score query, which lets you set a sort of minimum score based on something like popularity plus full-text search; then you know the minimum score baseline that could be used by min_score. Just a quick thought, I haven't fully thought this through.
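A rough sketch of that idea, again just building the request body as a Python dict. It assumes the internal rating is stored in a numeric field, here called rating (a hypothetical name), and that the rating is never negative. With boost_mode set to "sum", the final score would be the text score plus log1p(rating), so the rating contributes a predictable baseline. The helper name with_rating_baseline and the threshold 3.0 are illustrative assumptions, not recommended values.

```python
import json


def with_rating_baseline(text_query, rating_field="rating", min_score=None):
    """Wrap a text query in function_score so the rating adds to the score.

    `rating_field` is a hypothetical numeric field holding the internal rating.
    """
    function_score = {
        "query": text_query,
        "field_value_factor": {
            "field": rating_field,
            "modifier": "log1p",  # dampen large ratings: log(1 + rating)
            "missing": 0,         # treat documents without a rating as 0
        },
        "boost_mode": "sum",      # final score = text score + function value
    }
    if min_score is not None:
        # Exclude documents whose combined score falls below the threshold.
        function_score["min_score"] = min_score
    return {"query": {"function_score": function_score}}


text_query = {
    "multi_match": {
        "query": "reset user password",
        "fields": ["title", "summary", "body"],  # hypothetical field names
    }
}
body = with_rating_baseline(text_query, min_score=3.0)
print(json.dumps(body, indent=2))
```

Whether "sum" or another boost_mode fits best depends on how much weight the rating should carry against the text match, so the threshold would need to be tuned against real queries.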