Best way to trim results by score?

jmr317 · January 25, 2019, 2:55pm

Some of my searches return over 10k documents, with scores ranging from high (~75 in my most recent search) to very low (less than 5). Other queries return a top score of ~20 and a bottom score of ~1.

Does anyone have a good solution for trimming off the less relevant documents? A Java or query implementation would work. I've thought about using min_score, but I'm wary of it because it has to be a constant, and the score ranges in some of my responses are much narrower than in the examples above. I suppose I could come up with a formula based on the returned scores to compute a cutoff for each response, but I'm curious whether anyone has already solved a similar use case.
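As a rough illustration of the "formula based on the returned scores" idea, here is a minimal Python sketch that keeps only the hits scoring at least some fraction of the best score in the same response. The helper name `trim_by_relative_score` and the 0.5 ratio are assumptions made up for this example; the only thing taken from Elasticsearch is that each hit carries a `_score` field. The same few lines port directly to Java.

```python
def trim_by_relative_score(hits, ratio=0.5):
    """Keep hits whose score is at least `ratio` times the top score.

    `hits` is a list of dicts shaped like Elasticsearch hits (each has a
    numeric "_score"). `ratio` is an arbitrary, illustrative threshold.
    """
    if not hits:
        return []
    top_score = max(hit["_score"] for hit in hits)
    cutoff = top_score * ratio  # cutoff is relative to this response only
    return [hit for hit in hits if hit["_score"] >= cutoff]


# Example with made-up scores similar to the ones described above.
hits = [
    {"_id": "a", "_score": 75.0},
    {"_id": "b", "_score": 40.0},
    {"_id": "c", "_score": 4.2},
]
kept = trim_by_relative_score(hits, ratio=0.5)
print([hit["_id"] for hit in kept])
```

Because the cutoff is derived from each response's own top score, it adapts to queries whose scores top out at ~20 as well as those that reach ~75. It still only sees the hits that were actually fetched, so it cannot trim the total hit count or aggregations on the server side.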

jpountz · February 1, 2019, 9:20am

Hi Jon,

In general, the recommendation is not to do anything like that and to simply return documents in descending order of score, so that the most relevant ones appear first.

Instead of a score cutoff, the usual approach is a cutoff on rank combined with a rescorer. For instance, you could take the 10 best documents by relevance and reorder them based on some other criterion that reflects the authority or popularity of the document: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-rescore.html
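A minimal sketch of what such a request body might look like, built as a Python dict that mirrors the JSON query DSL. The field names `title` and `popularity`, the helper name `build_rescore_request`, and the weights are assumptions for illustration only; `rescore`, `window_size`, `rescore_query`, `query_weight`, `rescore_query_weight`, and `function_score` with `field_value_factor` are the query DSL pieces being combined.

```python
import json


def build_rescore_request(user_query, window_size=10):
    """Build a search body that rescoring the top hits by popularity.

    Assumes a text field "title" and a numeric field "popularity"
    (both hypothetical names for this example).
    """
    return {
        "size": window_size,
        # First pass: plain relevance ranking.
        "query": {"match": {"title": user_query}},
        # Second pass: only the top `window_size` hits are rescored.
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "function_score": {
                        "field_value_factor": {
                            "field": "popularity",
                            "modifier": "log1p",
                            "missing": 0,
                        }
                    }
                },
                "query_weight": 1.0,          # weight of the original score
                "rescore_query_weight": 2.0,  # weight of the popularity score
            },
        },
    }


body = build_rescore_request("ice age")
print(json.dumps(body, indent=2))
```

Note that `window_size` applies per shard, so the weights and window would need tuning against real data.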

Mark_Harwood · February 1, 2019, 9:33am

If you're building a faceted search interface using aggregations, trimming low-scoring matches is often useful to stop a long tail from making a nonsense of your facet summaries. Someone searching for a video by typing "ice age" shouldn't be told there are 300 matches in the electricals department just because you matched a lot of refrigerators with an ice dispenser.
One technique I've seen used on e-commerce sites is to start with a very tight interpretation of the user input, e.g. running the input "ice age" as a strict phrase match. Only if that returns very few results do they re-run a relaxed form of the search, i.e. ice OR age. Obviously, picking that magic threshold number can be tricky, and offering users ways to rewrite the query can help.
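
The strict-then-relaxed flow could be sketched roughly as follows in Python. The function name `strict_then_relaxed`, the `search` callable, the `title` field, and the `min_hits` threshold are all assumptions for this example; `search` stands in for whatever client call runs a query body and returns a list of hits. `match_phrase` and `match` with `operator: "or"` are the query DSL clauses used for the two passes.

```python
def strict_then_relaxed(search, field, text, min_hits=5):
    """Try a strict phrase match first; relax to OR only if too few hits.

    `search` is any callable that takes a request body (dict) and returns
    a list of hits. `min_hits` is the hand-picked "magic threshold".
    Returns a (mode, hits) tuple so the caller knows which pass was used.
    """
    strict_body = {"query": {"match_phrase": {field: text}}}
    hits = search(strict_body)
    if len(hits) >= min_hits:
        return "strict", hits

    relaxed_body = {
        "query": {"match": {field: {"query": text, "operator": "or"}}}
    }
    return "relaxed", search(relaxed_body)


def fake_search(body):
    """Stand-in for a real client call, returning canned titles."""
    if "match_phrase" in body["query"]:
        return ["Ice Age (DVD)"]
    return ["Ice Age (DVD)", "Fridge with ice dispenser", "The Age of Innocence"]


mode, results = strict_then_relaxed(fake_search, "title", "ice age", min_hits=2)
print(mode, results)
```

Returning the mode alongside the hits lets the interface tell the user that the results were broadened, which is one way of offering them a chance to rewrite the query.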

