This is a complex topic; entire books have been written on it.
What you have is close to what's known as a judgment list: a set of graded documents for each query. There are several standard metrics for turning a judgment list into a single number that says how good the results are (NDCG, ERR, MAP, precision@k, and so on):
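As a concrete illustration, here is a minimal sketch of one of those metrics, NDCG@k, computed directly from judgment grades (the grade scale of 0-3 is an assumption for the example):

```python
import math

def dcg(grades):
    """Discounted cumulative gain: higher grades near the top count more."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(grades))

def ndcg_at_k(ranked_grades, k=10):
    """NDCG@k: DCG of the actual ordering divided by DCG of the ideal ordering."""
    actual = dcg(ranked_grades[:k])
    ideal = dcg(sorted(ranked_grades, reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0

# Judgment grades (0-3) of the documents your engine returned, in rank order:
print(ndcg_at_k([3, 1, 0, 2], k=4))
```

A perfectly ordered result list scores 1.0; anything less tells you how far the ranking is from ideal, averaged over your whole query set.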
You can also use tools built to consume judgment lists and evaluate the quality of a search relevance solution:
On the solution side: if you have metrics you trust, you can run a grid search over the parameters of your current query strategy (field boosts, minimum-should-match, and so on) and keep whichever combination scores best.
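That grid search can be sketched in a few lines. Everything here is hypothetical: `search` stands in for a call to your real engine, `JUDGMENTS` for your real judgment list, and the crude top-k average grade for a proper metric like NDCG:

```python
from itertools import product

# Hypothetical judgment list: grades of the docs each query returns.
JUDGMENTS = {"q1": [3, 2, 1, 0], "q2": [2, 3, 0, 1]}

def search(query, title_boost, phrase_boost):
    """Stub for a real engine call; toy behavior so the sketch runs."""
    grades = JUDGMENTS[query]
    return sorted(grades, reverse=True) if title_boost > phrase_boost else grades

def avg_top_grade(grades, k=3):
    """Crude quality score: mean judgment grade of the top k results."""
    return sum(grades[:k]) / k

grid = {"title_boost": [1, 2, 5], "phrase_boost": [0, 1, 3]}
best = max(
    (dict(zip(grid, combo)) for combo in product(*grid.values())),
    key=lambda p: sum(avg_top_grade(search(q, **p)) for q in JUDGMENTS),
)
print(best)
```

In practice you would also hold out some queries when scoring, so the winning parameters aren't just overfit to the judgment list you tuned on.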
BUT you'll only do as well as the quality of the underlying queries allows, just as machine learning is only as good as its underlying features. And that's the hard part people spend years on, both inside and outside the search engine, with complex enrichment of documents and queries. What you need to do is craft good ranking-time signals that turn a relevance score into something closer to what users actually care about; see here:
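A hypothetical sketch of what "ranking-time signals" can mean in practice: blending the engine's text relevance score with document-level signals such as popularity and freshness (the signal names, weights, and decay constants below are all illustrative assumptions, not a recipe):

```python
import math

def final_score(text_score, clicks, age_days,
                pop_weight=0.3, recency_weight=0.2):
    """Blend text relevance with popularity and freshness signals."""
    popularity = math.log1p(clicks)          # log dampens runaway popular docs
    freshness = math.exp(-age_days / 365.0)  # decays over roughly a year
    return text_score + pop_weight * popularity + recency_weight * freshness

docs = [
    {"id": "a", "text_score": 2.0, "clicks": 10,   "age_days": 800},
    {"id": "b", "text_score": 1.8, "clicks": 5000, "age_days": 30},
]
ranked = sorted(
    docs,
    key=lambda d: final_score(d["text_score"], d["clicks"], d["age_days"]),
    reverse=True,
)
print([d["id"] for d in ranked])
```

Here the slightly-less-textually-relevant but popular, fresh document outranks the stale one; whether that is the right trade-off is exactly what your judgment list and metrics should tell you.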
IF you have good enough signals AND you have a lot of high-quality judgments, you MIGHT be in a position to turn ranking optimization into a machine learning problem (learning to rank):
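To make that concrete, here is a minimal pointwise learning-to-rank sketch on made-up data: learn weights over your ranking signals so the predicted score approximates the judgment grade. Real setups use dedicated tooling (e.g. LambdaMART-style gradient boosting, or the LTR plugins available for Solr and Elasticsearch) rather than this hand-rolled gradient descent:

```python
# Hypothetical training data: (signal vector, judgment grade) pairs,
# where signals might be [text_score, popularity, freshness].
training = [
    ([2.0, 0.1, 0.9], 3), ([1.5, 0.8, 0.2], 2),
    ([0.5, 0.9, 0.8], 1), ([0.2, 0.1, 0.1], 0),
]

weights = [0.0, 0.0, 0.0]
lr = 0.05
for _ in range(2000):  # plain stochastic gradient descent on squared error
    for features, grade in training:
        pred = sum(w * f for w, f in zip(weights, features))
        err = pred - grade
        weights = [w - lr * err * f for w, f in zip(weights, features)]

def score(features):
    """Rank new documents by the learned combination of signals."""
    return sum(w * f for w, f in zip(weights, features))
```

The pointwise framing is the simplest; pairwise and listwise approaches, which optimize the ordering directly, usually do better but need the same ingredients: good signals and plenty of judgments.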
So I'm not sure whether that helps, other than opening a Pandora's box of things to learn about...